In: Paparoditis N., Pierrot-Deseilligny M., Mallet C., Tournaire O. (Eds), IAPRS, Vol. XXXVIII, Part 3A - Saint-Mandé, France, September 1-3, 2010
Figure 2: Average recall and precision values for different background subtraction methods using different parameters. Left: values for the 2D/3D video shown in table 1; right: values for the video shown in table 2.
only a different depth but also at least slightly different color and infrared reflectance properties. Other reasons are limitations of video and ToF cameras: e.g., the infrared reflectance of an object influences the depth measurement (or its noise level), and low luminance impairs chromaticity measurements. Therefore, linking the dimensions reduces the noise level in the foreground mask, the amount of misclassification due to shadows, and the block artifacts which occur when depth measurements alone are inappropriate.
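The linkage between the color and depth dimensions can be illustrated with a small sketch. This is a hedged illustration only, not the paper's actual matching rule: the joint variance-normalized test, the 4-vector layout (R, G, B, depth), and the threshold scaling are assumptions made for the example, and the per-pixel depth weight stands in for the reliability term discussed in the text.

```python
import numpy as np

def matches_mode(obs, mean, var, depth_weight, t_near=3.5):
    """Hypothetical joint color/depth match test for one pixel.

    obs, mean : 4-vectors (R, G, B, depth); var : per-channel variance.
    depth_weight in [0, 1] down-weights the depth channel when the
    depth measurement is unreliable (e.g. low infrared amplitude).
    """
    d = obs - mean
    # squared, variance-normalized distance per channel
    d2 = d * d / var
    # linkage: color and depth enter a single test, so a strong
    # mismatch in either dimension can reject the background mode
    score = d2[:3].sum() + depth_weight * d2[3]
    return bool(score < t_near ** 2 * (3 + depth_weight))
```

With depth_weight = 0 the test degenerates to a purely color-based comparison, which mirrors the 'GMMD without depth' variant described below.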
More elaborate variants, such as learning modulation and special treatment of deeper observations when determining which observations are considered background, are described in (Harville et al., 2001), but they do not seem to be necessary for simple scenarios.
4 EXPERIMENTS
The approach described in this work can be evaluated by examining whether it obeys the following principles:
• When ordinary color-based background subtraction works, the results should not be harmed, e.g., through block artifacts at the border of the foreground mask.
• When the foreground is not classified correctly with color alone, this should be compensated by depth.
• The shadow treatment of color-based background subtraction is still far from perfect and should be improved through depth information.
The following methods were compared in the course of this work. 'GMM' is the standard color-based GMM approach (Stauffer and Grimson, 1999) and 'Original GMMD' is the original color- and depth-based method from (Harville et al., 2001). 'GMMD without depth' is the method described in this work without depth measurements (always λ_z = 0) and with λ_a = 0, whereas in 'GMMD' λ_z is determined from the amplitude modulation for each pixel, similarly to (Harville et al., 2001), and in 'MMGMM' λ_a = 1 is set additionally. The values for the OpenCV GMM method are given for reference only, since it contains post-processing steps and is therefore not directly comparable.
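Deriving the per-pixel depth weight λ_z from the amplitude modulation image could be sketched as follows. The linear ramp and the calibration bounds a_min and a_max are hypothetical values chosen for illustration; the paper does not specify the exact mapping.

```python
import numpy as np

def depth_reliability(amplitude, a_min=50.0, a_max=500.0):
    """Hypothetical mapping from ToF amplitude to a depth weight.

    Below a_min the depth measurement is treated as unusable
    (weight 0); above a_max it is fully trusted (weight 1);
    the weight rises linearly in between.
    """
    amp = np.asarray(amplitude, dtype=float)
    return np.clip((amp - a_min) / (a_max - a_min), 0.0, 1.0)
```

Applied to the whole amplitude image, this yields one λ_z value per pixel, so noisy low-amplitude regions contribute less to the joint match decision.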
Table 1 shows the results for a 2D/3D video with difficult lighting conditions using these methods. The same parameters were used for all methods: a maximum of 4 Gaussians per pixel, a learning rate of α = 0.0005, an initial σ = 5, and a threshold T_near = 3.5. Since all methods operate on the same principle, the results should be comparable for a given set of parameters. This was also confirmed by several parameter variations.
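The role of the learning rate α can be made concrete with a sketch of the standard Stauffer-Grimson weight update, which all compared methods share. This is a simplified illustration of that common update step, not the full algorithm of any of the variants:

```python
def update_weights(weights, matched_index, alpha=0.0005):
    """Per-pixel GMM weight update (Stauffer-Grimson style).

    The weight of the matched mode is pulled toward 1, all other
    weights decay toward 0, and the result is renormalized. With
    alpha = 0.0005 the model adapts very slowly, as used in the
    experiments above.
    """
    new = [(1 - alpha) * w + (alpha if i == matched_index else 0.0)
           for i, w in enumerate(weights)]
    total = sum(new)
    return [w / total for w in new]
```

For example, with two equally weighted modes and an exaggerated alpha = 0.5, one update shifts the matched mode's weight from 0.5 to 0.75.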
The results demonstrate the ability of this method to achieve the stated objectives. The misclassification of shadows is reduced and the natural borders of the foreground are harmed less. When the classification based on color fails, these areas are filled at least partly. Unfortunately, the compensation is often done in a blockwise fashion (see figure 1). This drawback is discussed further in the next section.
Image sequences from another experiment, using the same parameter set, are shown in table 2. Here the lighting conditions are far better, so that the standard GMM algorithm can in theory distinguish between foreground and background. On the other hand, shadows and the similarity between foreground (jacket) and background cause large problems in this video. The method proposed in this work does not affect the good color-based classification but allows for better shadow treatment due to the available depth values.
Figure 2 shows quantitative results for both 2D/3D videos. A ground truth was created by hand for every 5th frame, starting with the last empty frame before the person enters the scene and ending with the first empty frame after the person has left the scene. Then the numbers of true positives tp, false positives fp, and false negatives fn were counted in each frame for the different methods using thresholds T_near = 2, 2.5, ..., 8 to calculate the recall tp/(tp + fn) and the precision tp/(tp + fp), and their averages over all frames were plotted. Here all variants of the proposed method perform better than the classic approach and the original GMMD method, with the exception of the MMGMM method in the first video, which on the other hand achieves the best results for the second video. This behavior is due to the fact that the scene in video 1 is much more difficult to light than the scene in video 2, which results in higher noise levels in the amplitude modulation images of video 1. The very different values for the OpenCV GMM method in the second video are caused by the fact that this method classifies the TV correctly, whereas all other methods fail in that respect. The comparably low recall values of the OpenCV GMM method, i.e., a largely incomplete true foreground, possibly due to the foreground-background similarity, are worth mentioning.
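The evaluation procedure above can be sketched directly from its definitions. The per-frame counting and the averaging over frames follow the text; the handling of empty frames (no foreground in ground truth or prediction) is an assumption, since the paper does not state it:

```python
def recall_precision(gt_masks, pred_masks):
    """Average recall and precision over a sequence of binary masks.

    Per frame: tp, fp, fn are counted pixelwise, recall = tp/(tp+fn)
    and precision = tp/(tp+fp); the per-frame values are then averaged.
    Frames with an empty denominator are scored as 1.0 (assumption).
    """
    recalls, precisions = [], []
    for gt, pred in zip(gt_masks, pred_masks):
        tp = sum(1 for g, p in zip(gt, pred) if g and p)
        fp = sum(1 for g, p in zip(gt, pred) if not g and p)
        fn = sum(1 for g, p in zip(gt, pred) if g and not p)
        recalls.append(tp / (tp + fn) if tp + fn else 1.0)
        precisions.append(tp / (tp + fp) if tp + fp else 1.0)
    return (sum(recalls) / len(recalls),
            sum(precisions) / len(precisions))
```

Sweeping T_near over 2, 2.5, ..., 8 and plotting the resulting (recall, precision) pairs per method reproduces the kind of curves shown in figure 2.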