Full text: Papers accepted on the basis of peer-reviewed full manuscripts (Part A)

In: Paparoditis N., Pierrot-Deseilligny M., Mallet C., Tournaire O. (Eds), IAPRS, Vol. XXXVIII, Part 3A, Saint-Mandé, France, September 1-3, 2010
Figure 2: Average recall and precision values for different background subtraction methods using different parameters. Left: values for the 2D/3D video shown in table 1; right: for the video shown in table 2.
only a different depth but also at least a slightly different color and infrared reflectance properties. Other reasons are limitations of video and ToF cameras, e.g., the infrared reflectance of an object has an influence on the depth measurement (or its noise level), and low luminance impairs chromaticity measurements. Therefore, a linkage between the dimensions reduces the noise level in the foreground mask, the amount of misclassification due to shadows, and the block artifacts which occur where the depth measurements alone are inappropriate.
More elaborate variants, such as learning modulation and special treatment of deeper observations when determining which observations are considered background, are described in (Harville et al., 2001) but do not seem to be necessary for simple scenarios.
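The per-pixel linkage of color and depth can be sketched as a joint match test against one background Gaussian. This is a minimal illustration only, assuming (Y, U, V, Z) channels; the function name, the channel layout, and the depth weight lam_z are my own choices, not the notation of (Harville et al., 2001):

```python
def matches_background(obs, mean, var, t_near=3.5, lam_z=1.0):
    """Per-channel squared Mahalanobis test of an observation
    (Y, U, V, Z) against one background Gaussian.  lam_z down-weights
    the depth term when the depth measurement is unreliable; lam_z = 0
    reduces the test to a purely color-based check."""
    d = [(o - m) ** 2 / v for o, m, v in zip(obs, mean, var)]
    return sum(d[:3]) + lam_z * d[3] < t_near ** 2

# A pixel whose color resembles the background but whose depth is far
# off the background mode: with depth it is detected as foreground,
# without depth it would pass as background (all values hypothetical).
obs  = [120.0, 128.0, 130.0, 1.20]   # person at 1.2 m
mean = [118.0, 127.0, 129.0, 2.02]   # background wall at ~2 m
var  = [25.0, 9.0, 9.0, 0.01]
print(matches_background(obs, mean, var, lam_z=1.0))  # False: foreground
print(matches_background(obs, mean, var, lam_z=0.0))  # True: missed by color alone
```

Setting lam_z = 0 everywhere corresponds to ignoring the depth channel entirely, which is exactly the failure mode the linkage is meant to avoid.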
4 EXPERIMENTS 
The approach described in this work can be evaluated by examining whether it obeys the following principles:
• When ordinary color-based background subtraction works, the results should not be harmed, e.g., through block artifacts at the border of the foreground mask.
• When the foreground is not classified correctly with color alone, this should be compensated by depth.
• The shadow treatment of color-based background subtraction is still far from perfect and should be improved through depth information.
The following methods were compared in the course of this work. 'GMM' is the standard color-based GMM approach (Stauffer and Grimson, 1999) and 'Original GMMD' is the original color- and depth-based method from (Harville et al., 2001). 'GMMD without depth' is the method described in this work without depth measurements (always λ_z = 0) and with λ_a = 0, whereas in 'GMMD' λ_z is determined based on the amplitude modulation for each pixel, similar to (Harville et al., 2001), and in 'MMGMM' λ_a = 1 is set additionally. The values for the OpenCV GMM method are given for reference only, since it contains post-processing steps and is therefore not directly comparable.
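The role of λ_z can be illustrated with a small sketch. How λ_z is actually computed from the amplitude modulation is not specified in this excerpt, so the clamped linear mapping below, the reference amplitude a_ref, and the function name are purely hypothetical:

```python
def lam_z_from_amplitude(amplitude, a_ref=100.0):
    """Hypothetical mapping from the ToF amplitude modulation of a pixel
    to a depth-reliability weight lam_z in [0, 1]: low amplitude implies
    a noisy depth value, so the depth channel is trusted less.  The
    linear clamp and a_ref are assumptions, not the paper's rule."""
    return max(0.0, min(1.0, amplitude / a_ref))

# 'GMMD without depth' corresponds to forcing lam_z = 0 for every pixel,
# while 'GMMD' computes a per-pixel value such as:
print(lam_z_from_amplitude(0.0))    # no signal: depth ignored
print(lam_z_from_amplitude(250.0))  # strong signal: depth fully trusted
```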
Table 1 shows the results for a 2D/3D video with difficult lighting conditions using these methods. The same parameters were used for all methods: a maximum number of 4 Gaussians per pixel, a learning rate of α = 0.0005, an initial σ = 5, and a threshold T_near = 3.5. Since all methods operate on the same principle, the results should be comparable for a given set of parameters. This was also confirmed by several parameter variations.
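Under these parameters, the per-pixel update common to all compared methods can be sketched as follows. This is a single-channel toy version of the Stauffer-Grimson scheme, with my own variable names and a simplified background decision (a plain weight threshold w_bg instead of the cumulative-weight rule of the original paper):

```python
ALPHA, SIGMA_INIT, T_NEAR, MAX_MODES = 0.0005, 5.0, 3.5, 4

def update_pixel(modes, x, w_bg=0.1):
    """modes: list of [weight, mean, var], at most MAX_MODES per pixel.
    Returns True if the intensity x is classified as background."""
    matched = None
    for m in modes:
        if (x - m[1]) ** 2 < (T_NEAR ** 2) * m[2]:   # within T_near sigmas
            matched = m
            break
    if matched is None:                              # start a new mode
        if len(modes) == MAX_MODES:
            modes.remove(min(modes, key=lambda m: m[0]))  # drop weakest
        matched = [0.0, x, SIGMA_INIT ** 2]
        modes.append(matched)
        is_bg = False                                # unmatched: foreground
    else:                                            # adapt matched mode
        is_bg = matched[0] > w_bg                    # simplified decision
        matched[1] += ALPHA * (x - matched[1])
        matched[2] += ALPHA * ((x - matched[1]) ** 2 - matched[2])
    for m in modes:                                  # grow matched, decay rest
        m[0] = (1 - ALPHA) * m[0] + (ALPHA if m is matched else 0.0)
    return is_bg

modes = []
for _ in range(1000):
    update_pixel(modes, 100.0)      # a static background pixel
print(update_pixel(modes, 100.0))   # True: stable value became background
print(update_pixel(modes, 200.0))   # False: sudden change is foreground
```

With α = 0.0005 the weights adapt slowly, which is why several hundred frames are needed before a new static value is accepted as background.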
The results demonstrate the ability of this method to achieve the stated objectives. The misclassification of shadows is reduced and the natural borders of the foreground are harmed less. When the classification based on color fails, these areas are filled at least partly. Unfortunately, the compensation is often done in a blockwise fashion (see figure 1). This drawback is further discussed in the next section.
Image sequences from another experiment, using the same parameter set, are shown in table 2. Here the lighting conditions are far better, so that the standard GMM algorithm can in theory distinguish between foreground and background. On the other hand, shadows and the similarity between foreground (jacket) and background cause large problems in this video. The method proposed in this work does not affect the good color-based classification but allows for better shadow treatment due to the available depth values.
Figure 2 shows quantitative results for both 2D/3D videos. A ground truth was created by hand for every 5th frame, starting with the last empty frame before the person enters the scene and ending with the first empty frame after the person has left the scene. Then the number of true positives tp, false positives fp, and false negatives fn was counted in each frame for the different methods using thresholds T_near = 2, 2.5, ..., 8 to calculate the recall tp/(tp + fn) and the precision tp/(tp + fp), and their averages over all frames were plotted. Here all variants of the proposed method outperform the classic approach and the original GMMD method, with the exception of the MMGMM method in the first video, which on the other hand achieves the best results for the second video. This behavior is due to the fact that the scene in video 1 is much more difficult to light than the scene in video 2, which results in higher noise levels in the amplitude modulation images of video 1. The very different values for the OpenCV GMM method in the second video are caused by the fact that this method classifies the TV correctly, whereas all other methods fail in that respect. The comparatively low recall values of the OpenCV GMM method, i.e., a largely incomplete true foreground, possibly due to the foreground-background similarity, are worth mentioning.
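The per-frame evaluation above follows directly from the definitions of recall and precision. A minimal sketch (the counts are hypothetical; frames with an empty ground truth, where tp + fn = 0, would need a guard that is omitted here for brevity):

```python
def average_recall_precision(per_frame_counts):
    """per_frame_counts: list of (tp, fp, fn) tuples, one per evaluated
    frame.  Returns (mean recall, mean precision) over all frames."""
    recalls = [tp / (tp + fn) for tp, fp, fn in per_frame_counts]
    precisions = [tp / (tp + fp) for tp, fp, fn in per_frame_counts]
    n = len(per_frame_counts)
    return sum(recalls) / n, sum(precisions) / n

# Hypothetical counts for three evaluated frames
counts = [(90, 10, 10), (80, 30, 20), (95, 5, 5)]
r, p = average_recall_precision(counts)
print(round(r, 3), round(p, 3))  # prints: 0.883 0.859
```

Averaging the per-frame ratios, as done here, weights every frame equally regardless of how many foreground pixels it contains; pooling the counts first would instead weight frames by foreground size.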