Photogrammetric computer vision and image analysis: Papers accepted on the basis of peer-reviewed full manuscripts

Paparoditis, Nicolas
In: Paparoditis N., Pierrot-Deseilligny M., Mallet C., Tournaire O. (Eds.), IAPRS, Vol. XXXVIII, Part 3A, Saint-Mandé, France, September 1-3, 2010
Figure 1: Results for (a) a challenging input video, (b) a depth image where invalid measurements are black, (c) the foreground mask
using GMM only for color, and (d) the foreground mask using GMM based on color and depth.
ble refinements of the depth for a color pixel. Again a bilateral
filter is applied to this volume, and after sub-pixel refinement a
proposed depth is obtained. The optimization is performed iteratively
to achieve the final depth map. The incorporation of a
second view is also discussed. In (Bartczak and Koch, 2009) a
similar method using multiple views was presented.
An approach working with one color image and multiple depth 
images is described in (Rajagopalan et al., 2008). Here the data 
fusion is formulated in a statistical manner and modeled using 
Markov Random Fields on which an energy minimization method 
is applied. 
Another advanced method to combine depth and color information
was introduced in (Lindner et al., 2008). It is based on
edge-preserving biquadratic upscaling and performs a special treatment
of invalid depth measurements.
3 GAUSSIAN MIXTURE MODELS 
All observations at each pixel position $\mathbf{x} = [x, y]^T$ are modeled
with a mixture of Gaussians to estimate the background. The
assumption is that an object in the line of sight associated with
a certain pixel produces a Gaussian-distributed observation, or several
in the case of a periodically changing appearance (e.g., moving
leaves, monitor flickering). Each observation is then modeled
with one Gaussian whose mean and variance are adapted over
time. An observation at time $t$ for a pixel $\mathbf{x}$ is given by
$s^t(\mathbf{x}) = [s_1^t(\mathbf{x}), s_2^t(\mathbf{x}), \ldots, s_n^t(\mathbf{x})]^T$. The probability density
of $s^t(\mathbf{x})$ can now be described by
$$ f_{s^t(\mathbf{x})}(\xi) = \sum_i \omega_i \cdot \mathcal{N}_{\xi}(\mu_i, \Sigma_i) \tag{1} $$
determine the parameters of the mixture, the usual online clustering
approach is used in this work: when a new observation $s(\mathbf{x})$
arrives, it is checked whether it is similar to already modeled observations
or whether it originates from a new object. It may also just be
noise. This is done by evaluating the Mahalanobis distance $\delta(\cdot,\cdot)$
towards the associated Gaussian $\mathcal{N}_s(\mu_i, \Sigma_i)$:
$$ \delta(s, \mu_i) = \sqrt{(\mu_i - s)^T \Sigma_i^{-1} (\mu_i - s)} < T_{near} \tag{3} $$
with $T_{near}$ being a given constant. If similar observations have
been recorded, their Gaussian is adapted using the observed data.
Otherwise, a new Gaussian is created and added to the mixture.
An exact description of a possible implementation can be found
in (Stauffer and Grimson, 1999) for normal videos and in (Harville
et al., 2001) with additional depth values.
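The matching step described above can be sketched for a single pixel as follows. This is only an illustration under stated assumptions, not the implementation of the cited works: the function names, the learning rate `alpha`, the initial variance `init_var`, the default value of `t_near`, and the running-average update rule are all hypothetical choices made for the example. Covariances are stored as per-channel variances, i.e. as diagonal matrices.

```python
import numpy as np

def mahalanobis_diag(s, mean, var):
    """Mahalanobis distance of s to a Gaussian with diagonal covariance."""
    d = s - mean
    return float(np.sqrt(np.sum(d * d / var)))

def update_mixture(s, means, variances, weights,
                   t_near=3.0, alpha=0.05, init_var=25.0):
    """One online clustering step for a single pixel.

    means, variances: float arrays of shape (k, dim); weights: shape (k,).
    t_near, alpha and init_var are illustrative values (assumptions).
    Returns updated (means, variances, weights) and the index of the
    Gaussian that was matched or newly created.
    """
    s = np.asarray(s, dtype=float)
    dists = np.array([mahalanobis_diag(s, m, v)
                      for m, v in zip(means, variances)])
    matched = dists < t_near
    if matched.any():
        # Adapt the closest matching Gaussian with a running average
        # (assumed update rule, not taken from the text).
        i = int(np.argmin(np.where(matched, dists, np.inf)))
        means = means.copy()
        variances = variances.copy()
        d = s - means[i]
        means[i] += alpha * d
        variances[i] = (1 - alpha) * variances[i] + alpha * d * d
        hit = np.zeros(len(weights))
        hit[i] = 1.0
        weights = (1 - alpha) * weights + alpha * hit
    else:
        # No similar observation recorded so far: add a new Gaussian.
        means = np.vstack([means, s])
        variances = np.vstack([variances, np.full(s.shape, init_var)])
        weights = np.append((1 - alpha) * weights, alpha)
        i = len(weights) - 1
    return means, variances, weights / weights.sum(), i
```

Calling `update_mixture` once per frame and pixel keeps the mixture current; an observation far from every stored Gaussian simply adds a new component.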
An observation for a pixel is given by $s(\mathbf{x}) = (y, c_b, c_r, z, a)^T$
in this work and contains the color value in YCbCr format, a
depth value $z$ and an amplitude modulation value $a$. The 2D/3D
camera produces a full-size color image and low-resolution depth
and amplitude modulation images, which are resized to match the
color images by nearest-neighbor interpolation. The covariance matrices of
all Gaussians are restricted to be diagonal to simplify computations.
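A minimal sketch of how such observation vectors might be assembled is shown below, assuming the color image is already in YCbCr and all images are NumPy arrays. The function names `resize_nearest` and `build_observations` are hypothetical and introduced only for this example.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resize of a 2D array to shape (out_h, out_w)."""
    in_h, in_w = img.shape
    # Map every output row/column back to its source row/column.
    rows = np.minimum((np.arange(out_h) * in_h) // out_h, in_h - 1)
    cols = np.minimum((np.arange(out_w) * in_w) // out_w, in_w - 1)
    return img[rows[:, None], cols[None, :]]

def build_observations(ycbcr, depth, amplitude):
    """Stack color, depth and amplitude into an (H, W, 5) array.

    ycbcr: (H, W, 3) color image; depth, amplitude: low-resolution 2D maps.
    The channel order is (y, cb, cr, z, a).
    """
    h, w, _ = ycbcr.shape
    z = resize_nearest(depth, h, w)
    a = resize_nearest(amplitude, h, w)
    return np.dstack([ycbcr.astype(float), z, a])
```

Nearest-neighbor resizing copies measured values without blending them, so no new depth values are created at object boundaries.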
When working with ToF data, invalid depth measurements due to
low reflectance have to be handled cautiously. A depth measurement
is considered invalid if the corresponding amplitude is lower
than a given threshold. In (Harville et al., 2001) an elaborate logical
condition is used to classify a new observation. Experiments
show that this can be simplified by using the measure
$$ \delta(s, \mu_i)^2 = (\mu_i - s)^T \Sigma_i^{-1} (\mu_i - s) \tag{4} $$
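and checking the condition
where
$$ \mathcal{N}_s(\mu_i, \Sigma_i) = \frac{1}{\sqrt{(2\pi)^{\dim(s)} \det(\Sigma_i)}} \exp\left\{ -\frac{1}{2} (s - \mu_i)^T \Sigma_i^{-1} (s - \mu_i) \right\} \tag{2} $$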
is the multivariate Gaussian with mean $\mu_i$ and covariance matrix
$\Sigma_i$. Clearly, for the mixing coefficients $\omega_i$ we must have
$\sum_i \omega_i = 1$. How many Gaussians should be used to model the
observations, how to adapt the Gaussians efficiently over time and
which Gaussians should be considered background are questions
that arise immediately. Most GMM-based methods rely on
very simple assumptions for efficiency. They use a fixed number
of Gaussians per pixel, and the minimum number of Gaussians
whose weights sum up to a given threshold are treated
as background.
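The background selection rule just described can be sketched as follows. The function name `background_components` and the default threshold `t_bg` are assumptions for the example, and "sum up to a given threshold" is read here as "the cumulative weight reaches at least the threshold". Sorting by weight alone follows the wording above; other rankings are possible.

```python
import numpy as np

def background_components(weights, t_bg=0.7):
    """Indices of the fewest Gaussians whose weights reach t_bg.

    weights: mixing coefficients of one pixel (assumed to sum to 1).
    t_bg: background threshold; 0.7 is an illustrative value.
    """
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(weights)[::-1]          # heaviest component first
    cumulative = np.cumsum(weights[order])
    # First position at which the cumulative weight reaches t_bg.
    n_bg = int(np.searchsorted(cumulative, t_bg)) + 1
    n_bg = min(n_bg, len(weights))
    return order[:n_bg]
```

A pixel is then labeled background if its current observation matches one of the returned components, and foreground otherwise.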
The adaptation of the Gaussians over time is a bit more complicated.
Instead of using the EM-algorithm or similar methods to
$$ \delta(s, \mu_i)^2 < T_{near} \, (1 + 2\lambda_c + \lambda_z + \lambda_a) \tag{5} $$
where $\lambda_z \in \{0, 1\}$ depending on whether current and previous
depth measurements are both valid. The mechanism from
(Harville et al., 2001) works well to that end. Similarly, $\lambda_c \in \{0, 1\}$
indicates whether the chromaticity channels of the current
observation as well as the recorded information provide trustworthy
values. This can be estimated simply by checking if both
luminance values or their means respectively are above a certain
threshold. Finally, $\lambda_a \in \{0, 1\}$ determines if the amplitude modulation
should be used for the classification and it is specified a
priori.
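A hedged sketch of this test for one Gaussian with diagonal covariance is given below. It assumes that channels whose flag is 0 are left out of the distance sum, which the text does not state explicitly; the function name `matches` and the default `t_near` are likewise hypothetical.

```python
import numpy as np

def matches(s, mean, var, lam_c, lam_z, lam_a, t_near=9.0):
    """Check an observation against one Gaussian.

    s, mean, var: length-5 sequences ordered (y, cb, cr, z, a);
    var holds the diagonal of the covariance matrix.
    lam_c, lam_z, lam_a: 0/1 flags for chromaticity, depth, amplitude.
    Assumption: a channel with flag 0 does not contribute to the distance.
    """
    mask = np.array([1.0, lam_c, lam_c, lam_z, lam_a])
    d = np.asarray(s, dtype=float) - np.asarray(mean, dtype=float)
    dist_sq = float(np.sum(mask * d * d / np.asarray(var, dtype=float)))
    # The threshold grows with the number of channels taking part.
    threshold = t_near * (1 + 2 * lam_c + lam_z + lam_a)
    return dist_sq < threshold
```

With `lam_z = 0` an invalid depth reading neither adds to the distance nor raises the threshold, so the decision rests on the remaining channels.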
This matching function accounts for the fact that observations
in the color, depth and amplitude modulation dimensions are in
practice not independent. A foreground object has most likely not
[Normalization factor of Eq. (2): $1 / \sqrt{(2\pi)^{\dim(s)} \det(\Sigma_i)}$]