International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol XXXV, Part B5. Istanbul 2004 
  
the difficulties of the task are the changes that can appear in the
background image, such as changes in illumination, shadows, and
other changes in the scenery. The topic has received great attention
in the past as well as in the present. Toyama et al. (1999)
list ten different algorithms that solve most of the common problems
of background estimation. Among these are simple adjacent-frame
differencing, pixel-wise mean values with fixed thresholding,
pixel-wise mean and covariance values with thresholding, and
several advanced and combined methods.
At the beginning of our experiments we used a simple test sequence
that Prati et al. (2003) used for their shadow detection
algorithms and provided to the research community. We tested
two algorithms using a varying number of images from the test
sequence. The first algorithm simply computes the average of the
frames. The second is a state-of-the-art background estimator
from a commercial image-processing library. Both algorithms were
tested on a set of 4 and a set of 32 images from the sequence.
Selected frames and the results are shown in figure 3.
The results reveal a problem of classic background estimation:
the algorithms adapt slowly when no proper initialization is available.
When the sequence contains only four frames, clear ghosting
effects are visible for both the simple averaging and the commercial
implementation. When the sequence is extended the results are
better, but both algorithms still produce artifacts in the
background image. Usually more than 100 frames are needed for
proper initialization.
Since we wish to keep the number of images, and thereby the extra
effort of manual image acquisition, to a minimum, this behavior
is not acceptable. We therefore propose a different mechanism
for computing the background image from a small set of images.
While most background estimation processes consider only one
image at a time (an approach suited to processing a continuous
stream of images), our method processes the full set of images
at once, iterating over each pixel. We can think of the image
sequence as a stack of per-pixel registered images, where each
pixel corresponds exactly to the pixels of the other images at the
same location. Thus we can create a pixel stack for each pixel
location.
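As a minimal sketch of this data structure (assuming NumPy and a hypothetical frame size, since the paper gives no code), the pixel stacks can be formed by stacking the registered frames along a new axis:

```python
import numpy as np

# Illustrative sketch, not the paper's implementation: four registered
# frames of shape (H, W, 3) are stacked so that stack[:, y, x] is the
# pixel stack at image location (y, x).
H, W = 240, 320                            # hypothetical frame size
frames = [np.full((H, W, 3), i * 10, dtype=np.uint8) for i in range(4)]

stack = np.stack(frames, axis=0)           # shape (4, H, W, 3)
pixel_stack = stack[:, 100, 200]           # four RGB samples at (100, 200)
print(pixel_stack.shape)                   # (4, 3)
```

Each `pixel_stack` then holds one color sample per frame for a single image location, which is the unit the clustering below operates on.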
Using the basic assumption of background estimation, namely that the
background dominates over the changing foreground objects, we
have to identify the subset of pixels within each pixel stack
that represents the background. In other words, we have to identify the
set of pixels that forms the best consensus within each pixel
stack. Projecting the pixels into a certain color space, we can
think of the problem as a clustering task, in which we have to find
the largest cluster of pixels and disregard outliers. Figure 4 shows
the pixels taken from the same location in four images of the test
sequence of figure 3. The pixels are displayed in two dimensions
(red/green) of the three-dimensional RGB color space. The
diagram shows three pixels in the upper right corner, which form
a cluster, and an outlier in the lower left corner.
Any unsupervised clustering technique could be employed to
solve this task. Our approach is inspired by the RANSAC
method introduced by Fischler and Bolles (1981). However, since
the number of samples is small, we do not have to randomly select
candidates but can instead test all available samples. For
every sample we test how many of the other samples are compatible.
Compatibility is tested using a distance function within the
selected color space; either the Mahalanobis distance or the Euclidean
distance can be used. If the distance is
below a fixed threshold, the samples are determined to be compatible.
The tested sample receives a vote for each compatible sample.
Figure 4: A sequence of images is treated as a stack of images. 
The graphic on the left shows a stack of four images. A single 
pixel is observed along a ray passing through all images at exactly 
the same position. 
The diagram to the right shows the pixel values in red/green color 
space. Three pixels agree while one is clearly an outlier. 
Figure 5: The result of the proposed algorithm on the test se- 
quence. The left image shows the result obtained from computa- 
tion in RGB color space, while the right image shows the result 
from HSV color space. In both cases the persons walking before 
the background were completely suppressed from only four im- 
ages. The computation in HSV color space gives slightly better 
results. 
The sample with the largest number of votes is selected as the 
background pixel. 
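The voting step for a single pixel stack can be sketched as follows. This is an illustrative NumPy sketch, not the authors' code: the distance threshold value is an assumption, and the Euclidean distance stands in for either of the two metrics mentioned above.

```python
import numpy as np

def consensus_background_pixel(samples, threshold=20.0):
    """Pick the sample with the largest consensus within a pixel stack.

    samples: (N, 3) array of color values at one pixel location across
    N frames.  Each sample receives a vote for every other sample whose
    Euclidean distance lies below `threshold`; the sample with the most
    votes is returned as the background estimate.  The threshold value
    is an illustrative assumption, not taken from the paper.
    """
    samples = np.asarray(samples, dtype=float)
    # Pairwise Euclidean distances between all samples in the stack.
    d = np.linalg.norm(samples[:, None, :] - samples[None, :, :], axis=-1)
    # One vote per compatible pair, excluding the sample itself.
    votes = (d < threshold).sum(axis=1) - 1
    return samples[np.argmax(votes)], votes

# Three clustered samples and one outlier, as in figure 4:
stack = [[200, 180, 60], [202, 182, 62], [198, 179, 59], [40, 30, 20]]
bg, votes = consensus_background_pixel(stack)
```

Because all samples are tested exhaustively rather than sampled randomly, the result is deterministic, unlike classic RANSAC.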
For the implementation we had to choose an appropriate distance
function and select a proper color space. While some researchers
favor RGB color space, others have reported good results with
CIE color space (Coorg and Teller, 1999). A comparison of the
results of the proposed algorithm from computation in RGB and
HSV color space is given in figure 5. Simple Euclidean distance
performed sufficiently well in our tests. For this test four input images
were used. In both cases the persons walking in front of the
background were completely suppressed.
Figure 6 shows a sequence of four images from a fixed viewpoint. 
The object depicted is the facade of a house imaged from across 
a busy street. Traffic and pedestrians partially occlude the lower 
portion of the facade. In figure 7 the results of the background 
estimation process are shown. The algorithm is able to remove 
all occlusions of moving objects from the images. Figure 7 (b) is 
an image computed by combining those samples that
received the fewest votes during clustering. The image
thereby combines all outlier pixels and thus contains a combination
of all occlusions. It serves as a comparison to image 7 (a)
for assessing the quality of the removal. In image 7 (c), all pixels
for which no unique cluster could be determined are marked in white,
i.e. all samples received only one vote or two clusters received the
same number of votes. In these cases the pixel of the first frame
was chosen to represent the background.
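The whole per-pixel procedure, including the outlier image of figure 7 (b) and the first-frame fallback, can be sketched in vectorized form. The threshold value and the array layout are illustrative assumptions, not the authors' implementation, and the tie case between two distinct clusters is only noted in a comment rather than detected.

```python
import numpy as np

def estimate_background(stack, threshold=20.0):
    """Per-pixel consensus background over a stack of registered frames.

    stack: (N, H, W, 3) array of N registered color frames.  Returns the
    background image, an "outlier" image built from the least-voted
    samples (as in figure 7 (b)), and a mask of pixels without consensus.
    Illustrative sketch; threshold and vectorization are assumptions.
    """
    stack = np.asarray(stack, dtype=float)
    # Pairwise distances between frames, for every pixel at once.
    d = np.linalg.norm(stack[:, None] - stack[None, :], axis=-1)  # (N, N, H, W)
    votes = (d < threshold).sum(axis=0) - 1                       # exclude self
    best = votes.argmax(axis=0)                                   # (H, W)
    worst = votes.argmin(axis=0)
    h, w = np.indices(best.shape)
    background = stack[best, h, w]   # most-voted sample per pixel
    outliers = stack[worst, h, w]    # least-voted sample per pixel
    # No consensus: every sample in the stack received zero votes.
    # (A full implementation would also detect ties between two distinct
    # clusters, which the paper resolves the same way.)
    ambiguous = votes.max(axis=0) == 0
    background[ambiguous] = stack[0][ambiguous]   # fall back to first frame
    return background, outliers, ambiguous
```

Note that the (N, N, H, W) distance array grows quadratically in the number of frames, which is acceptable here precisely because the method targets small image sets.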
4 MULTI-VIEW FUSION 
In section 3 we have shown how moving objects occluding a 
facade can be removed using several images taken from a fixed 
viewpoint. A static object occluding a facade will be imaged at 