XXII ISPRS Congress 2012: Technical Commission III

Two problems remain. First, although the system model deals
with a three-dimensional ellipsoid, the pedestrian behavior
model deals with behavior on a two-dimensional plane.
Therefore, we assume that the ellipsoid stands upright on the
floor and set a coordinate system parallel to the floor (ground
coordinates). At the same time, we calculate the angle between
the camera coordinates and the ground coordinates. Second, the
behavior model assumes that the destination of every pedestrian
is known in advance. In on-line tracking, however, the
destinations cannot be known in advance. Therefore, we omit
the destination term, term (b) in equation (4), from the model.
After this step, all that remains is to set the initial
position, shape and velocity of every person to be tracked.
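The conversion from camera coordinates to ground coordinates can be illustrated as a single rotation. The following is a minimal sketch under stated assumptions, not the implementation used in this work: it assumes that the two coordinate systems share the x-axis and differ only by a tilt about it. The function name camera_to_ground and the sign convention are hypothetical.

```python
import math


def camera_to_ground(p_cam, angle_rad):
    """Rotate a 3D point about the x-axis by angle_rad.

    Assumption (not from the paper): camera and ground coordinates
    share the x-axis and differ only by a tilt about that axis.
    """
    x, y, z = p_cam
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (x, c * y - s * z, s * y + c * z)


# Example: tilt a point one unit along the camera z-axis by 0.62 rad.
ground_point = camera_to_ground((0.0, 0.0, 1.0), 0.62)
```

Because the transform is a pure rotation, distances between points are preserved, so the ellipsoid size does not change between the two coordinate systems.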
4.4 Observation Model 
We also define an observation model for the filtering step.
The observation model is the probability distribution of the
observation z_t given the state x_t, and it is designed
according to the tracking method. We model both color and range
stochastically. The model is the product of a color observation
model and a range observation model, as follows:
p(z_t \mid x_t) = p_{color}(z_t \mid x_t) \, p_{range}(z_t \mid x_t)    (6)
4.4.1 Color Observation Model: p_color(z_t|x_t) is a probability
distribution based on the similarity between the color
histograms of the pixels in the ellipsoid at times t-1 and t. We
use the Bhattacharyya coefficient B, a measure of the
correlation between color histograms used in existing works
(e.g. Wu and Nevatia (2007) and Ali and Dailey (2009)):
B = \sum_{m} \sqrt{d_{t-1,m} \, d_{t,m}}    (7)
where m = pixel value
d_t = normalized histogram at time t
d_{t,m} = relative frequency of pixel value m in histogram d_t
We calculate B for each of the color channels r, g and b, and
define p_color(z_t|x_t) as the product of the three values.
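The color term can be sketched in a few lines. This is an illustrative sketch rather than the implementation used here: it assumes 8-bit channel values, and the function names and the default of 256 bins are hypothetical choices.

```python
import math


def normalized_histogram(values, bins=256):
    """Relative frequency of each bin for 8-bit channel values (0..255)."""
    hist = [0.0] * bins
    for v in values:
        hist[v * bins // 256] += 1.0
    total = len(values)
    return [h / total for h in hist] if total else hist


def bhattacharyya(d_prev, d_curr):
    """Bhattacharyya coefficient of two normalized histograms."""
    return sum(math.sqrt(a * b) for a, b in zip(d_prev, d_curr))


def p_color(pixels_prev, pixels_curr, bins=256):
    """Product of the per-channel coefficients for lists of (r, g, b) pixels."""
    p = 1.0
    for ch in range(3):
        d_prev = normalized_histogram([px[ch] for px in pixels_prev], bins)
        d_curr = normalized_histogram([px[ch] for px in pixels_curr], bins)
        p *= bhattacharyya(d_prev, d_curr)
    return p
```

The coefficient is 1 for identical histograms and 0 for histograms with no common non-empty bin, so the product over the three channels also stays between 0 and 1.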
4.4.2 Range Observation Model: p_range(z_t|x_t) is a
probability distribution based on the similarity between the
shape of the predicted ellipsoid and the object actually
observed. For each pixel P inside the ellipse obtained by
projecting the predicted ellipsoid onto the image, let d(P) be
the distance from the observed coordinates P(X, Y, Z) to the
center O of the ellipsoid. Let P' be the point where the half
line from O through P intersects the ellipsoid, and let d(P') be
the distance from O to P'. We define p_range(z_t|x_t) as follows:
p_{range}(z_t \mid x_t) = 1 - \frac{1}{I} \sum_{P} | d(P) - d(P') |    (8)
if | d(P) - d(P') | > 1, then | d(P) - d(P') | = 1
where I = number of pixels P in total
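The range term can be sketched as follows. This is a sketch under stated assumptions, not the implementation used here: it reads equation (8) as one minus the mean clipped absolute difference between d(P) and d(P'), it assumes an axis-aligned ellipsoid given by its three semi-axes, and the function names are hypothetical.

```python
import math


def surface_distance(center, point, semi_axes):
    """d(P'): distance from the center O to the ellipsoid surface along O -> P."""
    ux, uy, uz = (point[i] - center[i] for i in range(3))
    a, b, c = semi_axes
    norm = math.sqrt(ux * ux + uy * uy + uz * uz)
    if norm == 0.0:
        # Direction is undefined at the center; use the shortest semi-axis
        # (an arbitrary choice made for this sketch).
        return min(semi_axes)
    scale = 1.0 / math.sqrt((ux / a) ** 2 + (uy / b) ** 2 + (uz / c) ** 2)
    return scale * norm


def p_range(points, center, semi_axes):
    """One minus the mean of |d(P) - d(P')|, with each difference clipped to 1."""
    if not points:
        return 0.0
    total = 0.0
    for p in points:
        d_p = math.dist(center, p)
        d_surface = surface_distance(center, p, semi_axes)
        total += min(abs(d_p - d_surface), 1.0)
    return 1.0 - total / len(points)
```

With this reading, observed points lying exactly on the predicted ellipsoid give a value of 1, and the clipping keeps the value from dropping below 0 when some points are far from the surface.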
5. APPLICATION 
5.1 Observation Conditions and Parameter Settings 
We apply the proposed method to data acquired at the ticket
gate of Tama-Plaza station, a railway station in a popular
residential area about 20 km west of central Tokyo. We recorded
the data during the morning commuter rush hour and confirmed
that pedestrian behavior was complex. The stereo video camera
used in this observation consists of two cameras (SONY-DFW,
1.2 million pixels) set about one meter apart and calibrated in
advance. The frame rate is set to 7.5 [frames/sec] because of
the constraints of the stereo synchronization process. Under
these conditions, the video was taken from a height of about
10 m, looking down obliquely (figure 5).
Figure 5. Example of obtained image (labels in the image:
Platform #1, Platform #2, North Exit, South Exit)
In the proposed method, some initial values and parameters must
be set in advance. We set the number of particles to N=500.
For the state vector, we obtain the initial position of each
person manually and set it as the position (x, y, z). The size
of the ellipsoid is set to w=0.4[m], h=1.6[m] and d=0.3[m],
considering the size of people. We also set the initial velocity
of each person manually. For the variance of the system model,
we use (10, 5, 10, 0.05, 0.05, 0.05) [cm], chosen after some
trials. Finally, we calculate the angle between the camera
coordinates and the ground coordinates as 0.62[rad].
5.2 Results and Discussions 
We apply this method to a 30-second sequence (226 frames).
During this period, 51 people, appearing in 3,384 frames in
total, are to be tracked. We verify the performance of the
proposed method by comparing the position of each person
obtained from the tracking result with the position read
manually from the image. As a result, tracking succeeded in
2,626 frames (78%) in total, and 40 of 44 people were correctly
tracked to the ticket gate (table 6).
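The figures quoted above can be checked with a short calculation. This is only an arithmetic sketch using the numbers in the text; the variable and function names are hypothetical.

```python
FRAME_RATE = 7.5        # [frames/sec], from the observation conditions
SEQUENCE_FRAMES = 226   # frames in the evaluated sequence
TOTAL_FRAMES = 3384     # frames to be tracked, summed over all people
SUCCESS_FRAMES = 2626   # frames tracked successfully by the proposed method


def success_rate(success, total):
    """Fraction of frames tracked successfully."""
    return success / total


duration_sec = SEQUENCE_FRAMES / FRAME_RATE          # about 30 seconds
frame_rate_pct = 100 * success_rate(SUCCESS_FRAMES, TOTAL_FRAMES)
gate_rate_pct = 100 * success_rate(40, 44)           # people tracked to the gate

print(f"duration: {duration_sec:.1f} s")
print(f"frame success rate: {frame_rate_pct:.0f}%")
print(f"tracked to ticket gate: {gate_rate_pct:.0f}%")
```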
Table 6. Tracking result with comparison by system model

System model      # of successful   Success   # of persons tracked
                  frames            rate      to the ticket gate
Proposed          2626              78%       40/44
Noise only        1808              53%       35/44
With destination  2238              66%       28/44
Figure 7 shows part of the results. The points in the image
show the centers of the ellipsoids obtained by tracking. The
number attached to each point is the unique number given to the
ellipsoid, which corresponds to a tracked person.
  
  