The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Vol. XXXVII. Part B3b. Beijing 2008 
Two framework prototypes were developed, one for lightweight 
compact cameras and one for heavier single-lens reflex cameras. 
Establishing temporal correspondences between the segmented 
objects is an essential task in object tracking, especially for 
scenes with heavy occlusions. One efficient approach is the 
application of a Kalman filter (Zhao & Nevatia, 2004; 
Rosales & Sclaroff, 1998). Weaknesses of this approach appear 
when the speed or direction of motion changes abruptly. 
Improvements can be achieved by integrating microscopic 
motion models for the objects that are adapted to typical events 
or positions in object space (Antonini et al., 2006). 
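To illustrate why a Kalman filter struggles with abrupt changes, the 
following is a minimal sketch of a one-dimensional constant-velocity 
filter. It is not the implementation of the cited works; the struct 
name Kalman1D, the diagonal process noise q and the initial covariance 
are assumptions chosen only for illustration. 

```cpp
#include <cassert>

// Illustrative 1D constant-velocity Kalman filter (hypothetical names).
// State: position p and velocity v; only the position is measured.
struct Kalman1D {
    double p = 0.0;                              // estimated position
    double v = 0.0;                              // estimated velocity
    double P[2][2] = {{1.0, 0.0}, {0.0, 1.0}};   // state covariance (assumed start)
    double q;                                    // process noise added to the diagonal
    double r;                                    // measurement noise variance

    Kalman1D(double processNoise, double measurementNoise)
        : q(processNoise), r(measurementNoise) {
        assert(q >= 0.0 && r > 0.0);
    }

    // Prediction with the motion model p' = p + dt * v, v' = v.
    void predict(double dt) {
        p += dt * v;
        // P = F P F^T + Q with F = [[1, dt], [0, 1]] and Q = diag(q, q)
        const double p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + q;
        const double p01 = P[0][1] + dt * P[1][1];
        const double p10 = P[1][0] + dt * P[1][1];
        const double p11 = P[1][1] + q;
        P[0][0] = p00; P[0][1] = p01; P[1][0] = p10; P[1][1] = p11;
    }

    // Correction with a position measurement z (measurement matrix H = [1, 0]).
    void update(double z) {
        const double s = P[0][0] + r;        // innovation variance
        const double k0 = P[0][0] / s;       // gain for the position
        const double k1 = P[1][0] / s;       // gain for the velocity
        const double innovation = z - p;
        p += k0 * innovation;
        v += k1 * innovation;
        // P = (I - K H) P
        const double p00 = (1.0 - k0) * P[0][0];
        const double p01 = (1.0 - k0) * P[0][1];
        const double p10 = P[1][0] - k1 * P[0][0];
        const double p11 = P[1][1] - k1 * P[0][1];
        P[0][0] = p00; P[0][1] = p01; P[1][0] = p10; P[1][1] = p11;
    }
};
```

Once the filter has settled on a steady walking speed, the velocity 
gain is small, so a sudden reversal of direction changes the velocity 
estimate only gradually over several frames. This lag is the weakness 
described above. 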
Automatic extraction of additional attributes of humans, such as 
gender and age, is a difficult pattern recognition problem: 
clothed persons vary widely in shape and style of dress, the 
facial expression of individuals may change significantly 
depending on mood, and lighting conditions may vary. 
So far this task has been addressed by only a few approaches. 
In general these methods rely on high-resolution images of 
faces or analyse the gait of persons. Classifying the 
pedestrians by their gait is not an option in this case, 
because it requires continuous observation of the whole body, 
which is not feasible in complex scenes. 
Good results for gender classification of faces are reported by 
Baluja & Rowley (2007), who use AdaBoost and achieve a 
correct classification rate of over 90%. Lanitis et al. (2004) 
determine the age of a person with a variance of about 5 years. 
Both approaches are subject to the constraints of a narrowly 
defined viewing angle, uniform lighting conditions, and the 
absence of occlusions. 
These constraints are not met in this project, because we have to 
deal with wide and complex scenes. Individual pedestrians may 
wear different kinds of clothing, can appear with arbitrary 
orientation with respect to the camera, and differ in size and 
shape. Automatic extraction of the attributes age and gender is 
therefore hardly possible. Instead, a semi-automatic approach was 
developed in which such decisions are made by an operator. 
2. SYSTEM SET-UP 
As described above, the system consists of two cameras. To 
ensure that both cameras cover the same area and have a 
similar viewing angle, they were placed close to each other. The 
first one, the observation camera, is a static video camera with 
fixed interior and exterior orientation. For our initial 
experiments we used a standard webcam with a resolution of 
640 x 480 pixels. The second camera is a single-frame 
PTZ-camera of higher geometric resolution. Because 
common PTZ-cameras are video cameras with PAL resolution, 
we developed our own prototypes. These prototypes allow the use 
of standard high-resolution single-frame cameras. The 
frameworks of these prototypes were built with Lego NXT. 
Using Lego NXT for this task has two major 
advantages. First, the framework can be adjusted on demand, so 
that the projection centre of the cameras could be positioned at 
the origin of the rotation axes. Second, the motors for the 
rotations can be controlled from C++ code. 
Figure 1. Prototype for lightweight compact cameras. 
3. WORKFLOW 
The complete scene of interest is observed by the video camera. 
Pedestrians appearing in the scene are detected and tracked in 
the video stream. The positions of the individuals are 
transformed from image space to object space using a projective 
transformation. Based on the positions of the pedestrians in 
object space, the orientation parameters of the PTZ-camera are 
computed and high-resolution images of the tracked persons are 
acquired. These high-resolution images are classified 
interactively. In the following, the major steps are described in 
detail. 
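The two geometric steps of this workflow can be sketched as follows. 
This is a minimal illustration under stated assumptions, not the 
authors' code: the names Homography, imageToObject and aimAt are 
hypothetical, the pedestrians are assumed to move on a planar ground 
so that a 3 x 3 projective transformation applies, and the pan and 
tilt conventions (pan measured from the X axis, negative tilt looking 
downwards) are chosen freely. 

```cpp
#include <cassert>
#include <cmath>

struct Point2 { double x, y; };
struct PanTilt { double pan, tilt; };      // angles in radians
struct Homography { double m[3][3]; };     // 3x3 projective transformation

// Map an image point to the ground plane in object space.
// Precondition: the point does not map to infinity (w != 0).
inline Point2 imageToObject(const Homography& H, Point2 px) {
    const double X = H.m[0][0] * px.x + H.m[0][1] * px.y + H.m[0][2];
    const double Y = H.m[1][0] * px.x + H.m[1][1] * px.y + H.m[1][2];
    const double w = H.m[2][0] * px.x + H.m[2][1] * px.y + H.m[2][2];
    assert(w != 0.0);
    return {X / w, Y / w};
}

// Pan and tilt angles that point a camera at a target above a ground position.
// Assumptions: the camera rotates about its projection centre, which sits at
// horizontal position camXY and height camHeight above the ground plane;
// targetHeight is the height of the aiming point (e.g. the head of a person).
inline PanTilt aimAt(Point2 ground, Point2 camXY,
                     double camHeight, double targetHeight) {
    const double dx = ground.x - camXY.x;
    const double dy = ground.y - camXY.y;
    const double horizontalDistance = std::hypot(dx, dy);
    const double pan = std::atan2(dy, dx);
    const double tilt = std::atan2(targetHeight - camHeight, horizontalDistance);
    return {pan, tilt};
}
```

In use, the tracked image position of a pedestrian would be passed to 
imageToObject, and the result to aimAt together with the known 
position of the PTZ-camera. The coefficients of the homography would 
have to be determined beforehand, for example from at least four 
control points with known image and object coordinates. 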
3.1 Video stream analysis 
This project requires real-time analysis of the video stream for 
pedestrian detection and tracking. We used the freely available 
OpenCV library, which is implemented in C and C++. 
The library offers a broad range of computer vision functions 
and is easy to link to our PTZ-camera prototypes. For the 
graphical user interface, and to enable continuous observation 
of the scene with the video camera while images are acquired with 
the PTZ-camera, we used the Qt library. 
Background subtraction 
To reduce the false positive rate, which is a common 
problem of people detection approaches in 
complex scenes, background subtraction was chosen as the first 
step. Background subtraction separates the static background 
from all moving objects, which are segmented as 
foreground. In the subsequent steps only the foreground pixels, 
which contain the regions of interest for pedestrians, are examined.
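
The principle can be sketched with a simple running-average background 
model on grey-value frames. The text does not state which background 
subtraction method was used, so this is an assumed stand-in written in 
plain C++ rather than with OpenCV; the class name 
RunningAverageBackground, the learning rate and the threshold are 
hypothetical. 

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative running-average background model for grey-value frames.
class RunningAverageBackground {
public:
    // learningRate: weight of a new frame in the background update (0..1).
    // threshold: minimum grey-value difference for a foreground pixel.
    RunningAverageBackground(float learningRate, float threshold)
        : alpha_(learningRate), threshold_(threshold) {}

    // Returns a mask with 255 for foreground pixels and 0 for background.
    // The first frame only initialises the model and yields an empty mask.
    std::vector<std::uint8_t> apply(const std::vector<std::uint8_t>& frame) {
        std::vector<std::uint8_t> mask(frame.size(), 0);
        if (background_.empty()) {
            background_.assign(frame.begin(), frame.end());
            return mask;
        }
        assert(frame.size() == background_.size());
        for (std::size_t i = 0; i < frame.size(); ++i) {
            const float value = static_cast<float>(frame[i]);
            if (std::fabs(value - background_[i]) > threshold_) {
                mask[i] = 255;   // moving object: keep the background unchanged
            } else {
                // Static pixel: adapt slowly to gradual lighting changes.
                background_[i] = (1.0f - alpha_) * background_[i] + alpha_ * value;
            }
        }
        return mask;
    }

private:
    std::vector<float> background_;   // per-pixel background estimate
    float alpha_;
    float threshold_;
};
```

Each video frame would be passed to apply as a flat array of grey 
values, and only the pixels marked 255 in the returned mask would be 
handed on to the pedestrian detection. 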
	        