The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Vol. XXXVII. Part Bib. Beijing 2008
Two framework prototypes were developed, one for lightweight
compact cameras and one for heavier single-lens reflex cameras.
Establishing temporal correspondences of the segmented
objects is an essential task for object tracking, especially for
scenes with heavy occlusions. One efficient approach is the
application of a Kalman filter (Zhao & Nevatia, 2004;
Rosales & Sclaroff, 1998). This approach is weak when the speed
or the direction of motion changes abruptly.
Improvements can be achieved by integrating microscopic
motion models for the objects that are adapted to typical events
or positions in object space (Antonini et al., 2006).
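
The behaviour described above can be illustrated with a minimal
sketch of a constant-velocity Kalman filter for a single image
axis. It is not the filter of the cited works: the class name,
the helper function, and the noise values q and r are
assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ConstantVelocityKalman1D:
    """Kalman filter for one axis with state (position, velocity)."""
    p: float            # estimated position
    v: float = 0.0      # estimated velocity
    # Entries of the symmetric 2x2 covariance [[pp, pv], [pv, vv]]
    pp: float = 100.0
    pv: float = 0.0
    vv: float = 100.0
    q: float = 1.0      # process noise intensity (assumed tuning value)
    r: float = 4.0      # measurement noise variance (assumed tuning value)

    def predict(self, dt: float = 1.0) -> float:
        # State transition: position advances by v*dt, velocity is kept.
        self.p += self.v * dt
        # Covariance propagation P <- F P F^T + Q with F = [[1, dt], [0, 1]].
        pp = self.pp + 2 * dt * self.pv + dt * dt * self.vv + self.q * dt**4 / 4
        pv = self.pv + dt * self.vv + self.q * dt**3 / 2
        vv = self.vv + self.q * dt**2
        self.pp, self.pv, self.vv = pp, pv, vv
        return self.p

    def update(self, z: float) -> float:
        # Only the position is measured, so H = [1, 0].
        s = self.pp + self.r          # innovation covariance
        k_p = self.pp / s             # Kalman gain for position
        k_v = self.pv / s             # Kalman gain for velocity
        y = z - self.p                # innovation
        self.p += k_p * y
        self.v += k_v * y
        # Covariance update P <- (I - K H) P.
        pp = (1 - k_p) * self.pp
        pv = (1 - k_p) * self.pv
        vv = self.vv - k_v * self.pv
        self.pp, self.pv, self.vv = pp, pv, vv
        return self.p


def one_step_predictions(measurements, dt=1.0):
    """Predict each frame's position before its measurement is used."""
    kf = ConstantVelocityKalman1D(p=measurements[0])
    predictions = []
    for z in measurements[1:]:
        predictions.append(kf.predict(dt))
        kf.update(z)
    return predictions
```

For an object moving at constant speed the one-step predictions
settle close to the measurements. If the object reverses its
direction between two frames, the next prediction still
extrapolates the old velocity, which is the weakness noted above.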
Automatic extraction of additional attributes of humans, such as
gender and age, is a difficult pattern recognition problem:
clothed persons vary widely in shape and style of dress, the
facial expression of an individual may change significantly
depending on mood, and lighting conditions may vary.
So far, only a few approaches have addressed this task.
In general, these methods rely on high-resolution images of
faces or analyse the gait of persons. Classifying pedestrians
by gait analysis is not an option
in this case, because it requires continuous observation of the
whole body, which is not feasible in complex scenes.
Good results for gender classification of faces are reported by
Baluja & Rowley (2007), who use AdaBoost and achieve a
correct classification rate of over 90%. Lanitis et al. (2004)
determine the age of a person with a variance of about 5 years.
Both approaches are subject to the constraints of a narrowly
defined viewing angle,
uniform lighting conditions, and the absence of occlusions.
These constraints are not met in this project, because we have to
deal with wide and complex scenes. Individual pedestrians may
wear different kinds of clothing, can appear with arbitrary
orientation with respect to the camera, and differ in size and
shape. Automatic extraction of the attributes age and gender is
therefore hardly feasible. Instead, a semi-automatic approach is
developed in which such decisions are made by an operator.
2. SYSTEM SET-UP
As described above, the system consists of two cameras. To
ensure that both cameras cover the same area and have a
similar viewing angle, they were placed close to each other. The
first one, the observation camera, is a static video camera with
fixed interior and exterior orientation. For our initial
experiments we used a standard webcam with a resolution of
640 × 480 pixels. The second camera is a single-frame
PTZ-camera of higher geometric resolution. Because
common PTZ-cameras are video cameras with PAL resolution,
we developed our own prototypes. These prototypes allow the use
of standard high-resolution single-frame cameras. The
frameworks of these prototypes were built with Lego NXT.
Using Lego NXT for this task has two major
advantages. First, the framework can be adjusted on demand, so
the projection centre of the cameras could be positioned at the
origin of the rotation axes. Second, the motors for the
rotations can be controlled from C++ code.
Figure 1. Prototype for lightweight compact cameras.
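
Placing the projection centre at the origin of the rotation axes
means that pointing the camera involves pure rotations. The
following minimal sketch shows how pan and tilt angles could then
be derived from a target position. The function name and the
axis conventions (right-handed object coordinate system, Z
pointing up, pan measured from +X towards +Y, tilt positive
upwards) are assumptions and are not taken from the prototypes.

```python
import math


def pan_tilt_to_target(target, camera_centre):
    """Return (pan, tilt) in degrees that aim the optical axis at target.

    Assumes the projection centre lies at the intersection of the pan
    and tilt axes, so no offset between the axes has to be modelled.
    Both arguments are (X, Y, Z) tuples in object space.
    """
    dx = target[0] - camera_centre[0]
    dy = target[1] - camera_centre[1]
    dz = target[2] - camera_centre[2]
    # Pan: horizontal direction of the line of sight.
    pan = math.degrees(math.atan2(dy, dx))
    # Tilt: elevation of the line of sight above the horizontal plane.
    tilt = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return pan, tilt
```

For a camera mounted 3 m above the ground and a pedestrian
standing on the ground, the tilt angle is negative because the
camera has to look downwards.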
3. WORKFLOW
The complete scene of interest is observed by the video camera.
Pedestrians appearing in the scene are detected and tracked in
the video stream. The positions of the individuals are
transformed from image to object space using a projective
transformation. Based on the positions of the pedestrians in
object space, the orientation parameters of the PTZ-camera are
computed and high-resolution images of the tracked persons are
acquired. These high-resolution images are classified
interactively. The major steps are described in detail in the
following.
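
As an illustration of the image-to-object-space step, the
following minimal sketch applies a 3 × 3 projective
transformation (homography) to a pixel position. It assumes that
pedestrians stand on a planar ground surface and that the matrix
H has already been determined, for example from control points.
The function name and any numerical values of H are hypothetical.

```python
def image_to_object(H, u, v):
    """Map pixel (u, v) to plane coordinates (X, Y) with homography H.

    H is a 3x3 matrix given as nested lists. The point is treated in
    homogeneous coordinates and normalised by the third component.
    """
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if abs(w) < 1e-12:
        # The pixel maps to a point at infinity (e.g. on the horizon).
        raise ValueError("pixel has no finite position in object space")
    return x / w, y / w
```

The resulting object-space position would typically be taken at
the foot point of a tracked person, since only that point lies
on the assumed ground plane.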
3.1 Video stream analysis
This project requires real-time analysis of the video stream for
pedestrian detection and tracking. We used the freely available
OpenCV library, which is implemented in C and C++.
The library offers a broad range of computer vision functions
and can easily be linked to our PTZ-camera prototypes. For the
graphical user interface, and to enable continuous observation
of the scene in the video camera while images are acquired with
the PTZ-camera, we used the Qt library.
Background subtraction
To reduce the false positive rate, which is a common
problem of people detection approaches in
complex scenes, background subtraction was chosen as the first
step. Background subtraction separates the static background
from all moving objects, which are segmented as
foreground. In the subsequent steps only foreground pixels, which
contain the regions of interest for pedestrians, are examined.
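
The principle can be sketched with a simple running-average
background model on grey-value frames stored as nested lists.
The text does not state which background model was used, so the
running average, the function names, and the values of alpha and
threshold are assumptions for illustration and do not describe
the OpenCV-based implementation.

```python
def foreground_mask(background, frame, threshold=25.0):
    """Mark pixels whose grey value differs strongly from the background."""
    return [
        [abs(f - b) > threshold for b, f in zip(bg_row, fr_row)]
        for bg_row, fr_row in zip(background, frame)
    ]


def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the background model (running average).

    A small alpha lets the model follow slow changes such as lighting
    while moving objects are absorbed only gradually.
    """
    return [
        [(1.0 - alpha) * b + alpha * f for b, f in zip(bg_row, fr_row)]
        for bg_row, fr_row in zip(background, frame)
    ]
```

Only pixels flagged in the mask would be passed on to pedestrian
detection, which restricts the search to moving regions.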