REAL-TIME ORIENTATION OF A PTZ-CAMERA BASED ON PEDESTRIAN
DETECTION IN VIDEO DATA OF WIDE AND COMPLEX SCENES
T. Hoedl *, D. Brandt, U. Soergel, M. Wiggenhagen
IPI, Institute of Photogrammetry and Geoinformation, Leibniz Universitaet Hannover, Germany
- (hoedl, soergel, wiggenhagen)@ipi.uni-hannover.de
Intercommission Working Group III/V
KEYWORDS: Computer Vision, Detection, Close Range Photogrammetry, Absolute Orientation, Urban Planning, Tracking,
Multisensor, Real-time
ABSTRACT:
Object detection and tracking is the basis for many applications in surveillance and activity recognition. Unfortunately, the cameras typically used for the observation of wide scenes do not deliver sufficiently detailed information about the observed objects. We present a two-camera system for pedestrian detection in wide and complex scenes which additionally provides detailed information about the detected individuals. The first sensor is a static video camera with fixed interior and exterior orientation, which observes the complete scene. Pedestrian detection and tracking is carried out in the video stream of this camera. The second component is a single-frame PTZ (pan / tilt / zoom) camera of higher geometric resolution, which enables detailed views of objects of interest within the complete scene. To this end, the orientation of the PTZ-camera has to be adjusted to the position of a detected pedestrian in real time in order to capture a high-resolution image of the person. This image is stored along with time and position stamps. In post-processing the pedestrian can be classified interactively by a human operator. Because the operator is only confronted with high-resolution images of individual persons, this classification is very reliable, economical and user-friendly.
1. INTRODUCTION
1.1 Motivation
The work presented here is embedded in the framework of an
interdisciplinary research project aiming at the assessment of
the quality of shop-locations in inner cities. In this context the
number, the behaviour (e.g., walking speed and staying periods),
and the kind (e.g., in terms of gender and age) of pedestrians
passing by are crucial issues. In general, there are different options for obtaining the desired information. On the one hand, sensors carried by the persons themselves can deliver their positions based on existing infrastructure (mobile phones, GPS, RFID, Bluetooth). On the other hand, such information can be derived entirely from external observations (cameras). To be independent of the active cooperation of the individuals and to be able to collect information about all pedestrians, cameras were chosen as the appropriate sensors for this project. The task requires both the surveillance of a large and complex scene and, at the same time, the acquisition of high-resolution data of individuals, which can hardly be fulfilled by a single-camera system. Hence, a two-camera set-up is used in this approach.
The first one, the observation camera, is a static video camera
with fixed interior and exterior orientation. Pedestrian detection
and tracking must occur in real time in the video stream of this
camera. The positions of the detected individuals in object
space are passed to the second camera. This camera is a PTZ
(pan / tilt / zoom) camera of higher geometric resolution, which makes it possible to focus on objects of interest within the complete scene. Hence, a detailed analysis of the individuals is possible.
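To illustrate the hand-over between the two cameras, the following minimal sketch (not the implementation described in this paper) shows how a pedestrian position in object space could be converted into pan and tilt angles for the PTZ-camera, assuming the projection centre and the mechanical zero direction of the PTZ-camera are known in the same object coordinate system; all names and parameter values are hypothetical.

import math

def pan_tilt_from_position(target_xyz, camera_xyz, pan_zero_deg=0.0, tilt_zero_deg=0.0):
    # Hypothetical sketch: convert an object-space position into pan/tilt angles.
    # target_xyz : (X, Y, Z) of the detected pedestrian in object space [m]
    # camera_xyz : (X, Y, Z) of the PTZ-camera projection centre [m]
    # pan_zero_deg, tilt_zero_deg : direction of the camera's mechanical zero
    dx = target_xyz[0] - camera_xyz[0]
    dy = target_xyz[1] - camera_xyz[1]
    dz = target_xyz[2] - camera_xyz[2]
    horizontal = math.hypot(dx, dy)                      # distance in the XY-plane
    pan = math.degrees(math.atan2(dy, dx)) - pan_zero_deg
    tilt = math.degrees(math.atan2(dz, horizontal)) - tilt_zero_deg
    return pan, tilt

# Example: pedestrian 20 m east and 5 m north of a camera mounted 8 m above ground
print(pan_tilt_from_position((20.0, 5.0, 1.0), (0.0, 0.0, 8.0)))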
1.2 Related Work
Due to the broad range of applications (surveillance, activity recognition or human-computer interaction), human motion analysis in video sequences has become one of the most active fields in computer vision in recent years. Recent surveys of the numerous publications have been provided by Moeslund et al. (2006) and Yilmaz et al. (2006).
One focus of research is automatic detection and tracking of
humans in uncontrolled outdoor environments. The tracking of articulated objects such as human bodies is much more complex than the tracking of rigid objects such as cars, because the configuration of the limbs changes over time. Nevertheless, these approaches already show promising results, especially for simple scenes populated by only a few individuals.
The initial step in many approaches is background subtraction.
For many years background subtraction was only used for
controlled indoor environments, but with the adaptive Mixture of Gaussians (MoG) method by Stauffer & Grimson (1999) it also became a standard for outdoor environments. Recent advances in background subtraction, which are mostly based on the MoG algorithm, deal with minimizing false positives and false negatives, for example due to shadows, or with updating the background model.
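As an illustration of this principle (not the implementation used in this work), a foreground mask can be obtained with an adaptive MoG-style background subtractor such as the MOG2 variant available in OpenCV; the input file name and the parameter values below are arbitrary examples.

import cv2

# Adaptive MoG-style background subtractor (Zivkovic's MOG2 variant in OpenCV);
# detectShadows=True marks shadow pixels so they can be excluded from the mask.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                detectShadows=True)

capture = cv2.VideoCapture("scene.avi")       # hypothetical input video
while True:
    ok, frame = capture.read()
    if not ok:
        break
    mask = subtractor.apply(frame)            # 255 = foreground, 127 = shadow
    mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)[1]   # drop shadows
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN,                # remove noise
                            cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5)))
capture.release()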
Moeslund et al. (2006) categorize approaches to object detection according to the underlying segmentation method: motion-, appearance-, shape- and depth-based. Using just one of these methods is only successful up to a certain point in complex scenes. For this reason, newer approaches combine several segmentation methods, e.g., Viola et al. (2005) combine motion and
* Corresponding author