In: Paparoditis N., Pierrot-Deseilligny M., Mallet C., Tournaire O. (Eds), IAPRS, Vol. XXXVIII, Part 3A - Saint-Mandé, France, September 1-3, 2010
PEOPLE TRACKING AND TRAJECTORY INTERPRETATION
IN AERIAL IMAGE SEQUENCES
F. Burkert a,*, F. Schmidt b, M. Butenuth a, S. Hinz b
a Technische Universität München, Remote Sensing Technology, 80333 München, Germany
(florian.burkert, matthias.butenuth)@bv.tum.de
b Karlsruher Institut für Technologie, Institut für Photogrammetrie und Fernerkundung, 76131 Karlsruhe, Germany
(florian.Schmidt, stefan.hinz)@kit.edu
Commission III, WG III/5
KEY WORDS: People tracking, people trajectories, event detection, aerial image sequences
ABSTRACT:
Monitoring the behavior of people in complex environments has gained much attention over the past years. Most current
approaches rely on video cameras mounted on buildings or pylons, and people are detected and tracked in these video streams. The
presented approach is intended to complement this work. The monitoring of people is based on aerial image sequences acquired with
camera systems mounted on aircraft, helicopters or airships. This imagery covers a very large area, which makes it possible to
analyze the distribution of people over a large field of view. First results on the automatic detection and tracking of people in
image sequences are presented. In addition, the derived trajectories of the people are automatically interpreted to reason about
their behavior and to detect exceptional events.
1. INTRODUCTION
Monitoring the behavior of people in crowded scenes and in
complex environments has gained much attention over the past
years. The increasing number of large events such as concerts,
festivals, sports events and religious gatherings like the Pope's
visit has led to a growing interest in monitoring crowded areas. In this
paper, a new approach for detecting and tracking people from
aerial image sequences is presented. In addition to delineating
motion trajectories, the behavior of the people is interpreted to
detect exceptional events such as panic situations or brawls.
Current approaches typically use video cameras mounted on
buildings to detect and track people in video streams.
Pioneering work on tracking human individuals in terrestrial
image sequences can be found, e.g., in (Rohr, 1994; Moeslund &
Granum, 2001). While this work focuses on motion capture of an
isolated human, first attempts to analyze more crowded scenes
are described in (Rosales & Sclaroff, 1999; McKenna et al.,
2000). These early tracking systems have since been extended by
approaches that integrate 3D geometry, 3D trajectories or even
intentional behavior between individuals (Zhao & Nevatia,
2004; Yu & Wu, 2004; Nillius et al., 2006; Zhao et al., 2008).
Advanced approaches based on so-called sensor networks are
able to hand over tracked objects to adjacent cameras when
they leave the current field of view, thereby achieving a fairly
comprehensive analysis of the monitored scene. The work of
(Kang et al., 2003) exemplifies this kind of approach. Instead
of networks of cameras, moving platforms such as unmanned
aerial vehicles (UAVs) can also be used, as presented e.g.
in (Davis et al., 2000). Overviews of research on crowd
modeling and analysis, including all stages of visual
surveillance, are given in (Hu et al., 2004; Zhan et al., 2008).
An important benefit of tracking a large number of people, as
shown e.g. in (Rodriguez et al., 2009), is the potential not
only to analyze individual trajectories but also to learn typical
interactions between trajectories (Scovanner & Tappen, 2009).
Event detection has accordingly been an intensively investigated
field of research over the last decade. A framework using two
modular blocks to detect and analyze events in airborne video
streams is presented in (Medioni et al., 2001). The first
module detects and tracks moving objects in a video stream,
whereas the second module uses the derived trajectories to
recognize predefined scenarios. Another event recognition
system is likewise based on two consecutive modules, namely a
tracking step and an event analysis step, in which complex
events are recognized using Bayesian and logical methods
(Hongeng et al., 2004); there, video streams from close-range
surveillance cameras are used to detect events involving
interactions between a few persons. Further methods, such as
scanning video streams for unusual events (Breitenstein et al.,
2009; Mehran et al., 2009), reflect the research emphasis on
surveillance.
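The two-module structure shared by the systems above (tracking first, then scenario recognition on the resulting trajectories) can be sketched as follows. This is not the method of Medioni et al. (2001) or Hongeng et al. (2004); it is a minimal illustration assuming per-frame detections already given in ground coordinates, greedy nearest-neighbour linking as the tracking module, and a simple speed rule as a stand-in for predefined scenarios. The function names, the max_jump gate and the speed thresholds are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # detection position in ground coordinates (metres)


def link_detections(frames: List[List[Point]], max_jump: float) -> List[List[Point]]:
    """Module 1 (tracking, hypothetical): greedy nearest-neighbour linking.

    Each detection extends the nearest track (by its last position) that has
    not yet been extended in this frame and lies within max_jump metres;
    otherwise it starts a new track. Missed detections are not handled.
    """
    tracks: List[List[Point]] = []
    for detections in frames:
        extended = set()  # indices of tracks already extended in this frame
        for det in detections:
            best: Optional[int] = None
            best_d = max_jump
            for idx, track in enumerate(tracks):
                d = math.dist(track[-1], det)
                if idx not in extended and d <= best_d:
                    best, best_d = idx, d
            if best is None:
                tracks.append([det])
                extended.add(len(tracks) - 1)
            else:
                tracks[best].append(det)
                extended.add(best)
    return tracks


def classify_track(track: List[Point], frame_rate_hz: float,
                   stand_speed: float = 0.2, run_speed: float = 2.5) -> str:
    """Module 2 (event analysis, hypothetical): map a trajectory to a label."""
    if len(track) < 2:
        return "undetermined"
    dt = 1.0 / frame_rate_hz
    speeds = [math.dist(a, b) / dt for a, b in zip(track, track[1:])]
    mean_speed = sum(speeds) / len(speeds)  # m/s, assumes one detection per frame
    if mean_speed < stand_speed:
        return "standing"
    return "running" if mean_speed > run_speed else "walking"


# Three people observed in three frames at an assumed 2 Hz.
frames = [
    [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)],
    [(0.7, 0.0), (10.0, 10.05), (21.5, 0.0)],
    [(1.4, 0.0), (10.0, 10.1), (23.0, 0.0)],
]
tracks = link_detections(frames, max_jump=2.0)
labels = [classify_track(t, frame_rate_hz=2.0) for t in tracks]
print(labels)  # ['walking', 'standing', 'running']
```

Keeping the modules separate means the scenario rules can be swapped (for example, for probabilistic or logical reasoning) without touching the tracker, which is the design choice the two-module systems cited above share.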
Additional related work in the field of people tracking and
event detection builds on seminal research in crowd analysis
and simulation (Helbing & Molnar, 1995; Helbing et al.,
2002). Observed collective phenomena in moving crowds, such as
lane formation in corridors, have been successfully simulated
using a social force model (SFM). The SFM treats interactions
among pedestrians and between pedestrians and obstacles as
forces which, together with each pedestrian's drive towards a
destination, determine the direction of motion of each
individual.
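The social force idea can be illustrated with a minimal sketch. It assumes a simplified variant in which each pedestrian relaxes towards a desired velocity pointing at a goal within a relaxation time tau, and is pushed away from other pedestrians and from point-like obstacles by an exponentially decaying repulsion. This circular exponential form follows later SFM variants rather than the elliptical potential of Helbing & Molnar (1995), attractive forces are omitted, and all names and parameter values (Pedestrian, social_force, step, tau, A, B, radius) are hypothetical and purely illustrative.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec = Tuple[float, float]  # 2D vector (metres or metres per second)


@dataclass
class Pedestrian:
    pos: Vec
    vel: Vec
    goal: Vec
    desired_speed: float = 1.3  # m/s, illustrative value


def _repulsion(dx: float, dy: float, contact: float, A: float, B: float) -> Vec:
    """Exponentially decaying push along (dx, dy), i.e. away from the other body."""
    d = math.hypot(dx, dy)
    if d == 0.0:
        return (0.0, 0.0)  # direction undefined; ignored in this sketch
    mag = A * math.exp((contact - d) / B)
    return (mag * dx / d, mag * dy / d)


def social_force(i: int, peds: List[Pedestrian], obstacles: List[Vec],
                 tau: float = 0.5, A: float = 2.0, B: float = 0.3,
                 radius: float = 0.3) -> Vec:
    """Acceleration (force per unit mass) acting on pedestrian i."""
    p = peds[i]
    # Driving term: relax towards the desired velocity aimed at the goal.
    gx, gy = p.goal[0] - p.pos[0], p.goal[1] - p.pos[1]
    dist_goal = math.hypot(gx, gy) or 1.0  # avoid division by zero at the goal
    fx = (p.desired_speed * gx / dist_goal - p.vel[0]) / tau
    fy = (p.desired_speed * gy / dist_goal - p.vel[1]) / tau
    # Pedestrian-pedestrian repulsion: bodies touch at 2 * radius.
    for j, q in enumerate(peds):
        if j != i:
            rx, ry = _repulsion(p.pos[0] - q.pos[0], p.pos[1] - q.pos[1],
                                2 * radius, A, B)
            fx, fy = fx + rx, fy + ry
    # Pedestrian-obstacle repulsion: obstacles treated as points.
    for ox, oy in obstacles:
        rx, ry = _repulsion(p.pos[0] - ox, p.pos[1] - oy, radius, A, B)
        fx, fy = fx + rx, fy + ry
    return (fx, fy)


def step(peds: List[Pedestrian], obstacles: List[Vec], dt: float = 0.1) -> None:
    """Advance all pedestrians by dt using semi-implicit Euler integration."""
    # Compute all forces first so every pedestrian sees the same snapshot.
    forces = [social_force(i, peds, obstacles) for i in range(len(peds))]
    for p, (ax, ay) in zip(peds, forces):
        p.vel = (p.vel[0] + ax * dt, p.vel[1] + ay * dt)
        p.pos = (p.pos[0] + p.vel[0] * dt, p.pos[1] + p.vel[1] * dt)


# Two pedestrians approaching each other in a corridor, slightly offset.
crowd = [Pedestrian(pos=(0.0, 0.1), vel=(0.0, 0.0), goal=(10.0, 0.1)),
         Pedestrian(pos=(10.0, -0.1), vel=(0.0, 0.0), goal=(0.0, -0.1))]
for _ in range(50):
    step(crowd, obstacles=[])
for ped in crowd:
    print(f"pos=({ped.pos[0]:.2f}, {ped.pos[1]:.2f})")
```

The resulting direction of motion of each individual is simply the direction of its velocity after the forces have been integrated, which is the sense in which the SFM "determines" where a pedestrian moves.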
The approach presented in this paper is intended to complement
the above work. The monitoring of people is based on aerial
camera systems mounted on aircraft, UAVs, helicopters or
airships. The resulting image sequences cover a large field of
view, allowing for the analysis of the density, distribution and
motion behavior of people. However, as the frame rate of such image
sequences is usually much lower than that of video streams
(only a few Hz), more sophisticated tracking approaches need to
be employed. Moreover, the interpretation of scenarios in such
large-scale image sequences must handle a far larger
number of moving objects than existing event detection
systems. Thus, the approach aims to define a broader
spectrum of identifiable scenarios instead of merely raising
an alert for a general abnormal event within a monitored area.
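Why a low frame rate complicates tracking can be made concrete with a small calculation of how far a walking person moves between consecutive frames, in pixels and relative to the person's own size in the image. The walking speed, ground sampling distance (GSD), body footprint and the 25 Hz versus 2 Hz comparison are all assumed values chosen for illustration, not parameters of the system described in this paper; displacement_px is a hypothetical helper.

```python
def displacement_px(speed_mps: float, frame_rate_hz: float, gsd_m: float) -> float:
    """Image-space distance (pixels) a person covers between two consecutive frames."""
    metres_per_frame = speed_mps / frame_rate_hz
    return metres_per_frame / gsd_m


WALKING_SPEED = 1.4  # m/s, assumed typical walking speed
GSD = 0.15           # m per pixel, hypothetical ground sampling distance
FOOTPRINT = 0.5      # m, assumed top-down width of a person

for label, rate in [("video stream", 25.0), ("aerial sequence", 2.0)]:
    shift = displacement_px(WALKING_SPEED, rate, GSD)
    ratio = shift / (FOOTPRINT / GSD)  # displacement measured in body widths
    print(f"{label:16s} {rate:5.1f} Hz: {shift:5.2f} px per frame, "
          f"{ratio:.2f} x body width")
```

Under these assumptions a person moves about 0.37 px (roughly a tenth of a body width) per frame at 25 Hz, but about 4.67 px (1.4 body widths) per frame at 2 Hz. Once the per-frame displacement exceeds the spacing between neighbouring people, simple nearest-neighbour association becomes ambiguous in dense crowds, which is one way to see why more sophisticated tracking is needed.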