ABSTRACT:
1 INTRODUCTION
In recent years, pedestrian tracking has been used successfully in
time-critical applications such as self-organising geosensor
networks, driver assistance and human-machine interaction. In
such applications, where tracking results support the autonomous
operation of a system, e.g. in Jaenen et al. (2012), the speed and
robustness of the data association strategy for linking detections
to targets are of crucial importance. Traditionally, data
association is based on geometric and appearance-based similarity
cues. When a new object enters a scene, observations of it are
initially sparse but usually accumulate over time. Because the
appearance of the detected objects varies under egomotion or
changing camera orientation, an adaptive representation of the
target’s appearance is advantageous.
We apply Tracking-by-Detection and focus on the association
problem. The association strategy is twofold. A motion model
predicts the state of the target in the upcoming frame and gates
the association. An appearance model in the form of a classifier
is learnt for each target and calculates the probability that each
detection was triggered by that target. Related work on
pedestrian tracking has presented promising results using such
instance-specific classifiers. These classifiers usually need to
be built incrementally, so that they can adapt to new information
and eventually discard old information. We therefore employ a
variant of Randomized Trees (Amit and Geman, 1997) that has been
extended to online learning (Saffari et al., 2009). Ensembles of
Randomized Trees, referred to as Random Forests by Breiman (2001),
construct potentially strong classifiers by aggregating simple
decision trees. Due to their modular setup, they are well suited
to online applications. Splits can be introduced when new samples
arrive, which allows for incremental learning, and entire trees
may be discarded, which supports adaptation. The aggregation of
single trees allows parallel processing of the Random Forest
(Sharp, 2008), which supports real-time capability. Furthermore,
Random Forests are inherently suited to multiclass problems, which
allows a varying number of object classes to be handled by a
single classifier.
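The twofold association strategy described above can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes a plain constant-velocity prediction in place of a full Kalman filter, a circular gate with a hypothetical radius `gate_radius`, and appearance probabilities that are simply passed in as a list instead of being computed by a classifier. All function and parameter names are illustrative.

```python
import math


def predict_constant_velocity(state, dt=1.0):
    """Predict (x, y, vx, vy) one time step ahead (assumed motion model)."""
    x, y, vx, vy = state
    return (x + vx * dt, y + vy * dt, vx, vy)


def associate(state, detections, appearance_prob, gate_radius=50.0, dt=1.0):
    """Return the index of the most probable detection inside the gate.

    state           -- target state (x, y, vx, vy)
    detections      -- list of detected positions (x, y)
    appearance_prob -- per-detection probability of belonging to the target
                       (stands in for the classifier output)
    gate_radius     -- hypothetical gate size in pixels
    Returns None if no detection falls inside the gate.
    """
    px, py, _, _ = predict_constant_velocity(state, dt)
    best_idx, best_prob = None, 0.0
    for i, ((dx, dy), prob) in enumerate(zip(detections, appearance_prob)):
        # Motion gate: ignore detections too far from the predicted position.
        if math.hypot(dx - px, dy - py) > gate_radius:
            continue
        # Appearance cue: among gated detections, keep the most probable one.
        if prob > best_prob:
            best_idx, best_prob = i, prob
    return best_idx


state = (100.0, 200.0, 5.0, 0.0)
detections = [(106.0, 201.0), (300.0, 50.0), (110.0, 195.0)]
probs = [0.6, 0.95, 0.8]
match = associate(state, detections, probs)
```

In this toy example the second detection has the highest appearance probability but lies far outside the gate, so it is never considered; the decision is made among the two gated candidates only.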
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B3, 2012
XXII ISPRS Congress, 25 August - 01 September 2012, Melbourne, Australia
PERSISTENT OBJECT TRACKING WITH RANDOMIZED FORESTS
Tobias Klinger and Daniel Muhle
Leibniz Universitaet Hannover
Institute of Photogrammetry and GeoInformation
Nienburger Strasse 1, 30167 Hannover, Germany
klinger@ipi.uni-hannover.de, muhle@ipi.uni-hannover.de
http://www.ipi.uni-hannover.de/
Commission III/5
KEY WORDS: Learning, Detection, Decision Support, Tracking, Real-time, Video
Our work addresses the problem of long-term visual people tracking in complex environments. Tracking a varying number of objects
entails the problem of associating detected objects with tracked targets. To overcome the data association problem, we apply a
Tracking-by-Detection strategy that uses Randomized Forests as a classifier together with a Kalman filter. Randomized Forests build a
strong classifier for multi-class problems by aggregating simple decision trees. Due to their modular setup, Randomized Forests can
be built incrementally, which makes them useful for unsupervised learning of object features in real-time. New training samples can
be incorporated on the fly without the classifier drifting away from previously learnt features. To support further analysis of the
automatically generated trajectories, we annotate them with quality metrics based on the association confidence. To build the metrics,
we analyse the confidence values derived from the Randomized Forests and the similarity of detected and tracked objects. We evaluate
the performance of the overall approach against available reference data of people crossing the scene.
2 RELATED WORK
Tracking multiple objects always entails the problem of
establishing correspondences between tracked objects and
unassociated detections across the spatio-temporal domain. Common
techniques for solving the association problem include
nearest-neighbour search between the target representation and a
set of measurements in state space. Typical state representations
include the object position and its temporal derivatives in 2D
image and 3D world coordinates, as well as colour- and edge-based
information. Using only dynamic information does not allow
unambiguous association when targets appear in self-occluding
crowds. In complex scenarios that demand re-identification of a
target after occlusions or missing detections, appearance models
are commonly used to support association. Comaniciu et al. (2003)
used histogram-based target representations, which were adopted
by many others. McKenna et al. (1999) incorporated adaptivity
using Gaussian Mixture Models to counteract the impact of changing
target appearance caused by changes in illumination and camera
orientation. Histogram-based representations are, however, still
prone to wrong associations, since the geometric relationships
between pixels are disregarded completely. More recent work
involves classification for recognition. Avidan (2005) and
Grabner and Bischof (2006) use classifiers learnt by boosting to
distinguish objects of interest from the background. Breitenstein
et al. (2011) built upon that strategy for multiple-target
tracking scenarios and introduced instance-specific classifiers by
learning a boosted classifier for each individual target. Target
representations are learnt online and evaluated on the detection
windows. It is shown that the adaptive learning yields
improvements in the detector confidence over time. However,
classification remains a binary problem in which an individual
classifier is learnt for each target. Another technique for building
strong classifiers out of simple decision stumps is the
aggregation of decision trees, referred to as Random Forests.
Variants of Random Forests have already been applied in
time-critical applications such as keypoint recognition (Lepetit
and Fua, 2006).
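To make the idea of aggregating decision trees concrete, the following minimal sketch averages the class posteriors of several one-split trees and shows that a tree can be dropped without retraining the others. It only illustrates the general principle and is not the classifier used in this work; the class `Stump`, the function `forest_posterior`, and all thresholds and posterior values are hypothetical.

```python
class Stump:
    """A one-split tree that thresholds a single feature (illustrative only)."""

    def __init__(self, feature, threshold, left_posterior, right_posterior):
        self.feature = feature
        self.threshold = threshold
        self.left_posterior = left_posterior
        self.right_posterior = right_posterior

    def predict_proba(self, x):
        # Route the sample to the left or right leaf and return its posterior.
        if x[self.feature] < self.threshold:
            return self.left_posterior
        return self.right_posterior


def forest_posterior(trees, x):
    """Average the per-class posteriors of all trees for sample x."""
    n_classes = len(trees[0].predict_proba(x))
    totals = [0.0] * n_classes
    for tree in trees:
        for c, p in enumerate(tree.predict_proba(x)):
            totals[c] += p
    return [t / len(trees) for t in totals]


# Three trees voting on three classes (e.g. three tracked targets).
trees = [
    Stump(0, 0.5, [0.8, 0.1, 0.1], [0.1, 0.8, 0.1]),
    Stump(1, 0.3, [0.6, 0.2, 0.2], [0.2, 0.2, 0.6]),
    Stump(0, 0.7, [0.6, 0.4, 0.0], [0.0, 0.4, 0.6]),
]
x = [0.4, 0.9]
posterior = forest_posterior(trees, x)
best_class = posterior.index(max(posterior))

# Adaptation: discard the oldest tree; the remaining trees are unaffected.
reduced = trees[1:]
posterior_reduced = forest_posterior(reduced, x)
```

Because each tree contributes an independent term to the average, trees can be evaluated in parallel and added or removed individually, which is the modularity that makes the ensemble attractive for online use.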