[Figure 2: plot of classification error (vertical axis) against the number of training samples (horizontal axis).]
Figure 2: Average classification error as a function of the number
of training samples.
statistics in the leaves must be updated to include the new class or
to exclude the missing class, respectively, which also alters the
statistics considered for splitting so far. However, this step does
not affect the real-time capability when it is processed in parallel.
To avoid drifting away from previously learnt knowledge, a series of
recent samples for each object class is kept in memory. We observed
that the misclassification error converges with the number of samples
used for training, as plotted in Figure 2. After the classifier has
seen ten samples, the misclassification rate does not decrease
considerably further. We hence set the number of samples to be stored
in memory for each object class to ten. If the number of available
samples exceeds this number, the oldest samples are discarded.
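The per-class sample memory described above can be sketched as a bounded first-in, first-out buffer. This is a minimal illustration only: the class name `SampleMemory` and its methods are hypothetical and not taken from the paper; only the limit of ten samples per class comes from the text.

```python
from collections import defaultdict, deque

MAX_SAMPLES_PER_CLASS = 10  # value chosen in the text based on Figure 2


class SampleMemory:
    """Hypothetical helper: keeps the most recent samples of each object class."""

    def __init__(self, max_per_class=MAX_SAMPLES_PER_CLASS):
        # One bounded buffer per class, created on first use.
        self._buffers = defaultdict(lambda: deque(maxlen=max_per_class))

    def add(self, class_id, sample):
        # A deque with maxlen drops its oldest entry when a new one
        # is appended to a full buffer.
        self._buffers[class_id].append(sample)

    def samples(self, class_id):
        # Samples of one class, oldest first.
        return list(self._buffers[class_id])

    def remove_class(self, class_id):
        # Forget a class, e.g. when its trajectory is terminated.
        self._buffers.pop(class_id, None)
```

Adding thirteen samples to one class leaves the ten most recent ones in memory, which mirrors the rule that the oldest samples are discarded.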
4 DATA ASSOCIATION
The association probabilities between targets and detections are
assessed by evaluating the goodness of fit with respect to a motion
model and an appearance model. The motion is modelled by a linear
Kalman Filter. The similarity of appearance is expressed by the
response of the classifier that we introduced in the previous section.
4.1 Object Detection and Localisation
The sliding-window-based approach of Dalal and Triggs (2005)
turned out to be the most adequate choice among the state-of-the-art
detectors, as shown in Dollar et al. (2011). For detection
we use the HoG/SVM framework and classify Histograms of oriented
Gradients with a Support Vector Machine as either pedestrian or
non-pedestrian. Additionally, we apply background subtraction, which
is not a necessary procedure for our tracking approach, but helps to
exclude very unlikely detections from tracking. We use background
modelling based on Mixtures of Gaussians (Stauffer and Grimson, 1999)
for discovering misplaced detections, i.e. a detection is only
accepted if it has a sufficiently large overlap with a foreground
region. The detections are projected onto a reference plane using a
planar homography that can be calculated using known control points
visible in the scene. Since the bottom line of a detected region is
prone to localisation uncertainties due to occlusions and
articulations of legs, the topmost central point of the detection is
used for projection under the assumption of a default height of a
German adult of 1.72 m¹. The state of the target is modelled by its
location and velocity on the ground in 3D coordinates of the reference
frame and its appearance as learnt by the classifier.
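The projection step can be illustrated with a short sketch. It assumes that a 3x3 homography `H` is already known and that it relates the image to a plane parallel to the ground at the assumed head height; the text does not spell out how the height assumption enters, so this reading, the function names and the box layout `(left, top, width, height)` are all assumptions of the sketch.

```python
def top_centre(box):
    """Topmost central pixel of a detection box (left, top, width, height).

    Assumes image coordinates with the origin in the top-left corner.
    """
    left, top, width, height = box
    return left + width / 2.0, top


def project_to_plane(H, u, v):
    """Map pixel (u, v) to plane coordinates with a 3x3 homography H.

    H is given as a nested list; the result is dehomogenised.
    """
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under H")
    return x / w, y / w
```

A detection box would be projected with `project_to_plane(H, *top_centre(box))`.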
4.2 Detection-to-Track Assignment
Assignments of observed detections to trajectories are established
in a probabilistic way. We follow Schindler et al. (2010) and com-
bine probabilities that result from analysing motion and appear-
ance. The target's state is estimated using a Kalman Filter and the
¹ Surveyed by Statistisches Bundesamt 2009 (www.destatis.de)
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B3, 2012
XXII ISPRS Congress, 25 August — 01 September 2012, Melbourne, Australia
distance between the prediction and the location of a detection
is used to assess the likelihood of correspondence. The
output of the classifier is regarded as a measure of similarity with
respect to the appearance. Under the assumption of constant velocity,
the motion model allows very unlikely detections to be excluded
from association. The classifier supports the association
especially where targets emerge from mutual occlusions.
Object Motion The probability of a detection o_i being triggered
by the target T with respect to the target's motion M_T is assessed
by evaluating the distance between the predicted target location
\hat{x}_i and the location x_i of the detection on the ground plane.
For prediction we model the object's motion using a linear Kalman
Filter with constant velocity assumption. The probability of the
observed position with respect to the motion model is formulated
as
p(o_i | M_T) = \exp\left( -\frac{\lVert \hat{x}_i - x_i \rVert^2}{2\sigma^2} \right)    (3)
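As a rough illustration of the motion term, the sketch below predicts a ground-plane position under constant velocity and evaluates a Gaussian-shaped score of the form exp(-d^2 / (2 sigma^2)), which is the form assumed here for Equation 3. Only the mean of the Kalman prediction is shown; covariance propagation is omitted, and the function names and state layout `(x, y, vx, vy)` are illustrative assumptions.

```python
import math


def predict_position(state, dt=1.0):
    """Constant-velocity prediction of the ground-plane position.

    state is (x, y, vx, vy); only the predicted mean is computed here.
    """
    x, y, vx, vy = state
    return (x + vx * dt, y + vy * dt)


def motion_probability(predicted, detected, sigma):
    """Score exp(-d^2 / (2 sigma^2)) for the distance d between the
    predicted and the detected position (assumed form of Equation 3)."""
    dx = predicted[0] - detected[0]
    dy = predicted[1] - detected[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
```

The score equals 1 when the detection coincides with the prediction and decays with distance at a rate controlled by `sigma`.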
Object Classification The probability of the detection being
triggered by the target T with respect to the classifier response on
the sample s_i of the detection is evaluated by applying the ORF
as explained in section 3. Each detection is evaluated with the
classifier and assigned confidence values for each tracked
object. The probability of a detection belonging to the tracked
object given the sample of that detection is given by
p(o_i | C_T) = p(k = T | s_i)    (4)
We model the probability of a detection for its assignment to a
trajectory as the combined probability
p(o_i | T) = p(o_i | C_T) \cdot p(o_i | M_T)    (5)
For each present target, only the detection with the highest
combined probability is chosen for updating, provided that the
combined probability exceeds a threshold derived from the total number
of classes in the Random Forest. After successful association, the
sample derived from the associated detection is used to update
the ORF and to complement the set of samples of the matched
trajectory; the state of the tracked object is updated with
the new measurement z_i, or otherwise with the predicted state
\hat{z}_i. To account for the uncertainty in prediction, which rises
with the time since the latest update, the standard deviation \sigma
in eq. 3 is set depending on the number of missing associations.
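The assignment rule can be sketched as follows. The text only states that the threshold derives from the number of classes and that the standard deviation depends on the number of missed associations; the concrete choices below (threshold 1/K, linear growth of sigma, and all names and default values) are assumptions made for illustration.

```python
import math


def motion_probability(predicted, detected, sigma):
    # Assumed form of Equation 3: exp(-d^2 / (2 sigma^2)).
    dx = predicted[0] - detected[0]
    dy = predicted[1] - detected[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))


def sigma_for(missed, sigma0=0.5, growth=0.25):
    # Hypothetical rule: sigma grows linearly with missed associations.
    return sigma0 * (1.0 + growth * missed)


def associate(targets, detections, class_probs, num_classes):
    """Pick, per target, the detection with the highest combined probability.

    targets:     dict target_id -> {"pred": (x, y), "missed": int}
    detections:  list of (x, y) ground-plane positions
    class_probs: class_probs[j][target_id] = classifier response for
                 detection j and that target (Equation 4)
    Returns a dict target_id -> detection index, or None if no detection
    exceeds the threshold.
    """
    threshold = 1.0 / num_classes  # assumption: chance level for K classes
    assignment = {}
    for tid, target in targets.items():
        sigma = sigma_for(target["missed"])
        best_j, best_p = None, threshold
        for j, det in enumerate(detections):
            # Equation 5: product of appearance and motion terms.
            p = class_probs[j].get(tid, 0.0) * motion_probability(
                target["pred"], det, sigma)
            if p > best_p:
                best_j, best_p = j, p
        assignment[tid] = best_j
    return assignment
```

As written, nothing prevents two targets from selecting the same detection; the text does not say how such conflicts are resolved, so the sketch leaves them unresolved.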
4.3 Initialisation and Termination
The calculation of the probabilities for the detection-to-track
assignment is carried out for each active trajectory and every
current detection. Each detection that has not been associated with a
present trajectory by the association strategy is used to initialise
a new trajectory. For training of the instance-specific classifier,
a set of samples is generated from the detection as explained in
section 3, followed by a re-training of the ORF with the samples
stored so far. The motion is initialised based on the location of
the detected object on the ground plane. Trajectories that have
not been updated with a detection for more than a preset number
of frames are terminated. In our experiments we set the number of
frames to wait for an update to 10.
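A minimal sketch of this trajectory bookkeeping is given below. The `Track` class, the zero initial velocity and the function signature are assumptions; only the limit of 10 frames is taken from the text, and the generation of samples and re-training of the ORF are left out.

```python
MAX_MISSED_FRAMES = 10  # frames to wait for an update, as in the experiments


class Track:
    """Hypothetical track record with position, velocity and miss counter."""

    def __init__(self, track_id, position):
        self.track_id = track_id
        self.position = position      # location on the ground plane
        self.velocity = (0.0, 0.0)    # assumption: unknown at initialisation
        self.missed = 0               # frames since the last update


def update_tracks(tracks, detections, assignment, next_id):
    """Update, terminate and initialise tracks for one frame.

    tracks:     dict track_id -> Track (modified in place)
    assignment: dict track_id -> detection index or None
    Returns the next unused track id.
    """
    used = set()
    for tid, track in tracks.items():
        j = assignment.get(tid)
        if j is None:
            track.missed += 1
        else:
            track.position = detections[j]
            track.missed = 0
            used.add(j)

    # Terminate tracks that waited longer than the preset number of frames.
    for tid in [t for t, trk in tracks.items() if trk.missed > MAX_MISSED_FRAMES]:
        del tracks[tid]

    # Every unassigned detection starts a new trajectory.
    for j, det in enumerate(detections):
        if j not in used:
            tracks[next_id] = Track(next_id, det)
            next_id += 1
    return next_id
```

Called once per frame, the function keeps a track alive through ten consecutive frames without an update and removes it on the eleventh.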
5 RESULTS
In this section results on the performance of the classifier and the
data association strategy are presented.