cation of a detection
correspondence. The
ure of similarity with
lel allows to exclude
inder the assumption
the association espe-
clusions.
on o; being triggered
otion Mr is assessed
licted target location
ound plane. For pre-
linear Kalman Filter
robability of the ob-
model is formulated
2
3)
detection being trig-
lassifier response on
y applying the ORF
s evaluated with the
ues for each tracked
nging to the tracked
iven by
) (4)
its assignment to a
Mr) (5)
ith the highest com-
in that the combined
rom the total number
ssful association, the
is used for updating
les according to the
bject is updated with
cted state &;, other-
ty in prediction with
deviation c in eq. 3
g associations.
etection-to-track as-
ctory and every cur-
been associated to a
is used to initialise
e specific classifier,
tion as explained in
RF with the samples
d on the location of
ajectories that have
han a preset number
of frames to wait for
he classifier and the
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B3, 2012
XXII ISPRS Congress, 25 August — 01 September 2012, Melbourne, Australia
[Figure 3: image frames with bar diagrams below; vertical axis: Class. Conf., horizontal axis: Target ID (0 to 3)]
Figure 3: Classification results after sequential training. The classification confidence for the blue (left), red
(middle) and yellow (right) framed tracked objects is plotted in the corresponding colours in the diagrams below the image
frames. The frames shown were captured one time step after initialisation (left) and 40 frames later (right).
In Figure 3 we show a sequence from a test data set recorded in the
entrance hall of our university. We began tracking when three people
were present in the scene and observed the classification results of
the ORF over time. The frames and the underlying statistics shown
in Figure 3 were captured right after initialisation (frame 2) and 40
frames later. The bar diagrams show the response of the classifier
for each of the tracked persons. The confidence of the classification
result rises from initially around 50 percent to around 90 percent of
the probability voted for the correct target. Right after
initialisation the confidence is lower because the classifier has not
yet adapted well enough to the people's appearance. As expected, the
classification becomes more distinct as more samples of the people
are taken into account, which lets the classifier adapt better to
their current appearance.
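The per-target confidence discussed above can be illustrated with a minimal sketch. It assumes that the classifier response is the fraction of trees in the forest voting for each target, which is the usual reading of a random forest's output but is not spelled out in this section. The function name `vote_confidence` and the vote values are hypothetical and are not taken from the experiment.

```python
from collections import Counter


def vote_confidence(tree_votes, target_ids):
    """Return the fraction of trees voting for each target ID.

    Assumption: one vote per tree, confidence = vote share.
    """
    if not tree_votes:
        raise ValueError("at least one tree vote is required")
    counts = Counter(tree_votes)
    total = len(tree_votes)
    return {tid: counts.get(tid, 0) / total for tid in target_ids}


# Made-up votes of a 10-tree forest shortly after initialisation:
# half of the trees vote for target 0, the rest are spread out.
early = vote_confidence([0, 0, 1, 2, 0, 1, 0, 2, 0, 3], target_ids=[0, 1, 2, 3])
print(early[0])  # 0.5
```

With more training samples, a larger share of the trees would be expected to agree on the correct target, which corresponds to the rising bars in Figure 3.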
The trajectories gathered by the data association strategy are
analysed with regard to the number of identity switches and
re-initialisations of targets. We applied tracking to a test sequence
of 1600 images captured in the entrance hall, with a total of 23
people passing through the scene. Since we do not tackle the detection
and localisation task but only the association problem, metrics that
depend directly on the detection performance are disregarded here.
People passed through the scene at constant velocity but changed
their walking direction, and most people moved along the viewing
direction of the camera. The appearance of people hence changed while
they passed through the scene, due to the changing illumination and
their changing orientation relative to the camera.
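The two counts used in this evaluation can be sketched as follows. This is a hypothetical illustration, not the evaluation code behind the reported numbers. It assumes that the manual reference provides, for every frame, a mapping from reference person to the tracker ID assigned to that person. It further assumes that a change to a tracker ID already seen earlier counts as an identity switch, and a change to a previously unseen tracker ID counts as a re-initialisation. Under this definition, two people swapping IDs produce two identity switches.

```python
def count_association_errors(frames):
    """Count identity switches and re-initialisations.

    frames: list of dicts, one per frame, mapping a reference person ID
    to the tracker ID assigned to that person in that frame.
    Returns (identity_switches, re_initialisations).
    """
    last_id = {}      # reference person -> most recent tracker ID
    seen_ids = set()  # tracker IDs that appeared in any earlier frame
    switches = 0
    reinits = 0
    for frame in frames:
        for person, track_id in frame.items():
            previous = last_id.get(person)
            if previous is not None and track_id != previous:
                if track_id in seen_ids:
                    switches += 1  # took over an already existing ID
                else:
                    reinits += 1   # tracked again under a fresh ID
            last_id[person] = track_id
        seen_ids.update(frame.values())
    return switches, reinits


# Made-up example: A and B swap IDs in frame 3, B gets a new ID in frame 4.
frames = [
    {"A": 1, "B": 2},
    {"A": 1, "B": 2},
    {"A": 2, "B": 1},
    {"A": 2, "B": 3},
]
print(count_association_errors(frames))  # (2, 1)
```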
The result of our approach is compared with reference data
obtained from manual labelling. To assess the performance,
we count the identity switches as well as the number of times
a tracked object is initialised as a new instance although it was
already being tracked. To demonstrate the benefit of our strategy, we
performed tracking on the given sequence three times: using the
motion model only, using the classifier only, and using the combined
scheme. The results are shown in Table 1. When using only
the motion model for association, 7 identity switches were
encountered, which occurred mainly after mutual occlusions of
people. Using only classification for the association lowered the
number of identity switches to 3 but yielded 7 re-initialisations.
The combined scheme provided a suitable trade-off between the motion
model and the classifier: in the tested sequence the number of
identity switches was reduced to 0 and the number of
re-initialisations to 1. The trajectories gathered by our approach
are visualised in Figure 4. The one re-initialisation that happened
during tracking was due to too many consecutive missing detections,
which caused the corresponding person to be dropped from tracking.
Using the combination of classifier and motion model, most people
could be tracked completely throughout the test sequence.

| Association cue | ID switches | Re-initialisations |
| Motion          | 7           | 2                  |
| Classification  | 3           | 7                  |
| Combined        | 0           | 1                  |
Table 1: Identity switch and re-initialisation counts.
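The idea of combining both cues can be sketched as follows. The exact equations of the method are not legible in this section, so the sketch rests on assumptions: the motion cue is modelled as an unnormalised isotropic Gaussian of the distance between the predicted and the detected ground plane position, the two cues are combined by multiplication, and a detection is assigned to the track with the highest combined score if that score exceeds a threshold. The names `motion_likelihood` and `associate` and the parameters `sigma` and `min_score` are hypothetical.

```python
import math


def motion_likelihood(predicted_xy, detected_xy, sigma):
    """Unnormalised Gaussian of the ground plane distance (assumed form)."""
    dx = detected_xy[0] - predicted_xy[0]
    dy = detected_xy[1] - predicted_xy[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))


def associate(detection_xy, class_conf, predictions, sigma=0.5, min_score=0.1):
    """Pick the track with the highest combined score for one detection.

    class_conf:  track ID -> classifier confidence for this detection
    predictions: track ID -> predicted (x, y) position on the ground plane
    Returns (track ID or None, best combined score).
    """
    best_id, best_score = None, 0.0
    for track_id, predicted_xy in predictions.items():
        score = class_conf.get(track_id, 0.0) * motion_likelihood(
            predicted_xy, detection_xy, sigma
        )
        if score > best_score:
            best_id, best_score = track_id, score
    if best_score < min_score:
        return None, best_score  # no plausible track: candidate for a new one
    return best_id, best_score


# Two tracks predicted close together, as after a mutual occlusion.
predictions = {0: (1.0, 2.0), 1: (1.2, 2.1)}
class_conf = {0: 0.3, 1: 0.7}
best_id, best_score = associate((1.0, 2.0), class_conf, predictions)
print(best_id)  # 1
```

In the made-up example the motion cue alone would favour track 0, whose prediction coincides with the detection, while the classifier cue shifts the decision to track 1. This is the kind of situation in which a motion-only association may switch identities.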
6 CONCLUSIONS
We have presented an approach for data association in a visual
people tracking framework that uses Randomized Forests as a
classifier together with a Kalman Filter. To establish
correspondences between detections and trajectories reliably, the
similarity of detections and tracked objects is evaluated
statistically by combining the response of the classifier with
constraints derived from the evaluation of the object's motion. We
have demonstrated the capability of the method to track people
persistently throughout a scene, even under changing viewing
conditions and mutual occlusions. Our experiments have shown the
benefit of using the cues from motion and the classifier jointly.
The confidence values calculated for the association can
be assigned to the final trajectories, which is helpful for further