Figure 4. Example for the scenario "waiting for another person" consisting of four hierarchical layers (Image Features, Motion Parameters, Simple Events, Scenario).
3. EXPERIMENTAL RESULTS
3.1 Test scenario
For developing and testing the new approach presented here, aerial image sequences provided by DLR's 3K multi-head camera system are used (Kurz et al., 2007). This system consists of three non-metric off-the-shelf cameras, with one camera pointing in the nadir direction and two pointing in oblique directions. The basis for near-real-time mapping is provided by a coupled real-time GPS/IMU navigation system, which enables accurate direct georeferencing.
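Direct georeferencing essentially means intersecting each pixel's viewing ray, oriented with the measured GPS/IMU pose, with the ground. The following minimal sketch illustrates this under a flat-ground assumption; the function name, camera geometry, and all numbers are illustrative and not taken from the 3K processing chain.

```python
# Sketch of direct georeferencing with a flat-ground assumption.
# All names and numbers are illustrative; the actual 3K processing chain
# (Kurz et al., 2007) is more involved (boresight calibration, DEM, ...).
import numpy as np

def pixel_to_ground(px, py, cam_pos, R_cam2world, f_px, cx, cy, ground_z=0.0):
    """Intersect the viewing ray of pixel (px, py) with the plane z = ground_z.

    cam_pos      -- camera position from GPS (world coordinates, metres)
    R_cam2world  -- 3x3 rotation from IMU attitude (camera -> world)
    f_px, cx, cy -- focal length and principal point in pixels
    """
    # Ray direction in camera coordinates (z along the optical axis)
    ray_cam = np.array([px - cx, py - cy, f_px])
    ray_world = R_cam2world @ ray_cam
    # Scale the ray so that it reaches the ground plane
    t = (ground_z - cam_pos[2]) / ray_world[2]
    return cam_pos + t * ray_world

# Example: nadir camera 1500 m above flat ground;
# 180-degree rotation about the camera x-axis points the optical axis down.
R = np.diag([1.0, -1.0, -1.0])
ground_pt = pixel_to_ground(2600, 1800, np.array([0.0, 0.0, 1500.0]), R,
                            f_px=7000.0, cx=2500.0, cy=1700.0)
print(ground_pt)  # world coordinates of the pixel's ground footprint
```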
The aerial image sequence used in the experiments was captured at a soccer match, with a few thousand people heading for the gates of the stadium. The flight height was 1500 m, resulting in a ground sampling distance of about 20 cm. In spite of the low resolution, people can be recognized clearly by their long shadows. The camera system was operated in continuous mode, which resulted in image sequences with a length of 40 frames at a sampling rate of 2 Hz. Every image covers an area of approximately 1000 m × 600 m, with an in-track overlap of about 90%. For the evaluation, a smaller area was selected that is not too crowded and completely visible in 16 consecutive frames. Figure 6 shows the test area in every third frame of the image sequence.
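As a plausibility check, the ground sampling distance can be recomputed from the flight height; the focal length and pixel pitch below are assumed values for an off-the-shelf camera and are not taken from the text.

```python
# Back-of-the-envelope check of the ground sampling distance (GSD).
flight_height = 1500.0   # m (from the text)
focal_length  = 0.050    # m, assumed 50 mm lens
pixel_pitch   = 7.2e-6   # m, assumed ~7.2 micron sensor pixels

gsd = flight_height * pixel_pitch / focal_length
print(f"GSD = {gsd * 100:.0f} cm")  # ~22 cm, consistent with "about 20 cm"

# The stated footprint of 1000 m x 600 m then implies an image of roughly
print(f"{1000 / gsd:.0f} x {600 / gsd:.0f} pixels")
```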
Figure 5. Comparison of manually tracked persons with the results of the algorithm over a sequence of 15 aerial images with about 130 persons visible; completeness and correctness are plotted against the number of jointly analysed consecutive images.
In this area, 130 persons could be marked manually on average through a sequence of 15 frames. For a correct interpretation of the evaluation, it is important to note that the reference data might not be free of errors. Occasionally, manually tracked persons merged with others, so that their positions had to be estimated for some frames. In other situations, the contrast became too low to determine the exact position of a person because of passing clouds.
The evaluation results of the detection and tracking algorithms are shown in Figure 5. An automatically generated segment is considered a correct detection if the distance between its center and the nearest reference position is within a tolerance radius of 3 pixels, corresponding to 45 cm on the ground. The same criterion is applied to evaluate the tracking results; in this case, however, every point of a generated trajectory has to be close enough to one of the reference trajectories. For the evaluation of the tracking results, all possible links spanning two to 15 consecutive frames are compared. Figure 7 visualizes a result of detection and tracking in comparison to the reference.
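A minimal sketch of this evaluation criterion is given below. The greedy one-to-one matching and the data layout are assumptions, since the text only specifies the 3-pixel tolerance; completeness is taken as the share of matched reference positions and correctness as the share of matched detections.

```python
# Sketch of the evaluation criterion: a detection counts as correct if its
# centre lies within 3 pixels of the nearest (still unmatched) reference
# position; completeness and correctness follow from the matched counts.
import numpy as np

TOLERANCE_PX = 3.0  # ~45 cm on the ground

def evaluate_detections(detections, references):
    """detections: (N, 2), references: (M, 2) pixel positions in one frame."""
    dists = np.linalg.norm(detections[:, None, :] - references[None, :, :],
                           axis=2)
    matched_det, matched_ref = set(), set()
    # Greedy one-to-one matching by increasing distance
    for i, j in sorted(np.ndindex(dists.shape), key=lambda ij: dists[ij]):
        if dists[i, j] > TOLERANCE_PX:
            break  # all remaining pairs are even farther apart
        if i not in matched_det and j not in matched_ref:
            matched_det.add(i)
            matched_ref.add(j)
    completeness = len(matched_ref) / max(len(references), 1)  # found refs
    correctness = len(matched_det) / max(len(detections), 1)   # valid dets
    return completeness, correctness

def trajectory_is_correct(traj, ref_trajs):
    """Tracking criterion: every point of a generated trajectory must lie
    within the tolerance of one and the same reference trajectory."""
    return any(
        len(ref) == len(traj)
        and all(np.linalg.norm(np.asarray(p) - np.asarray(r)) <= TOLERANCE_PX
                for p, r in zip(traj, ref))
        for ref in ref_trajs
    )
```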
Averaging the results over all 15 images, the detection module achieved a completeness of 61% and a correctness of 66% (cf. Figure 5, length 1). The correctness of the generated trajectories increases almost linearly with growing length, while the completeness drops quickly. Several reasons are possible: one effect still to be investigated is the influence of the tolerance radius during evaluation. The center of a detected segment can lie more than 3 pixels away from the manually marked position of a person's head. This can happen when the body of a person merges with its shadow into a uniform dot due to low contrast, cf. Figure 7 (left). Another effect stems