CMRT09: Object Extraction for 3D City Models, Road Databases and Traffic Monitoring - Concepts, Algorithms, and Evaluation
for example is explained in (Haag and Nagel, 1999), (Moon et
al., 2002), (Hinz, 2004) and (Ernst et al., 2005). In (Haag and
Nagel, 1999) a very extensive database of about 400 different
three-dimensional car models is used to predict the appearance
of vehicles including their cast shadows. In (Hinz, 2004) the author
uses not only the shadow but also the luminance and reflectivity
of the car’s surface, which is more expensive to process. In addition
to shape and shadow, (Zhao and Nevatia, 2003) try to recognize
the windshield of vehicles.
The final decision is made by a Bayesian network. Most of these
approaches reach a reliable detection rate of more than 90 percent
but need a long computing time. (Moon et al., 2002) and (Ernst et
al., 2005) use rather simple two-dimensional models for detection.
While (Ernst et al., 2005) search the edge-filtered image for
rectangular objects of a certain size, (Moon et al., 2002) shape the
edge filter itself as a rectangle of the expected car size. Both
provide fast detection with an acceptable detection rate by using
additional information about street area and direction.
The use of implicit models is explained in (Grabner et al., 2008)
and (Lei et al., 2008). (Grabner et al., 2008) propose a learning
AdaBoost algorithm, which is robust and fast because it makes many
cascaded weak decisions. (Lei et al., 2008) train a support vector
machine with the SIFT descriptors of selected cars and non-cars.
Both approaches, however, have to be trained with many positive
and negative samples before they can work independently. In
addition, it is not easy to cover all cases of illumination and
environment. Therefore many learning algorithms have to be trained
separately for every situation.
Another simple approach for detecting moving cars without using
any model is explained in (Reinartz et al., 2006), where all moving
objects in adjacent images are detected by computing the normalized
difference image. However, as the georegistration error of the
images often exceeds the pixel size, the images have to be
coregistered first. Moreover, only moving objects can be detected,
so traffic jams or queues in front of a traffic light would be
ignored.
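To illustrate the idea of a normalized difference image, a minimal sketch could look as follows. The exact normalization used in (Reinartz et al., 2006) is not given here, so the formula (a - b) / (a + b), the function names and the threshold value are assumptions chosen for illustration only. Both images are assumed to be coregistered, non-negative gray-value arrays of equal size.

```python
import numpy as np

def normalized_difference(img_a, img_b, eps=1e-6):
    """Per-pixel (a - b) / (a + b); eps avoids division by zero (assumed form)."""
    a = np.asarray(img_a, dtype=float)
    b = np.asarray(img_b, dtype=float)
    return (a - b) / (a + b + eps)

def moving_mask(img_a, img_b, threshold=0.2):
    """Mark pixels whose normalized difference exceeds a hypothetical threshold."""
    return np.abs(normalized_difference(img_a, img_b)) > threshold
```

Pixels that do not change between the two images yield a value near zero, which is why stationary vehicles in a queue would not appear in such a mask.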
Concerning tracking, there are many publications that use optical
flow and Kalman or particle filters to predict the expected
displacement and appearance in subsequent images. (Haag and Nagel,
1999) and (Nejadasl et al., 2006) pursued this approach, which is
not easy to realize in the special case of only two or three adjacent
images. (Lenhart and Hinz, 2006) specifically use triplets of images
to determine the best match between at least three states, which
can be described as a kind of prediction. Another good idea for
the special case of very short bursts is presented in (Scott and
Longuet-Higgins, 1991) and improved in (Pilu, 1997). The authors
use the singular value decomposition of a distance matrix to match
one group of features to another with respect to the relative
positions of all features among each other. (Pilu, 1997) later
extends the approach by adding the correlation between pairs of
features.
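The matching by singular value decomposition mentioned above can be sketched as follows. This is a simplified illustration of the scheme of (Scott and Longuet-Higgins, 1991) under stated assumptions: the distance matrix is turned into a Gaussian-weighted proximity matrix, the scale parameter `sigma` and the function name `svd_match` are chosen here for illustration, and the correlation term added by (Pilu, 1997) is not included.

```python
import numpy as np

def svd_match(points_a, points_b, sigma):
    """Pair features of two sets via SVD of a Gaussian proximity matrix."""
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    # Squared Euclidean distance between every feature in a and every one in b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    g = np.exp(-d2 / (2.0 * sigma ** 2))
    # Replace all singular values by 1, i.e. keep only the orthogonal part.
    u, _, vt = np.linalg.svd(g, full_matrices=False)
    p = u @ vt
    # Accept a pair only if its entry dominates both its row and its column.
    matches = []
    for i in range(p.shape[0]):
        j = int(np.argmax(p[i]))
        if int(np.argmax(p[:, j])) == i:
            matches.append((i, j))
    return matches
```

For example, three features that are all displaced by the same small offset and listed in a different order in the second image are paired correctly, as long as `sigma` is of the order of the expected displacement.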
2 APPROACH
2.1 Preprocessing
To identify the active regions as well as the orientation of the
images relative to each other, the images have to be georeferenced,
which means that their absolute geographic position and dimension
have to be defined. Based on the GPS/IMU information and a digital
terrain model, the image data is projected into GeoTIFF images,
which are planar and oriented to the north. This is useful for
combining the recorded images with existing datasets such as maps or
street data. To avoid examining the whole image data, only the street
area given by a database is considered.
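For a planar, north-oriented image, the relation between pixel and map coordinates reduces to an offset and a scale. The following sketch illustrates this relation only; the class name `NorthUpGeoref`, its fields and the example coordinates are hypothetical, and it assumes square pixels and a projected coordinate system in metres, which the text does not state.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NorthUpGeoref:
    """Georeference of a north-up image (hypothetical helper)."""
    x_origin: float    # easting of the upper-left image corner
    y_origin: float    # northing of the upper-left image corner
    pixel_size: float  # ground size of one pixel, assumed square

    def pixel_to_world(self, col, row):
        # Rows grow downwards in the image, northing grows upwards on the map.
        return (self.x_origin + col * self.pixel_size,
                self.y_origin - row * self.pixel_size)

    def world_to_pixel(self, x, y):
        return ((x - self.x_origin) / self.pixel_size,
                (self.y_origin - y) / self.pixel_size)
```

With such a mapping, the vertices of a street segment from a road database could be converted to pixel coordinates in order to cut out and mask the street area.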
2.2 Detection
To provide fast detection of traffic objects in the large images,
a set of modified edge filters that represent a two-dimensional
car model is used. Recent tests showed that the car’s color
information does not yield better detection results than its gray
value. Therefore the original images are converted into gray
images. This conversion saves two thirds of the filtering time,
since only one channel instead of three has to be filtered. As
additional information about street area and orientation is
available, this knowledge is used as well. The databases provided
by Navteq (www.navteq.com) and Atkis (www.atkis.de), for example,
contain this information about the street network. For every street
segment covered by the image, a bounding box around it is cut out.
The subimage is masked with the street segment so that the filters
are applied only to the traffic area. We use neither a Hough
transformation for finding straight edges nor a filter in the shape
of the whole car, as mentioned in (Moon et al., 2002). Instead, we
create four specially shaped edge filters to represent all edges of
the car model, which are elongated to the average expected size and
rotated into
the direction given by the street database (fig. 2 and 3). To
avoid filtering for all different car sizes, we only shift the filter
answers (fig. 4) to the expected car edges within a certain range.
Figure 3: The associated filter kernels
Figure 4: The shifted and thresholded filter answers 2 and 3
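The oriented edge filters and the shifting of their answers described above can be sketched roughly as follows. This is an illustration under assumptions, not the kernels actually used: the kernel is a simple step edge of hypothetical `half_width`, the angle is measured from the image x-axis with rows growing downwards, the correlation only covers positions where the kernel fits completely, and cells vacated by a shift are filled with zeros.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def oriented_edge_kernel(length, angle_deg, half_width=1):
    """Step-edge kernel elongated by `length` along direction `angle_deg`."""
    r = int(np.ceil(length / 2)) + half_width
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    t = np.deg2rad(angle_deg)
    # Coordinates along and across the edge; rounding removes float noise.
    along = np.round(xs * np.cos(t) + ys * np.sin(t), 6)
    across = np.round(-xs * np.sin(t) + ys * np.cos(t), 6)
    inside = (np.abs(along) <= length / 2) & (np.abs(across) <= half_width)
    # +1 on one side of the edge line, -1 on the other, 0 on the line itself.
    return np.where(inside, np.sign(across), 0.0)

def correlate_valid(image, kernel):
    """Correlate a gray image with a kernel where the kernel fits completely."""
    windows = sliding_window_view(np.asarray(image, dtype=float), kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

def shift_response(response, dy, dx):
    """Shift a 2-D filter answer by (dy, dx) pixels, filling with zeros."""
    out = np.zeros_like(response)
    h, w = response.shape
    if abs(dy) >= h or abs(dx) >= w:
        return out
    src = response[max(0, -dy):min(h, h - dy), max(0, -dx):min(w, w - dx)]
    out[max(0, dy):max(0, dy) + src.shape[0],
        max(0, dx):max(0, dx) + src.shape[1]] = src
    return out
```

Because each kernel sums to zero, homogeneous road surface produces no answer, and shifting a filter answer is much cheaper than filtering the image again with a kernel built for another car size.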
This has the same effect as positioning the filter kernels around
an anchor point. In the conjunction image of the four thresholded
and shifted edge images, blobs remain at the positions where all
four filters have responded strongly enough to the related edge.
The remaining regions (fig. 5) are thinned by a non-maxima