
CMRT09: Object Extraction for 3D City Models, Road Databases and Traffic Monitoring - Concepts, Algorithms, and Evaluation 
for example is explained in (Haag and Nagel, 1999), (Moon et al., 2002), (Hinz, 2004) and (Ernst et al., 2005). In (Haag and Nagel, 1999), a very extensive database of about 400 different three-dimensional car models is used to predict the appearance of vehicles, including their cast shadows. In (Hinz, 2004), the author uses not only the shadow but also the luminance and reflectivity of the car's surface, which is of course more expensive to process. In addition to shape and shadow, (Zhao and Nevatia, 2003) try to recognize the windshield of vehicles; the final decision is made by a Bayesian network. Most of these approaches achieve a very reliable detection rate of more than 90 percent, but require long computing times. (Moon et al., 2002) and (Ernst et al., 2005) use rather simple two-dimensional models for detection. While (Ernst et al., 2005) search the edge-filtered image for rectangular objects of a certain size, (Moon et al., 2002) shape the edge filter itself as a rectangle of the expected car size. Both provide fast detection with an acceptable detection rate by using additional information about the street area and direction.
The use of implicit models is explained in (Grabner et al., 2008) and (Lei et al., 2008). In (Grabner et al., 2008), the authors propose a learning AdaBoost algorithm, which is robust and fast because it makes a large number of cascaded weak decisions. In (Lei et al., 2008), a support vector machine is trained with the SIFT descriptors of selected car and non-car samples. However, both approaches have to be trained with many positive and negative samples before they can work independently. Additionally, it is not easy to cover all cases of illumination and environment. That is why many learning algorithms have to be trained separately for every situation.
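The speed of such a cascade comes from rejecting most candidate windows after only a few cheap tests. The Python sketch below illustrates this early-rejection logic for a generic boosted cascade. Only the evaluation of an already trained cascade is shown, not the boosting procedure that selects the weak classifiers; the stage layout, thresholds, vote weights and the two toy features are hypothetical and do not reproduce the classifier of (Grabner et al., 2008).

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Window = Sequence[float]


@dataclass
class WeakClassifier:
    """One weak decision: a feature compared against a threshold, with a boosting weight."""
    feature: Callable[[Window], float]
    threshold: float
    alpha: float  # vote weight (hypothetical here, normally assigned during boosting)

    def vote(self, window: Window) -> float:
        return self.alpha if self.feature(window) >= self.threshold else 0.0


@dataclass
class Stage:
    weak: List[WeakClassifier]
    stage_threshold: float  # minimum summed vote needed to pass this stage


def cascade_accepts(stages: List[Stage], window: Window) -> bool:
    """Accept a window only if it passes every stage, rejecting as early as possible."""
    for stage in stages:
        score = sum(w.vote(window) for w in stage.weak)
        if score < stage.stage_threshold:
            return False  # most background windows stop here, after very few tests
    return True


# Hypothetical toy features on a flattened gray-value window
def mean_value(window: Window) -> float:
    return sum(window) / len(window)


def contrast(window: Window) -> float:
    return max(window) - min(window)


stages = [
    Stage([WeakClassifier(contrast, 40.0, 1.0)], 1.0),
    Stage([WeakClassifier(mean_value, 60.0, 0.6), WeakClassifier(contrast, 80.0, 0.8)], 0.7),
]
print(cascade_accepts(stages, [20, 30, 25, 28]))    # False: rejected by the first stage
print(cascade_accepts(stages, [10, 200, 120, 90]))  # True: passes both stages
```

The flat window is discarded after a single comparison, while only the high-contrast window reaches the second stage; in a real detector, the cost saving grows with the number of stages and the proportion of background windows.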
Another simple approach for detecting moving cars without using any model is described in (Reinartz et al., 2006), where all moving objects in adjacent images are detected by computing the normalized difference image. However, since the georegistration of the images is often less accurate than the pixel size, the images have to be coregistered first. Moreover, only moving objects can be detected, while traffic jams or queues in front of a traffic light are ignored.
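A minimal sketch of this kind of change detection on two already coregistered gray-value images is shown below. The exact normalization used in (Reinartz et al., 2006) is not stated here, so this sketch assumes the common form |I1 - I2| / (I1 + I2); the threshold of 0.2 and the toy images are likewise hypothetical.

```python
from typing import List

Image = List[List[float]]


def normalized_difference(img1: Image, img2: Image, eps: float = 1e-6) -> Image:
    """Per-pixel |I1 - I2| / (I1 + I2); both images must already be coregistered."""
    return [
        [abs(a - b) / (a + b + eps) for a, b in zip(row1, row2)]
        for row1, row2 in zip(img1, img2)
    ]


def moving_mask(img1: Image, img2: Image, threshold: float = 0.2) -> List[List[bool]]:
    """Mark pixels whose normalized difference exceeds the (hypothetical) threshold."""
    return [[d > threshold for d in row] for row in normalized_difference(img1, img2)]


# A bright car (gray value 200) moves one pixel to the right on dark asphalt (gray value 50)
frame_a = [[50, 200, 50, 50]]
frame_b = [[50, 50, 200, 50]]
print(moving_mask(frame_a, frame_b))  # [[False, True, True, False]]
```

Both the old and the new position of the car are marked. A vehicle that does not move between the two exposures yields a difference of zero and stays undetected, which is exactly the limitation for traffic jams and queues noted above.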
Concerning tracking, there are many publications using optical flow and Kalman or particle filters to predict the expected displacement and appearance in subsequent images. (Haag and Nagel, 1999) and (Nejadasl et al., 2006) pursued this approach, which is not easy to realize in the special case of only two or three adjacent images. (Lenhart and Hinz, 2006) specifically use triplets of images to determine the best match across at least three states, which can be described as a kind of prediction. Another good idea for the special case of very short bursts is presented in (Scott and Longuet-Higgins, 1991) and improved in (Pilu, 1997). The authors use the singular value decomposition of a distance matrix to match one group of features to another with respect to the relative positions of all features among each other. (Pilu, 1997) later extends the approach by adding the correlation between pairs of features.
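The core of the Scott and Longuet-Higgins method fits in a short NumPy sketch. In their formulation the matrix entries are Gaussian-weighted distances G_ij = exp(-r_ij^2 / 2σ^2) between feature i of the first image and feature j of the second; the singular values of G are replaced by ones, and a pair (i, j) is accepted if the resulting entry P_ij is the largest in both its row and its column. The optional `correlation` argument follows Pilu's weighting of G_ij by (C_ij + 1) / 2. The function name, the value σ = 5 pixels and the toy coordinates are illustrative assumptions.

```python
import numpy as np


def svd_match(points_a, points_b, sigma=5.0, correlation=None):
    """Pair features of two images by SVD of a Gaussian-weighted proximity matrix.

    points_a: (m, 2) feature positions, points_b: (n, 2) feature positions.
    correlation: optional (m, n) values in [-1, 1] (Pilu's extension).
    Returns a list of index pairs (i, j).
    """
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    # Squared Euclidean distance between every feature in A and every feature in B
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    g = np.exp(-d2 / (2.0 * sigma ** 2))
    if correlation is not None:
        # Weight proximity by appearance similarity mapped from [-1, 1] to [0, 1]
        g = g * (np.asarray(correlation, dtype=float) + 1.0) / 2.0
    # G = U D V^T; replacing D by ones gives the orthogonal pairing matrix P = U V^T
    u, _, vt = np.linalg.svd(g, full_matrices=False)
    p = u @ vt
    pairs = []
    for i in range(p.shape[0]):
        j = int(np.argmax(p[i]))
        # Accept only if P[i, j] is the maximum of both its row and its column
        if int(np.argmax(p[:, j])) == i:
            pairs.append((i, j))
    return pairs


# Three cars displaced by (3, 1) pixels, listed in a different order in the second image
points_a = [[0, 0], [20, 0], [0, 20]]
points_b = [[3, 21], [3, 1], [23, 1]]
print(svd_match(points_a, points_b))  # [(0, 1), (1, 2), (2, 0)]
```

Because every feature takes part in a single global decomposition, the assignment respects the relative positions of the whole group rather than matching each feature independently, which is what makes the method attractive for bursts of only two or three images.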
2 APPROACH 
2.1 Preprocessing 
To identify the active regions as well as the relative orientation of the images, the images have to be georeferenced, which means that their absolute geographic position and extent have to be defined. Based on the GPS/IMU information and a digital terrain model, the image data are projected into GeoTIFF images, which are planar and oriented to the north. This makes it easy to combine the recorded images with existing datasets such as maps or street data. To avoid examining the whole image data, only the street area given by a database is considered.
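Since the GeoTIFF images are planar and north-oriented, the mapping between pixel and map coordinates reduces to a scale and an offset per axis, with no rotation terms. The Python sketch below assumes such an axis-aligned transform with square pixels; the class name, the origin and the 0.25 m ground sampling distance are hypothetical values, not parameters of the described system.

```python
from typing import NamedTuple, Tuple


class NorthUpGeoTransform(NamedTuple):
    """Axis-aligned pixel-to-map mapping of a north-up image (no rotation terms)."""
    origin_x: float    # map x (easting) of the upper-left corner of pixel (0, 0)
    origin_y: float    # map y (northing) of the upper-left corner of pixel (0, 0)
    pixel_size: float  # ground sampling distance in map units per pixel

    def pixel_to_map(self, col: float, row: float) -> Tuple[float, float]:
        # Easting grows with the column index, northing decreases with the row index
        return (self.origin_x + col * self.pixel_size,
                self.origin_y - row * self.pixel_size)

    def map_to_pixel(self, x: float, y: float) -> Tuple[float, float]:
        # Inverse mapping, e.g. to locate street-database vertices in the image
        return ((x - self.origin_x) / self.pixel_size,
                (self.origin_y - y) / self.pixel_size)


gt = NorthUpGeoTransform(origin_x=690000.0, origin_y=5335000.0, pixel_size=0.25)
print(gt.pixel_to_map(100, 200))  # (690025.0, 5334950.0)
```

With `map_to_pixel`, the vertices of a street segment taken from the road database could be located in the image, for instance to cut out the bounding box used in the detection step.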
2.2 Detection 
To provide fast detection of traffic objects in the large images, a set of modified edge filters representing a two-dimensional car model is used. Recent tests showed that the car's color information does not yield better detection results than its gray value. Therefore, the original images are converted into gray-value images, which saves two thirds of the filtering time because only one channel instead of three has to be filtered. As additional information about street area and orientation is available, this knowledge is used as well. The databases provided by Navteq (www.navteq.com) and Atkis (www.atkis.de), for example, contain this information about the street network. For every street segment covered by the image, a bounding box around it is cut out. The subimage is masked with the street segment so that the filters are only applied to the traffic area. We use neither a Hough transformation for finding straight edges nor a filter in the shape of the whole car, as described in (Moon et al., 2002). Instead, we create four specially shaped edge filters representing all edges of the car model, which are elongated to the average expected size and rotated into the direction given by the street database (fig.2 and 3).
Figure 3: The associated filter kernels
Figure 4: The shifted and thresholded filter answers 2 and 3
To avoid filtering for all different car sizes, we only shift the filter answers (fig.4) to the expected car edges within a certain range.
This has the same effect as positioning the filter kernels around an anchor point. In the conjunction image of the four thresholded and shifted edge images, blobs remain at the positions where all four filters have responded strongly enough to their related edges.
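The shift-and-conjunction step can be sketched in a few lines of Python with NumPy. This is a minimal illustration under several assumptions: the four response maps are synthetic stand-ins for the outputs of the oriented edge filters, the street is assumed to run along the image x axis so that all shifts are axis-aligned, and the "certain range" of shifts is modelled by OR-combining the thresholded map over a hypothetical `spread` of neighbouring shifts. The names `shift` and `conjunction` are illustrative, not taken from the described system.

```python
import numpy as np


def shift(mask: np.ndarray, dy: int, dx: int) -> np.ndarray:
    """Shift a 2-D boolean array by (dy, dx) pixels; uncovered pixels become False.

    Assumes |dy| < height and |dx| < width.
    """
    out = np.zeros_like(mask)
    h, w = mask.shape
    src_y = slice(max(0, -dy), h - max(0, dy))
    dst_y = slice(max(0, dy), h - max(0, -dy))
    src_x = slice(max(0, -dx), w - max(0, dx))
    dst_x = slice(max(0, dx), w - max(0, -dx))
    out[dst_y, dst_x] = mask[src_y, src_x]
    return out


def conjunction(responses, offsets, threshold, spread):
    """AND-combine thresholded edge responses after shifting them onto the car centre.

    responses: edge-filter response maps of equal shape (one per car edge).
    offsets:   per map, the (dy, dx) shift that moves its edge onto the centre
               of an average-sized car.
    spread:    hypothetical number of extra one-pixel shifts in either direction
               along each offset, to tolerate shorter or longer cars.
    """
    result = None
    for response, (dy, dx) in zip(responses, offsets):
        binary = response >= threshold
        step_y, step_x = int(np.sign(dy)), int(np.sign(dx))
        tolerant = np.zeros_like(binary)
        for k in range(-spread, spread + 1):
            tolerant |= shift(binary, dy + k * step_y, dx + k * step_x)
        result = tolerant if result is None else result & tolerant
    return result


# Synthetic responses for one car centred at row 3, column 5 (street along x)
h, w = 7, 11
front, back, left, right = (np.zeros((h, w)) for _ in range(4))
front[2:5, 8] = 1.0  # front edge, 3 px ahead of the centre
back[2:5, 2] = 1.0   # back edge, 3 px behind the centre
left[1, 3:8] = 1.0   # left side, 2 px above the centre
right[5, 3:8] = 1.0  # right side, 2 px below the centre
offsets = [(0, -3), (0, 3), (2, 0), (-2, 0)]

blob = conjunction([front, back, left, right], offsets, threshold=0.5, spread=1)
print(np.argwhere(blob).tolist())
```

With `spread=1` the conjunction leaves a 3×3 blob of pixels around the assumed car centre (3, 5); with `spread=0` only the centre pixel itself survives. Blobs of this kind are what the subsequent thinning step operates on.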
The remaining regions (fig.5) are thinned by a non-maxima
	        