XXII ISPRS Congress 2012: Technical Commission IV

The image-based road sign extraction process can typically be 
subdivided into two main steps. First, the road signs are 
detected in order to localize potential candidates. Second, the 
candidates are classified to identify the type of road sign. If 
the absolute position of the detected road signs is of interest, 
the signs are mapped in a third step. A 
comprehensive overview of different approaches for road sign 
detection and classification is given in Nguwi & Kouzani 
(2008); the most relevant are documented in the following 
chapters and at length in Cavegn & Nebiker (2012). 
1.1 Detection of road signs 
In many cases, road sign detection is based on color 
information. Color segmentation with thresholds allows the 
search to be focused quickly on candidate regions. As the RGB 
color space is sensitive to changes in lighting conditions caused 
by shadows, illumination and viewing geometry as well as strong 
reflections, segmentation is usually carried out in the HSV color 
space based on the hue and saturation components (Fleyeh 2006, 
Maldonado-Bascón et al. 2008). Madeira et al. (2005) use the hue 
and the chromatic RGB component for color segmentation. In 
comparison to the chromatic RGB component, the saturation 
component is very sensitive to noise when its values are small. 
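The threshold-based segmentation on hue and saturation can be illustrated with a minimal sketch. It is not the implementation of any of the cited authors; the function names and the thresholds (hue_tol, min_sat) are hypothetical values chosen for illustration, and the image is assumed to be a nested list of 8-bit (r, g, b) tuples.

```python
import colorsys

def is_red_sign_pixel(r, g, b, hue_tol=0.05, min_sat=0.5):
    """Return True if an 8-bit RGB pixel lies in an assumed 'red' hue band.

    hue_tol and min_sat are illustrative thresholds, not values from the paper.
    """
    h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # Red wraps around hue 0, so accept hues close to 0 or close to 1.
    near_red = h <= hue_tol or h >= 1.0 - hue_tol
    return near_red and s >= min_sat

def segment(image):
    """Binary mask for an image given as a nested list of (r, g, b) tuples."""
    return [[is_red_sign_pixel(*px) for px in row] for row in image]
```

In this sketch, a bright red pixel such as (200, 20, 20) and a shadowed red pixel such as (90, 10, 10) share the same hue and pass the same thresholds, whereas their RGB values differ considerably.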
1.2 Classification of road signs 
Road signs are frequently classified by means of neural 
networks (de la Escalera et al. 2003, Nguwi & Kouzani 2008). 
Since the algorithms have to be trained on many images showing 
the signs at different scales, orientations and illumination 
conditions, they are usually implemented for only a few sign 
types such as speed signs (Ren et al. 2009). Another method for 
the classification process is template matching. This 
intensity-based image correlation approach is used, for example, 
by Piccioli et al. (1996) and Malik et al. (2007). In its basic 
form, it is not robust to scaling, rotation or affine 
transformations in general and is sensitive to illumination 
changes (Ren et al. 2009). 
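A minimal sketch of intensity-based template matching is given here for illustration. It uses zero-mean normalized cross-correlation, which is one common correlation measure and is assumed here rather than taken from the cited works; the function names are hypothetical, and images are assumed to be nested lists of gray values.

```python
import math

def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equally sized 2D lists."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # flat patch or template: treat as no match
    return sum(x * y for x, y in zip(da, db)) / denom

def best_match(image, template):
    """Slide the template over the image; return (score, row, col) of the best fit."""
    th, tw = len(template), len(template[0])
    best = (-2.0, -1, -1)
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best[0]:
                best = (score, r, c)
    return best
```

The normalization compensates only for a global change in brightness and contrast. The template is compared at a single size and orientation, so the sketch shares the lack of robustness to scaling and rotation noted above.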
1.3 Further approaches for the detection and classification 
of road signs 
Many approaches are not designed to exclusively detect or 
classify road signs, but they are able to perform both tasks. A 
few of them are mentioned in the following. 
The Hough transform tolerates gaps and is not very sensitive to 
noise. However, due to the different dimensions and shapes of 
road signs, many scales have to be considered, which negatively 
influences the computation time and memory requirements. 
Therefore, real-time applications need faster modified methods. 
Chutatape & Guo (1999) proposed a modified version of the 
Hough transform which is utilized by Kim et al. (2006) for road 
sign detection following the extraction of edges from image data 
by means of the Canny operator. Barrile et al. (2007) detect 
shapes based on the standardized Hough transform. For the 
classification, they use the generalized Hough transform which 
is also utilized by Habib et al. (1999) on edges which were 
extracted with the Canny filter. 
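The influence of the number of scales on the computational effort can be seen in a minimal sketch of a Hough transform for circular signs. This is the textbook voting scheme, not the modified variant of Chutatape & Guo (1999); the function name and the parameter n_angles are hypothetical, and the edge pixels are assumed to be given as a list of (x, y) coordinates, for example from a Canny operator.

```python
import math
from collections import Counter

def hough_circles(edge_points, radii, n_angles=72):
    """Accumulate votes for circle parameters (cx, cy, r) from edge pixels."""
    votes = Counter()
    for r in radii:  # one accumulator layer per candidate radius (scale)
        for x, y in edge_points:
            seen = set()
            for k in range(n_angles):
                t = 2.0 * math.pi * k / n_angles
                centre = (round(x - r * math.cos(t)), round(y - r * math.sin(t)))
                if centre not in seen:  # each edge pixel votes once per cell
                    seen.add(centre)
                    votes[(centre[0], centre[1], r)] += 1
    return votes
```

Both the run time and the size of the accumulator grow linearly with the number of candidate radii, which is why many scales are costly. The cell with the most votes, obtained with votes.most_common(1), is the strongest circle hypothesis.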
The approaches of Support Vector Machines (SVM) and Scale 
Invariant Feature Transform (SIFT) are increasingly applied to 
both road sign detection and classification. If the SIFT approach 
by Lowe (2004) is used, the extracted features are invariant in 
terms of translation, rotation and scaling as well as insensitive 
to illumination changes, image noise and small geometric 
deformations (Reiterer et al. 2009, Ren et al. 2009). 
Maldonado-Bascón et al. (2007) implemented two types of 
SVM which enable their algorithms to handle translations, 
rotations, scaling and, in most cases, partial occlusions. 
2. EXPLOITATION OF DEPTH INFORMATION FROM 
STEREOVISION GEOMETRY 
The exploitation of depth maps from stereovision imagery is the 
core element of the approach presented below for the detection, 
classification and mapping of road signs. Although depth 
information has an enormous potential, earlier and related work 
on vision-based road sign extraction focused primarily on 
mono imagery. 
Only Cyganek (2008) incorporated depth data from stereo 
imagery as an optional contribution for search space reduction 
in the extraction process. Furthermore, previous investigations 
in general did not focus on establishing the 3D position of the 
extracted road signs. Exceptions are Madeira et al. (2005), Kim 
et al. (2006) and Baró et al. (2009) who determine the absolute 
3D object point coordinates based on stereo imagery as well as 
Shi et al. (2008) who use a combined approach of image and 
laser scanning data. While Shi et al. (2008) achieve an accuracy 
of approximately 30 cm, Madeira et al. (2005) obtain point 
coordinates with only meter-level accuracy. However, precise 
determination of infrastructure objects in all three dimensions in 
a global geodetic reference system is crucial and has become 
increasingly important with respect to traffic planning, 
automated change detection, simulations and visual inspection 
in mixed reality environments. 
For efficient data capturing, a stereovision-based mobile 
mapping system (MMS) has to be employed (see Figure 2). The 
generation of depth maps is advantageously based on 
normalized images. Therefore, the distortion of the collected 
stereo images has to be corrected and the imagery subsequently 
transformed into the stereo normal case. Based on the resulting 
normalized images, the disparity for each pixel is determined by 
means of a stereo matching algorithm. The stereo geometry 
allows computing a depth value for each disparity and all values 
of an image constitute a depth map. For the investigations 
described in this paper, dense matching was performed with the 
semi-global block matching algorithm implemented in OpenCV 
(OpenCV 2012), which differs in a few points from the SGM 
algorithm by Hirschmüller (2008) (e.g. computation of 
matching costs). 
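The conversion from disparity to depth in the stereo normal case follows the standard relation Z = f · B / d, with the focal length f in pixels, the stereo base B in meters and the disparity d in pixels. The following minimal sketch illustrates this relation only; the function names are hypothetical, and the handling of invalid disparities is an assumption.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in meters for the stereo normal case: Z = f * B / d."""
    if disparity_px <= 0:
        return None  # no valid match (assumed convention for this sketch)
    return focal_px * baseline_m / disparity_px

def depth_map(disparities, focal_px, baseline_m):
    """Apply the conversion to every pixel of a 2D list of disparities."""
    return [[depth_from_disparity(d, focal_px, baseline_m) for d in row]
            for row in disparities]
```

With purely illustrative values of f = 2000 pixels and B = 1 m, a disparity of 100 pixels corresponds to a depth of 20 m. Since depth is inversely proportional to disparity, a fixed disparity error causes a depth error that grows with the square of the distance.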
For the subsequent automated detection and mapping of road 
signs, both normalized images and depth maps are required (see 
Figure 2). The classification process additionally needs 
templates of all possible road signs. After successful detection, 
classification and mapping, the regions of interest, the attribute 
data and the 3D position of the road signs are known. 
The developed object extraction algorithms exploit the stereo 
disparities and the derived depth maps, respectively, for the 
following tasks: 
• Search space reduction using a predefined distance range 
interval 
• Definition of distance-related criteria for the color 
segments 
• Generation of regions with similar depth values 
(planar segments) 
• Computation of 3D coordinates
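
Two of the tasks listed above, the search space reduction and the computation of 3D coordinates, can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the paper: an ideal pinhole camera with focal length focal_px and principal point (cx, cy) is assumed, all names are hypothetical, and the resulting coordinates refer to the camera frame. The transformation into a global geodetic reference system by means of the sensor pose is not shown.

```python
def in_range_mask(depth, z_min, z_max):
    """Mark pixels whose depth lies inside the distance interval [z_min, z_max]."""
    return [[z is not None and z_min <= z <= z_max for z in row]
            for row in depth]

def pixel_to_camera_xyz(u, v, z, focal_px, cx, cy):
    """3D point in the camera frame for pixel (u, v) with depth z (pinhole model)."""
    x = (u - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    return (x, y, z)
```

Only pixels for which the mask is True need to be passed on to the color segmentation, which reduces both the run time and the number of false candidates from distant objects.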