The image-based road sign extraction process can typically be
subdivided into two main steps. First, road sign detection
localizes potential candidates. Second, classification identifies
the type of each road sign. If the absolute position of the
detected road signs is of interest, the signs are mapped in a
third step. A comprehensive overview of different approaches
for road sign detection and classification is given in Nguwi &
Kouzani (2008); the most relevant ones are summarized in the
following sections and discussed at length in Cavegn & Nebiker
(2012).
1.1 Detection of road signs
In many cases, road sign detection is based on color
information. Threshold-based color segmentation quickly
narrows the search to candidate regions. As the RGB color
space is sensitive to changing lighting conditions caused by
shadows, illumination and viewing geometry, as well as to
strong reflections, segmentation is usually carried out in the
HSV color space based on the hue and saturation components
(Fleyeh 2006, Maldonado-Bascón et al. 2008). Madeira et al.
(2005) use the hue and the chromatic RGB component for color
segmentation. Compared with the chromatic RGB component, the
saturation component is very sensitive to noise at small values.
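A minimal sketch of hue/saturation thresholding for red sign pixels, using Python's standard colorsys module. The thresholds HUE_TOLERANCE and MIN_SATURATION are illustrative assumptions, not values from the cited works. The helper chromatic_red assumes that the chromatic RGB component is the normalized coordinate r = R / (R + G + B).

```python
import colorsys

# Illustrative thresholds (assumptions): real systems tune these per camera and sign color.
HUE_TOLERANCE = 20 / 360  # max. angular distance from red (hue 0), as a fraction of the hue circle
MIN_SATURATION = 0.5      # rejects greyish, weakly saturated pixels


def is_red(rgb):
    """Classify one 8-bit RGB pixel as 'red' using only hue and saturation."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, _v = colorsys.rgb_to_hsv(r, g, b)  # h, s, v in [0, 1]; the value channel is ignored
    hue_distance = min(h, 1.0 - h)           # red lies at both ends of the hue circle
    return hue_distance <= HUE_TOLERANCE and s >= MIN_SATURATION


def segment_red(image):
    """Binary mask (rows of bools) for an image given as rows of (R, G, B) tuples."""
    return [[is_red(px) for px in row] for row in image]


def chromatic_red(rgb):
    """Normalized (chromatic) red coordinate r = R / (R + G + B); 0 for black pixels."""
    total = sum(rgb)
    return rgb[0] / total if total else 0.0


mask = segment_red([[(200, 30, 30), (100, 40, 40)],
                    [(30, 30, 200), (120, 120, 120)]])
print(mask)  # -> [[True, True], [False, False]]
```

Because the value (brightness) channel is ignored, the darker pixel (100, 40, 40) is accepted just like (200, 30, 30), while the blue pixel and the unsaturated grey pixel are rejected.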
1.2 Classification of road signs
Road signs are frequently classified by means of neural
networks (de la Escalera et al. 2003, Nguwi & Kouzani 2008).
Since these networks have to be trained on many images covering
different scales, orientations and illumination conditions, they
are usually implemented for only a few sign types, such as speed
signs (Ren et al. 2009). Another classification method is
template matching. This intensity-based image correlation
approach is used, for example, by Piccioli et al. (1996) and
Malik et al. (2007). In its basic form, it is not robust to
scaling, rotation or affine transformations in general, and it is
sensitive to illumination changes (Ren et al. 2009).
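The following sketch illustrates intensity-based template matching with zero-mean normalized cross-correlation (ZNCC). ZNCC is one common variant of correlation-based matching, not necessarily the one used by the cited authors. Subtracting the mean and normalizing makes the score invariant to linear brightness changes (gain and offset). A template at the wrong scale or rotation still correlates poorly, however, which is the limitation noted above. Images are plain lists of grey values, and the function names are hypothetical.

```python
import math


def zncc(patch, template):
    """Zero-mean normalized cross-correlation of two equally sized 2D lists (range -1..1)."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a flat patch or template carries no correlation information
    return sum(x * y for x, y in zip(da, db)) / denom


def match_template(image, template):
    """Slide the template over the image; return (best score, (row, col) of top-left corner)."""
    th, tw = len(template), len(template[0])
    best_score, best_pos = -2.0, (0, 0)  # any ZNCC value beats the initial -2
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = zncc(patch, template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_score, best_pos


# A cross-shaped template, embedded with halved contrast and a +10 brightness offset.
template = [[0, 9, 0], [9, 9, 9], [0, 9, 0]]
image = [[10.0] * 5 for _ in range(5)]
for i in range(3):
    for j in range(3):
        image[1 + i][2 + j] = 10 + 0.5 * template[i][j]
score, pos = match_template(image, template)
print(pos, round(score, 6))  # -> (1, 2) 1.0
```

Even though the embedded copy has half the contrast and a brightness offset, it still scores 1.0 at its true position. Resizing or rotating the cross, by contrast, would lower the score.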
1.3 Further approaches for the detection and classification of road signs
Many approaches are not designed exclusively for either
detecting or classifying road signs but can perform both tasks.
A few of them are outlined below.
The Hough transform tolerates gaps and is not very sensitive to
noise. However, since road signs differ in size and shape, many
scales have to be considered, which increases computation time
and memory requirements. Therefore, real-time applications need
faster, modified methods. Chutatape & Guo (1999) proposed a
modified version of the Hough transform, which Kim et al. (2006)
use for road sign detection after extracting edges from the image
data with the Canny operator. Barrile et al. (2007) detect shapes
based on the standard Hough transform. For the classification,
they use the generalized Hough transform, which Habib et al.
(1999) also apply to edges extracted with the Canny filter.
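To illustrate why multiple scales drive up the cost, here is a sketch of a plain circle Hough transform. It assumes edge pixels (e.g. from a Canny operator) are already available as integer (x, y) tuples. It is neither the modified transform of Chutatape & Guo (1999) nor the generalized Hough transform, and the angular sampling n_angles is an illustrative choice. Each edge point votes once per candidate radius and angle, so runtime and accumulator size grow linearly with the number of radii examined.

```python
import math
from collections import Counter


def hough_circles(edge_points, radii, n_angles=90):
    """Accumulate votes for circles (cx, cy, r) through the given integer edge points.

    Every edge point votes for all centres at distance r from it, for every candidate
    radius, so work and accumulator size grow linearly with the number of radii.
    """
    acc = Counter()
    for x, y in edge_points:
        for r in radii:
            for k in range(n_angles):
                t = 2.0 * math.pi * k / n_angles
                cx = round(x - r * math.cos(t))
                cy = round(y - r * math.sin(t))
                acc[(cx, cy, r)] += 1
    return acc


def best_circle(edge_points, radii, n_angles=90):
    """Return (cx, cy, r, votes) of the accumulator maximum."""
    (cx, cy, r), votes = hough_circles(edge_points, radii, n_angles).most_common(1)[0]
    return cx, cy, r, votes


# Twelve integer points lying exactly on a circle of radius 5 around (20, 15).
offsets = [(5, 0), (-5, 0), (0, 5), (0, -5), (3, 4), (3, -4),
           (-3, 4), (-3, -4), (4, 3), (4, -3), (-4, 3), (-4, -3)]
edges = [(20 + dx, 15 + dy) for dx, dy in offsets]
print(best_circle(edges, radii=range(3, 8))[:3])  # -> (20, 15, 5)
```

Searching five radii here costs five times as many votes as searching one. This is the scale-related overhead that modified Hough variants try to reduce.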
Support Vector Machines (SVM) and the Scale Invariant
Feature Transform (SIFT) are increasingly applied to both road
sign detection and classification. Features extracted with the
SIFT approach by Lowe (2004) are invariant to translation,
rotation and scaling and insensitive to illumination changes,
image noise and small geometric
deformations (Reiterer et al. 2009, Ren et al. 2009).
Maldonado-Bascón et al. (2007) implemented two types of
SVM that enable their algorithms to handle translations,
rotations, scaling and, in most cases, partial occlusions.
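When SIFT features are used for classification, a core step is matching the descriptors of a candidate region against those of a template. The sketch below applies the nearest-neighbour distance ratio test proposed by Lowe (2004), with 0.8 as an example threshold. It assumes that descriptors have already been extracted (e.g. 128-dimensional SIFT vectors). The helper match_score, which turns the matches into a simple similarity score, is a hypothetical illustration rather than a method from the cited works. math.dist requires Python 3.8 or later.

```python
import math


def ratio_test_matches(query, reference, ratio=0.8):
    """Match descriptors with Lowe's nearest-neighbour distance ratio test.

    query, reference: lists of equal-length float vectors (e.g. 128-D SIFT descriptors).
    Returns (query_index, reference_index) pairs whose nearest neighbour is clearly
    closer than the second-nearest one; ambiguous descriptors are discarded.
    """
    if len(reference) < 2:
        return []  # the ratio test needs at least two reference descriptors
    matches = []
    for qi, q in enumerate(query):
        dists = sorted((math.dist(q, r), ri) for ri, r in enumerate(reference))
        (d1, best), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((qi, best))
    return matches


def match_score(query, reference, ratio=0.8):
    """Hypothetical similarity score: fraction of query descriptors with an accepted match."""
    if not query:
        return 0.0
    return len(ratio_test_matches(query, reference, ratio)) / len(query)


# Toy 3-D "descriptors": the second query lies halfway between two references (ambiguous).
reference = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 10.0, 0.0]]
query = [[0.5, 0.0, 0.0], [5.0, 0.0, 0.0], [0.0, 9.0, 1.0]]
print(ratio_test_matches(query, reference))  # -> [(0, 0), (2, 2)]
```

The ratio test discards the ambiguous second descriptor instead of forcing a match. This helps keep classification robust when a sign contains repetitive structures.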
2. EXPLOITATION OF DEPTH INFORMATION FROM STEREOVISION GEOMETRY
In the approach designed and presented here for the detection,
classification and mapping of road signs, the exploitation of
depth maps from stereovision imagery is the core element.
Although depth information has great potential, earlier and
related work on vision-based road sign extraction focused
primarily on mono imagery. Only Cyganek (2008) incorporated
depth data from stereo imagery, as an optional means of reducing
the search space in the extraction process. Furthermore, previous
investigations generally did not focus on determining the 3D
position of the extracted road signs. Exceptions are Madeira et
al. (2005), Kim et al. (2006) and Baró et al. (2009), who
determine absolute 3D object point coordinates from stereo
imagery, as well as Shi et al. (2008), who use a combined
approach of image and laser scanning data. While Shi et al.
(2008) achieve an accuracy of approximately 30 cm, Madeira et
al. (2005) only obtain point coordinates with meter-level accuracy.
However, precise determination of the position of infrastructure
objects in all three dimensions in a global geodetic reference
system is crucial and has become increasingly important for
traffic planning, automated change detection, simulations and
visual inspection in mixed reality environments.
For efficient data capture, a stereovision-based mobile
mapping system (MMS) is required (see Figure 2). Depth maps
are preferably generated from normalized images. Therefore, the
distortion of the collected stereo images has to be corrected and
the imagery subsequently transformed into the stereo normal
case. Based on the resulting normalized images, the disparity of
each pixel is determined by means of a stereo matching
algorithm. The stereo geometry allows a depth value to be
computed from each disparity, and all depth values of an image
together constitute a depth map. For the investigations described
in this paper, dense matching was performed with the semi-global
block matching algorithm implemented in OpenCV (OpenCV 2012),
which differs in a few points from the SGM algorithm by
Hirschmüller (2008) (e.g. computation of matching costs).
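In the stereo normal case, depth follows from disparity as Z = f · B / d, where f is the focal length in pixels, B the stereo baseline and d the disparity in pixels. The sketch below applies this relation pixel by pixel. The focal length and baseline in the example are illustrative assumptions, not the parameters of the MMS used here. OpenCV's semi-global block matcher (StereoSGBM) returns disparities as 16-bit fixed-point values with 4 fractional bits, hence the division by 16. It marks unmatched pixels with a value below minDisparity (e.g. -16 when minDisparity is 0). The cut-off min_valid_disparity is a hypothetical parameter of this sketch.

```python
def from_opencv_fixed_point(raw):
    """Convert StereoSGBM output (16-bit fixed point, 4 fractional bits) to pixel disparities."""
    return [[v / 16.0 for v in row] for row in raw]


def disparity_to_depth(disparity, focal_px, baseline_m, min_valid_disparity=0.5):
    """Depth map in metres from a disparity map in pixels, using Z = f * B / d.

    Disparities below min_valid_disparity (including OpenCV's negative 'invalid'
    values) are mapped to None, since tiny disparities give unstable, huge depths.
    """
    return [[focal_px * baseline_m / d if d >= min_valid_disparity else None for d in row]
            for row in disparity]


# Illustrative camera parameters (assumptions, not those of the MMS described here).
raw = [[320, -16]]  # 320 -> 20 px disparity; -16 -> invalid marker when minDisparity = 0
disparity = from_opencv_fixed_point(raw)
print(disparity_to_depth(disparity, focal_px=1000.0, baseline_m=0.5))  # -> [[25.0, None]]
```

Because Z is inversely proportional to d, a one-pixel disparity error matters far more for distant signs than for nearby ones. This is one reason to restrict the search to a bounded distance range.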
For the subsequent automated detection and mapping of road
signs, both normalized images and depth maps are required (see
Figure 2). The classification process additionally needs
templates of all possible road signs. After successful detection,
classification and mapping, the regions of interest, the attribute
data and the 3D position of the road signs are known.
The developed object extraction algorithms exploit the stereo
disparities and the depth maps derived from them for the
following tasks:
• Search space reduction using a predefined distance range interval
• Definition of distance-related criteria for the color segments
• Generation of regions with similar depth values (planar segments)
• Computation of 3D coordinates
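The following sketch covers three of the listed tasks. It assumes a depth map stored as rows of metric depths (None for invalid pixels) in the stereo normal case. All numeric parameters (focal length, principal point, search interval and plausible sign widths) are illustrative assumptions. The specific distance-related criterion shown, a check of a segment's metric width, is likewise a hypothetical example, and the generation of planar segments is omitted. The 3D coordinates are expressed in the left camera frame. Transforming them into a global geodetic reference system would additionally require the camera pose from the MMS, which is not modelled here.

```python
import statistics

# Illustrative parameters (assumptions, not values from this paper).
FOCAL_PX = 1000.0                # focal length of the normalized images in pixels
CX, CY = 640.0, 480.0            # principal point of the normalized images in pixels
DEPTH_RANGE_M = (4.0, 40.0)      # distance interval in which road signs are searched
SIGN_WIDTH_RANGE_M = (0.4, 1.5)  # plausible metric widths of a road sign


def depth_range_mask(depth, near=DEPTH_RANGE_M[0], far=DEPTH_RANGE_M[1]):
    """Search space reduction: keep only pixels whose depth lies in [near, far]."""
    return [[z is not None and near <= z <= far for z in row] for row in depth]


def plausible_sign_segment(width_px, depths, f=FOCAL_PX, widths=SIGN_WIDTH_RANGE_M):
    """Distance-related criterion: convert a color segment's pixel width into metres
    using its median depth (W = w * Z / f) and test it against expected sign sizes."""
    valid = [z for z in depths if z is not None]
    if not valid:
        return False
    width_m = width_px * statistics.median(valid) / f
    return widths[0] <= width_m <= widths[1]


def pixel_to_camera(u, v, z, f=FOCAL_PX, cx=CX, cy=CY):
    """Back-project pixel (u, v) with depth z to (X, Y, Z) in the left camera frame."""
    return ((u - cx) * z / f, (v - cy) * z / f, z)


print(depth_range_mask([[2.0, 10.0], [None, 50.0]]))  # -> [[False, True], [False, False]]
print(plausible_sign_segment(60, [10.0, 10.2, None, 9.8]))  # -> True (0.6 m wide)
print(pixel_to_camera(740, 480, 20.0))  # -> (2.0, 0.0, 20.0)
```

The same 60-pixel segment would be rejected at 50 m depth, where it would correspond to a 3 m wide object. The depth map thus filters out color segments that match in color but not in size.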