3.2 The developed workflow
The urban scene of the pilot area contains several roads and
streets. Cars and the shadows of trees and buildings disturb the
exact recognition of the roads. To overcome this difficulty, a
two-phase workflow was implemented: the first phase, the
segmentation, extracts candidate pixels that possibly belong to
the road category; a more sophisticated linkage (detection)
phase then compiles the final roads. Unfortunately, the binary
image resulting from the first phase contains pixels wrongly
classified as road; in some cases this noise consists only of
scattered points. A median filter was applied to remove these
pixels.
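The noise removal step can be illustrated with a minimal sketch. The 3x3 window, the border handling and the function name `median_filter_binary` are assumptions made for the illustration; the text does not state the filter settings that were actually used.

```python
from statistics import median_low


def median_filter_binary(img, size=3):
    """Median-filter a binary image given as a list of rows of 0/1 values.

    Assumption: border pixels use only the neighbours inside the image.
    """
    rows, cols = len(img), len(img[0])
    half = size // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                img[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            # For 0/1 data the median is a majority vote; median_low
            # resolves ties in even-sized border windows to 0 (non-road).
            out[r][c] = median_low(window)
    return out


# An isolated "road" pixel disappears, a three-pixel-wide band survives.
speckle = [[0] * 5 for _ in range(5)]
speckle[2][2] = 1
band = [[1 if 1 <= r <= 3 else 0 for _ in range(5)] for r in range(5)]
cleaned_speckle = median_filter_binary(speckle)
cleaned_band = median_filter_binary(band)
```

With a window of this size a scattered point is outvoted by its neighbours, while compact road regions keep their interior.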
The linking phase itself has two subphases: the first is
automatic, while the second requires human interaction.
The automatic compilation step is built around a genetic
algorithm, and because that method depends strongly on the
random initial genes, several runs were conducted, each
handling a couple of possible candidates. In the human linking
step the best genes are evaluated, only the suitable ones are
kept, and the network is formed from them.
3.3 Segmentation of the imagery
The first segmentation method was the support vector machine
(SVM), which needs suitable training areas. Four road training
sites were marked; their total area was 5726 pixels (36.6 m²),
which is 0.8% of the covered image. The training data set was
extended with the same number of non-road pixels.
The SVM classification starts with training, in which the network
parameters are determined. Experiments with linear and RBF
kernels showed that the data set was too large, so it had to be
resampled. The retained data set had 1769 road and non-road
pixels in a ratio of 64.8% to 35.2%.
As only the image intensity information was to be used for the
classification, a scatterplot analysis was performed. Because
the classes overlapped strongly, two options were open:
• extend the information with additional sources,
• decorrelate the groups by mathematical techniques
(e.g. principal component analysis).
Any additional information source was intended to stay related
to the image itself, so the use of elevation information was
rejected. Image-based additional sources are, for example, the
vegetation indices. The normalized difference vegetation index
(NDVI) is also defined for aerial (and ortho) images; a small
modification increased our accuracy:
$$\mathrm{NDVI} = \frac{\text{[numerator illegible]}}{R + G + B} \qquad (4)$$
The calculated NDVI was added as the fourth dimension.
Decorrelation by principal component analysis (PCA), followed
by the corresponding transformation, is also a frequently used
preprocessing step before neural classification. The result of
the repeated scatterplot analysis can be seen in Fig. 2.
The RBF kernel function can be controlled by its sigma scaling
factor, whose value was at first strongly increased to produce
any result, then was successively decreased to get better
classification accuracy.
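The influence of the sigma scaling factor can be illustrated with a short sketch. It assumes the common Gaussian parameterization K(x, y) = exp(-||x - y||² / (2σ²)); the text does not state which form of the RBF kernel was used, and the function name `rbf_kernel` and the sample pixel values are hypothetical.

```python
import math


def rbf_kernel(x, y, sigma):
    """Gaussian RBF kernel, assuming K = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


# Two hypothetical feature vectors (e.g. R, G, B intensities).
pixel_a = (120.0, 110.0, 100.0)
pixel_b = (123.0, 114.0, 100.0)  # Euclidean distance 5 from pixel_a

wide = rbf_kernel(pixel_a, pixel_b, sigma=50.0)   # close to 1: very smooth
narrow = rbf_kernel(pixel_a, pixel_b, sigma=2.0)  # close to 0: very local
```

Under this assumption, a large sigma makes all samples look similar to each other, while decreasing it lets the decision boundary follow the training samples more closely.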
In this context the classification accuracy was measured as
in-sample accuracy, meaning that the trained network was used
to classify only the training data set. The overall accuracy
(OA) was sufficient to evaluate which setting leads to the best
performance.
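The in-sample overall accuracy is simply the share of training samples that the trained classifier labels correctly. A minimal sketch, with the function name `overall_accuracy` and the label vectors chosen only for illustration:

```python
def overall_accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical labels: 1 = road, 0 = non-road.
training_truth = [1, 1, 0, 0]
training_prediction = [1, 0, 0, 0]
oa = overall_accuracy(training_truth, training_prediction)  # 3 of 4 correct
```

Because the same samples are used for training and evaluation, this figure serves to compare settings rather than to estimate the accuracy on unseen pixels.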
[Scatter plot of the decorrelated inputs; one axis is labelled PC2]
Figure 2. Decorrelated inputs for SVM training with road (red
dots) and non-road (blue dots) samples
The SOM segmentation needs no training data, only the
definition of the neuron layer. After initial tests, a hexagonal
mesh of 9x9 neurons with the Euclidean distance measure was
accepted. The training ran for 200 epochs with the whole image
as input.
To be able to compare the two classification methods described
above, a third type was also applied: a hyperbox
(parallelepiped) classification, known from statistical pattern
recognition. This supervised method was fed with parameters
derived from the training sites already mentioned. The
box-classifier parameters were the minimum and maximum
intensity values in each image band.
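The box classifier can be sketched in a few lines. The function names `fit_hyperbox` and `classify_hyperbox` and the sample intensities are hypothetical; only the use of per-band minimum and maximum values is taken from the text.

```python
def fit_hyperbox(training_pixels):
    """Return one (minimum, maximum) intensity pair per image band."""
    bands = zip(*training_pixels)  # regroup the values band by band
    return [(min(band), max(band)) for band in bands]


def classify_hyperbox(pixel, box):
    """True if every band value of the pixel lies inside the box."""
    return all(low <= value <= high for value, (low, high) in zip(pixel, box))


# Hypothetical road training pixels with (R, G, B) intensities.
road_samples = [(100, 110, 120), (90, 105, 130), (110, 100, 125)]
road_box = fit_hyperbox(road_samples)

inside = classify_hyperbox((95, 104, 128), road_box)    # within all ranges
outside = classify_hyperbox((200, 104, 128), road_box)  # red band too bright
```

Applying `classify_hyperbox` to every pixel of the image yields a binary road map of the same kind as the other two methods produce.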
All the segmentation techniques presented resulted in a binary
thematic map with road and non-road pixels. The binary
images were passed to the genetic algorithm in the next
processing phase.
3.4 Detecting road segments
The genes, representing road segments, are defined by
rectangles whose key points are the two midpoints (P1 and P2)
of the shorter edges (Fig. 3). The length of the rectangle is
defined by these key points, while the half width is controlled
by a parameter (w). The corners (A, B, C, D) of the rectangle
can be computed by geometric rules.
[Sketch of a rectangle with corners A, B, C, D, key points P1 and P2, and half width w]
Figure 3. Definition of the basic road segment by rectangle
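The geometric rule for the corners can be sketched as follows: the unit normal of the axis P1-P2 is scaled by the half width w and added to or subtracted from each key point. The function name `rectangle_corners` and the order in which the corners are assigned to A, B, C and D are assumptions, since the labelling of Fig. 3 cannot be recovered here.

```python
import math


def rectangle_corners(p1, p2, w):
    """Corners of the rectangle with axis p1-p2 and half width w.

    Assumed order: A and B lie on the short edge through p1,
    C and D on the short edge through p2.
    """
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("p1 and p2 must be distinct points")
    # Unit vector perpendicular to the axis of the segment.
    nx, ny = -dy / length, dx / length
    a = (p1[0] + w * nx, p1[1] + w * ny)
    b = (p1[0] - w * nx, p1[1] - w * ny)
    c = (p2[0] - w * nx, p2[1] - w * ny)
    d = (p2[0] + w * nx, p2[1] + w * ny)
    return a, b, c, d


# A horizontal segment of length 10 with half width 2.
corners = rectangle_corners((0.0, 0.0), (10.0, 0.0), 2.0)
```

The rectangle length follows from the distance between the key points, so a gene is fully described by P1, P2 and the fixed w.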
The population is built up of these rectangles. During the
initialization 50-100 rectangles were generated with random
coordinates for points P1 and P2, while the width parameter w
was fixed. The rectangles are masks laid on the binary
segmentation image. The fitness function can be defined on this
binary subimage as follows:
• counting the covered road pixels,
• based on the number of the covered road pixels
divided by the length/area of the rectangle,
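The random initialization and the two fitness variants listed above (the count of covered road pixels, and that count divided by the rectangle area) can be sketched as follows. All names (`random_segments`, `rectangle_fitness`), the (x, y) = (column, row) coordinate convention and the pixel-centre coverage test are assumptions made for the illustration, not the implementation used in the study.

```python
import math
import random


def random_segments(n, width, height, seed=None):
    """Generate n random (P1, P2) pairs inside a width x height image."""
    rng = random.Random(seed)
    return [
        ((rng.uniform(0, width - 1), rng.uniform(0, height - 1)),
         (rng.uniform(0, width - 1), rng.uniform(0, height - 1)))
        for _ in range(n)
    ]


def rectangle_fitness(binary_img, p1, p2, w):
    """Return (covered road pixels, road pixels per covered pixel).

    binary_img is a list of rows with 1 = road and 0 = non-road;
    points are (x, y) with x = column and y = row (an assumption).
    """
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("p1 and p2 must be distinct points")
    ux, uy = dx / length, dy / length  # unit vector along the segment
    road = covered = 0
    for row, line in enumerate(binary_img):
        for col, value in enumerate(line):
            rx, ry = col - p1[0], row - p1[1]
            along = rx * ux + ry * uy    # position along the axis
            across = -rx * uy + ry * ux  # signed distance from the axis
            if 0 <= along <= length and abs(across) <= w:
                covered += 1
                road += value
    density = road / covered if covered else 0.0
    return road, density


# A 5 x 7 test image with a one-pixel-wide horizontal road in row 2.
image = [[1 if r == 2 else 0 for _ in range(7)] for r in range(5)]
on_road = rectangle_fitness(image, (0, 2), (6, 2), w=0)
too_wide = rectangle_fitness(image, (0, 2), (6, 2), w=1)
off_road = rectangle_fitness(image, (0, 0), (6, 0), w=0)
population = random_segments(50, width=7, height=5, seed=1)
```

In this example the plain pixel count cannot distinguish the well-fitting rectangle from the one that is too wide, while the area-normalized variant can.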