XXII ISPRS Congress 2012: Technical Commission III

3.2 The developed workflow 
The urban scene of the pilot area contains several roads and
streets. Cars and the shadows of trees and buildings disturb the
exact recognition of the roads. To overcome this difficulty, a
two-phase workflow was implemented: the first phase, the
segmentation, extracts candidate pixels that possibly belong to
the road category; a more sophisticated linking (detection)
phase then compiles the final roads. The binary image resulting
from the first phase unfortunately contains pixels wrongly
classified as road; in some cases this noise consists only of
scattered points. A median filter was applied to remove these
pixels.
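The noise removal step can be illustrated with a minimal sketch. The 3 x 3 window and the zero padding at the image border are assumptions made here for illustration; the text does not state the filter size that was used.

```python
def median_filter_binary(img, size=3):
    """Apply a size x size median filter to a binary image (list of 0/1 rows)."""
    rows, cols = len(img), len(img[0])
    half = size // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = []
            for dr in range(-half, half + 1):
                for dc in range(-half, half + 1):
                    rr, cc = r + dr, c + dc
                    # Assumption: pixels outside the image count as non-road (0).
                    inside = 0 <= rr < rows and 0 <= cc < cols
                    window.append(img[rr][cc] if inside else 0)
            window.sort()
            out[r][c] = window[len(window) // 2]  # middle value of the window
    return out


# Hypothetical segmentation result: one isolated pixel and one compact road patch.
noisy = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
cleaned = median_filter_binary(noisy)
```

The isolated pixel at row 1, column 1 is removed, while the interior of the compact patch survives.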
The linking phase itself has two subphases: the first is
automatic, while the second requires human interaction. The
automatic compilation step is built around a genetic algorithm,
which depends strongly on the random initial genes; therefore
several runs were conducted, each handling a couple of possible
candidates. The human linking step evaluates the best genes,
keeps only the suitable ones and forms the network.
3.3 Segmentation of the imagery 
The first segmentation method was the support vector machine
(SVM), which needs suitable training areas. Four road training
sites were marked; their total area was 5726 pixels (36.6 m²),
which is 0.8% of the covered image. The training data set was
extended with the same number of non-road pixels.
The SVM classification starts with training, in which the
classifier parameters are determined. Several experiments with
linear and RBF kernels showed that the data set was too large,
so it had to be resampled. The retained data set had 1769 road
and non-road pixels in a ratio of 64.8% to 35.2%.
As only the image intensity information was to be used for the
classification, a scatterplot analysis was performed. Because
the road and non-road groups overlapped strongly, two options
were open:
• extend the information with additional sources,
• decorrelate the groups by mathematical techniques
(e.g. principal component analysis).
The additional information source had to remain related to the
image itself, so the use of elevation information was rejected.
Image-based additional sources are, for example, the vegetation
indices. The normalized difference vegetation index (NDVI) is
also defined for aerial (and ortho) images; a small modification
increased our accuracy:
$$\mathrm{NDVI} = \frac{G - R}{R + G + B} \tag{4}$$
The calculated NDVI was added as the fourth dimension. 
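A minimal sketch of how such an index band could be appended to an RGB pixel. The numerator G - R is an assumption made here (a green-red difference, since only the three visible bands are available); the normalization by R + G + B follows the modified definition. The function names are hypothetical.

```python
def modified_ndvi(r, g, b):
    """Vegetation index normalized by the total intensity R + G + B.

    Assumption: the numerator is the green-red difference G - R.
    """
    total = r + g + b
    if total == 0:
        return 0.0  # black pixel: the index is undefined, 0 is used by convention
    return (g - r) / total


def add_index_band(pixel):
    """Extend an (R, G, B) pixel with the index as a fourth dimension."""
    r, g, b = pixel
    return (r, g, b, modified_ndvi(r, g, b))


vegetation = add_index_band((60, 120, 20))   # greenish pixel, index 0.3
asphalt = add_index_band((100, 100, 100))    # grey pixel, index 0.0
```

Grey road surfaces give values near zero, while green vegetation gives clearly positive values, which is why the index can help to separate the two groups.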
Decorrelation by principal component analysis (PCA) and the
corresponding transformation is also a frequently used
preprocessing step before neural classification. The result of
the repeated scatterplot analysis can be seen in Fig. 2.
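The decorrelation can be sketched with NumPy as follows. The data are synthetic and hypothetical (three strongly correlated intensity bands plus an index band); the paper's own implementation is not described, so this is only one common way to compute the transformation.

```python
import numpy as np


def pca_decorrelate(features):
    """Rotate the centred feature vectors onto their principal axes."""
    centred = features - features.mean(axis=0)
    cov = np.cov(centred, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder: largest variance first
    scores = centred @ eigvecs[:, order]
    return scores, eigvals[order]


# Hypothetical samples: correlated R, G, B values and a fourth index band.
rng = np.random.default_rng(0)
brightness = rng.normal(120.0, 30.0, size=(500, 1))
rgb = brightness + rng.normal(0.0, 5.0, size=(500, 3))
index = rng.normal(0.0, 0.05, size=(500, 1))
features = np.hstack([rgb, index])

scores, variances = pca_decorrelate(features)
score_cov = np.cov(scores, rowvar=False)     # diagonal up to rounding errors
```

After the transformation the components are uncorrelated, so a scatterplot of two components (for example PC1 against PC2) no longer shows the elongated cloud typical of correlated intensity bands.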
The RBF kernel function is controlled by its sigma scaling
factor. Its value first had to be increased strongly to produce
any result at all, and was then decreased successively to obtain
better classification accuracy.
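The role of sigma can be illustrated with a short sketch. The Gaussian form exp(-d² / (2 sigma²)) is assumed here; the exact parameterization used in the experiments is not given in the text.

```python
import math


def rbf_kernel(x, y, sigma):
    """Gaussian RBF kernel, assuming the form exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


# Two hypothetical pixels that differ by 10 grey levels in two bands.
p = (100, 100, 100)
q = (110, 90, 100)

narrow = rbf_kernel(p, q, sigma=1.0)     # practically zero
wide = rbf_kernel(p, q, sigma=100.0)     # close to one
```

With intensity values in the range of 0 to 255, a small sigma makes almost every pair of samples look completely dissimilar, which is a plausible reason why a large starting value was needed before any usable result appeared.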
The classification accuracy was measured in this context as
in-sample accuracy, meaning that the trained classifier was used
to classify only the training data set. The overall accuracy
(OA) was sufficient to evaluate which setting leads to the best
performance.
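As a brief illustration, the overall accuracy is the share of correctly classified samples. The labels below are hypothetical (1 for road, 0 for non-road).

```python
def overall_accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label equals the reference label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical in-sample check: the training labels and their re-classification.
reference = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]
oa = overall_accuracy(reference, predicted)   # 6 of 8 correct
```

Because the measure is computed on the training data itself, it is suitable for comparing settings, but it tends to be more optimistic than an accuracy obtained on independent samples.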
  
[Figure 2: scatterplot of the decorrelated samples; axis label PC2]
Figure 2. Decorrelated inputs for SVM training with road (red 
dots) and non-road (blue dots) samples 
The self-organizing map (SOM) segmentation needs no training
data, only the definition of the neuron layer. After initial
tests, a hexagonal mesh of 9 x 9 neurons with Euclidean distance
measure was accepted. The training ran for 200 epochs with the
whole image as input.
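A minimal sketch of a SOM with a hexagonal mesh and Euclidean distance is given below. The grid size and the number of epochs default to the values in the text; the learning rate schedule, the Gaussian neighbourhood function and the random initialization are assumptions, since these details are not stated.

```python
import math
import random


def hex_grid(rows, cols):
    """Neuron positions on a hexagonal mesh: odd rows are shifted by half a cell."""
    return [(c + 0.5 * (r % 2), r * math.sqrt(3) / 2)
            for r in range(rows) for c in range(cols)]


def best_matching_unit(weights, x):
    """Index of the neuron whose weight vector is closest to x (Euclidean)."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))


def train_som(samples, rows=9, cols=9, epochs=200, seed=0):
    rng = random.Random(seed)
    grid = hex_grid(rows, cols)
    dim = len(samples[0])
    weights = [[rng.random() for _ in range(dim)] for _ in grid]
    for epoch in range(epochs):
        frac = epoch / epochs
        rate = 0.5 * (1.0 - frac)                            # assumed linear decay
        radius = max(rows, cols) / 2 * (1.0 - frac) + 0.5    # shrinking neighbourhood
        for x in samples:
            bmu = best_matching_unit(weights, x)
            for i, pos in enumerate(grid):
                d = math.dist(pos, grid[bmu])
                h = math.exp(-(d * d) / (2 * radius * radius))
                weights[i] = [w + rate * h * (xi - w)
                              for w, xi in zip(weights[i], x)]
    return grid, weights


# Hypothetical pixels scaled to [0, 1]; a small map keeps the example fast.
samples = [(0.10, 0.10, 0.10), (0.15, 0.10, 0.12),
           (0.90, 0.85, 0.90), (0.88, 0.90, 0.92)]
grid, weights = train_som(samples, rows=3, cols=3, epochs=20)
```

After training, each pixel is assigned to its best matching neuron; deciding which neurons represent the road class is a separate step that this sketch does not cover.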
To make the two described classification methods comparable, a
third one was also applied: a hyperbox (parallelepiped)
classification, known from statistical pattern recognition.
This supervised method was fed with parameters derived from
the training sites mentioned above. The box-classifier
parameters were the minimum and maximum intensity values
in each image band.
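The box classifier can be sketched in a few lines. The training pixels below are hypothetical; the function names are chosen here for illustration.

```python
def fit_box(training_pixels):
    """Per-band (minimum, maximum) limits derived from the training pixels."""
    bands = zip(*training_pixels)
    return [(min(band), max(band)) for band in bands]


def classify_box(pixel, box):
    """Return 1 (road) if every band value lies inside its limits, else 0."""
    return int(all(lo <= v <= hi for v, (lo, hi) in zip(pixel, box)))


# Hypothetical road training pixels (R, G, B).
road_samples = [(90, 92, 95), (110, 108, 112), (100, 101, 99)]
box = fit_box(road_samples)

grey_pixel = classify_box((105, 100, 100), box)    # inside the box
green_pixel = classify_box((60, 120, 20), box)     # outside the box
```

The method needs no iterative training, which makes it a convenient baseline against which the SVM and SOM results can be judged.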
All of the presented segmentation techniques resulted in a
binary thematic map with road and non-road pixels. The binary
images were passed to the genetic algorithm in the next
processing phase.
3.4 Detecting road segments 
The genes, representing road segments, are defined by rectangles
whose key points are the two midpoints (P1 and P2) of the
shorter edges (Fig. 3). The length of the rectangle is defined
by these key points, while the half width is controlled by a
parameter (w). The corners (A, B, C, D) of the rectangle can be
computed by geometric rules.
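The geometric rule can be sketched as follows: the corners are obtained by shifting the two midpoints by w along the unit normal of the axis P1-P2. The assignment of the labels A, B, C, D to particular corners is an assumption here, since it depends on the figure.

```python
import math


def rectangle_corners(p1, p2, w):
    """Corners of the rectangle with axis P1-P2 and half width w."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("P1 and P2 must be different points")
    nx, ny = -dy / length, dx / length    # unit normal of the axis
    a = (p1[0] + w * nx, p1[1] + w * ny)
    b = (p2[0] + w * nx, p2[1] + w * ny)
    c = (p2[0] - w * nx, p2[1] - w * ny)
    d = (p1[0] - w * nx, p1[1] - w * ny)
    return a, b, c, d


# Hypothetical horizontal segment of length 10 with half width 2.
corners = rectangle_corners((0, 0), (10, 0), 2)
```

A gene therefore needs only four coordinates (those of P1 and P2) when w is fixed, which keeps the search space of the genetic algorithm small.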
[Figure 3: rectangle with corners A, B, C, D, edge midpoints P1 and P2, and half width w]
Figure 3. Definition of the basic road segment by rectangle 
The population is built up of these rectangles. During the
initialization, 50-100 rectangles were generated with random
coordinates for the points P1 and P2, while the width parameter
w was fixed. The rectangles are masks laid over the binary
segmentation image. The fitness function can be defined on this
binary subimage as follows:
• by counting the covered road pixels,
• by dividing the number of covered road pixels by the
length or area of the rectangle,
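The two fitness variants listed above can be sketched as follows. Points are given as (x, y) = (column, row), and a pixel counts as covered when its centre lies inside the rectangle; both conventions are assumptions. In the normalized variant, the number of covered pixels is used as a discrete stand-in for the rectangle area, which is also an assumption.

```python
import math


def covered_pixels(p1, p2, w, shape):
    """Yield (row, col) of all pixels whose centre lies inside the rectangle."""
    rows, cols = shape
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return                      # degenerate gene covers nothing
    ux, uy = dx / length, dy / length
    for r in range(rows):
        for c in range(cols):
            vx, vy = c - p1[0], r - p1[1]
            along = vx * ux + vy * uy        # position along the axis P1-P2
            across = -vx * uy + vy * ux      # signed offset from the axis
            if 0 <= along <= length and abs(across) <= w:
                yield r, c


def fitness_count(binary, p1, p2, w):
    """First variant: number of road pixels under the rectangle mask."""
    shape = (len(binary), len(binary[0]))
    return sum(binary[r][c] for r, c in covered_pixels(p1, p2, w, shape))


def fitness_density(binary, p1, p2, w):
    """Second variant: road pixels divided by all covered pixels."""
    shape = (len(binary), len(binary[0]))
    covered = list(covered_pixels(p1, p2, w, shape))
    if not covered:
        return 0.0
    return sum(binary[r][c] for r, c in covered) / len(covered)


# Hypothetical 6 x 8 segmentation with a horizontal road in rows 2 and 3.
binary = [[1 if r in (2, 3) else 0 for _ in range(8)] for r in range(6)]
along_road = ((0, 2.5), (7, 2.5))     # gene lying on the road
across_road = ((3.5, 0), (3.5, 5))    # gene crossing the road
```

Here `fitness_count(binary, *along_road, 0.5)` gives 16 and `fitness_count(binary, *across_road, 0.5)` gives 4, with densities of 1.0 and 1/3 respectively. The plain count favours long rectangles regardless of how well they fit, which is one reason to consider the normalized variant.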