The association and the interaction potentials are both functions of features corresponding to the image data; which features are used depends on whether a potential describes a single image site or a pair of neighbouring sites. In any case, all features are scaled linearly into the range [0; 255], i.e. they are quantized by 8 bit.
The node feature vectors of the graphical model combine feature vectors from several sources. The first group of features is radiometric and derived from the CIR orthophoto; it includes the normalised difference vegetation index (NDVI), computed from the near infrared and the red band, as well as features obtained by transforming the image to another colour representation. We also make use of a simple texture feature, namely the variance of the image intensity, computed in a local neighbourhood of each pixel (var). The next feature should model the relation between an image site and image edges within a certain distance. For that purpose, we generate an edge image from the input image and derive a distance transform of that edge image. The feature is the distance of an image site to the nearest edge according to the resulting distance map. The last radiometric feature is based on the gradient orientation, calculated in local neighbourhoods (grad). In order to capture gradient information at different scales, we compute two histograms of gradient orientations (Dalal & Triggs, 2005), one in a small neighbourhood and one in a large neighbourhood (101 x 101 pixels), each consisting of a fixed number of orientation bins. The feature value corresponding to the gradient orientation of each pixel is derived from the entries in the two histograms.
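As an illustration, the following is a minimal sketch of how the NDVI and the variance feature could be derived from a CIR orthophoto; the band order, the neighbourhood size and the scaling to 8 bit are assumptions made for this example only.

import numpy as np
from scipy.ndimage import uniform_filter

def ndvi_and_variance(cir, win=5):
    """Compute NDVI and a local intensity variance from a CIR orthophoto.

    cir : float array of shape (H, W, 3); band order (NIR, R, G) is assumed.
    win : side length of the local neighbourhood used for the variance feature.
    """
    nir, red = cir[..., 0], cir[..., 1]
    ndvi = (nir - red) / (nir + red + 1e-6)            # in [-1, 1]

    intensity = cir.mean(axis=2)
    mean = uniform_filter(intensity, size=win)          # local mean
    mean_sq = uniform_filter(intensity ** 2, size=win)
    var = np.maximum(mean_sq - mean ** 2, 0.0)          # local variance

    def to_uint8(x):
        # scale a feature linearly into the 8 bit range used for all features
        x = (x - x.min()) / (x.max() - x.min() + 1e-6)
        return (255 * x).astype(np.uint8)

    return to_uint8(ndvi), to_uint8(var)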
A Digital Terrain Model (DTM) is derived from the DSM by morphological opening, using a structural element whose size corresponds to the extent of the largest building in the scene, followed by a smoothing of the result. The nDSM feature is the difference between the DSM and the DTM, i.e., the relative elevation above the terrain, which is large for elevated objects such as buildings, trees, or bridges.
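A minimal sketch of this DTM/nDSM computation is given below; the size of the structuring element and the choice of the smoothing filter are illustrative parameters, not values taken from our implementation.

import numpy as np
from scipy.ndimage import grey_opening, uniform_filter

def normalised_dsm(dsm, opening_size_px, smooth_size_px=15):
    """Derive a DTM from the DSM by morphological opening and return the nDSM.

    dsm             : 2D float array of heights [m].
    opening_size_px : side length of the square structuring element; it should
                      exceed the footprint of the largest building in the scene.
    smooth_size_px  : size of the averaging filter applied to the opened DSM
                      (an assumed smoothing step).
    """
    dtm = grey_opening(dsm, size=(opening_size_px, opening_size_px))
    dtm = uniform_filter(dtm, size=smooth_size_px)
    ndsm = dsm - dtm                     # relative elevation above the terrain
    return np.clip(ndsm, 0.0, None)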
The last feature is one that is supposed to highlight cars. We use the output of the car detection approach of (Leitloff et al., 2010) and the confidence of detected cars delivered by that method. That detector uses an extended set of Haar-like features as input for pixel-wise classification. The number of possible features depends on the patch size of the classifier. Even for a small patch size of 30 pixels it is not possible to evaluate all features during classification. Thus, the number of features is reduced significantly during training. This is achieved by Boosting (Friedman et al., 2000) in the variant introduced by Tieu & Viola, which combines many weak learners to generate a strong classifier. Each weak (base) learner is a simple classifier, and the confidence of the strong classifier is obtained from the
sum of confidence values of all weak learners. Generally, stumps
or classification trees are used as base classifiers. Each node of a
regression tree applies a threshold to only one feature. The
thresholds and features are chosen so that the training error
becomes minimal. Thus, the most distinguishing features are
found during training. In our case only 350 features have been
selected, which makes the final classification suitable for large
datasets. More details about training the Boosting classifier can
be found in our previous work (Leitloff et al., 2010). The
feature f_car is defined as the combined confidence value of the
classifier.
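For illustration, a minimal sketch of how such a combined confidence value can be formed from decision stumps is shown below; it is not the implementation of (Leitloff et al., 2010), and all names and parameters are illustrative only.

import numpy as np

class Stump:
    """A decision stump: thresholds a single feature and returns a signed confidence."""
    def __init__(self, feature_idx, threshold, confidence):
        self.feature_idx = feature_idx
        self.threshold = threshold
        self.confidence = confidence          # weight learned during boosting

    def predict(self, features):
        # features: (N, D) array of Haar-like feature responses
        sign = np.where(features[:, self.feature_idx] > self.threshold, 1.0, -1.0)
        return sign * self.confidence

def strong_confidence(features, stumps):
    """Combined confidence of the boosted classifier, i.e. the sum of the
    confidence values of all weak learners (used here as the feature f_car)."""
    return np.sum([s.predict(features) for s in stumps], axis=0)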
3.6 Training and Inference
Training of the MRF is complex if it is to be carried out in a
probabilistic framework, mainly due to the fact that it requires
an estimate for the partition function Z in Eq. 1, which is
computationally intractable. Thus, approximate solutions have
to be used for training. In our application, we determine the
parameters of the association and interaction potentials
separately. That is, given the training data (fully labelled
images), the probabilities p(f_i | x) are determined from
histograms of the features f_i (which are quantized by 8 bit for
that purpose) for each class, followed by smoothing, in the way described
in Section 3.3. In a similar way, the interaction potentials are
scaled versions of the 2D histograms of the co-occurrence of
classes at neighbouring image sites in the way described in
Section 3.4. Exact inference is also computationally intractable
for MRFs. For inference, we use a message passing algorithm,
namely Loopy Belief Propagation (LBP), a standard technique
for probability propagation in graphs with cycles that has been shown
to give good results in the comparison reported in
(Vishwanathan et al., 2006).
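The following sketch illustrates how such class-conditional probabilities could be estimated from quantized feature histograms; the Gaussian smoothing is an assumed choice, since the text above only states that the histograms are smoothed.

import numpy as np
from scipy.ndimage import gaussian_filter1d

def class_feature_histograms(features, labels, n_classes, sigma=2.0):
    """Estimate p(f | x) for one 8 bit quantized feature from labelled training data.

    features : 1D uint8 array, one feature value per training site.
    labels   : 1D int array of class indices (the training labels x).
    sigma    : std. dev. of the Gaussian used to smooth the histograms
               (an assumed smoothing scheme).
    Returns an (n_classes, 256) array whose rows sum to one.
    """
    p = np.zeros((n_classes, 256))
    for c in range(n_classes):
        hist = np.bincount(features[labels == c], minlength=256).astype(float)
        hist = gaussian_filter1d(hist, sigma)
        p[c] = (hist + 1e-9) / (hist.sum() + 256 * 1e-9)   # normalise per class
    return p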
4. EXPERIMENTS
4.1 Test Data and Test Setup
Under the auspices of the DGPF a test data set over Vaihingen
(Germany) was acquired in order to evaluate digital aerial
camera systems (Cramer, 2010). It consists of several blocks of
vertical images captured by various digital aerial camera
systems at two resolutions. We used one of the DMC blocks to
test our approach. The images are 16 bit pan-sharpened colour
infrared images with a ground sampling distance (GSD) of 8 cm
(flying height: 800 m, focal length: 120 mm). For our
experiments, the radiometric resolution of the images had to be
converted to 8 bit. The georeferencing accuracy is about 1 pixel.
The nominal forward and side laps of the images are 65% and
60%, respectively. As a consequence, each crossroads in the
block is visible in at least four images.
For our experiments, we selected 55 crossroads by digitizing
their approximate centres. The set of crossroads contained
examples from densely built-up urban and suburban as well as
rural areas. For each crossroads, we generated a DSM and a true
orthophoto, both with a GSD of 8 cm in the way described in
Section 3.2; the size of the orthophotos used in our process was
1000 x 1000 pixels, thus corresponding to 80 x 80 m². In the
training phase we used the original orthophotos (1000 x 1000
pixels); for inference, squares of 5 x 5 pixels were used as nodes
of the graphical model; thus, each graphical model consisted of
200 x 200 nodes. For the car confidence feature we used a
classifier trained on data of DLR's 3K-system (Kurz et al.,
2011). The sample images have a resolution of 20 cm. Thus, the
Vaihingen dataset is resampled to this resolution for
classification. Due to different radiometric properties, the Haar-
like features are only calculated from intensity values. Both the
resampling and the exclusive use of intensity values limit the
classification performance in this context.
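A minimal sketch of how per-pixel feature images could be aggregated to the 200 x 200 node grid is given below; averaging over the 5 x 5 pixel squares is an assumption made for illustration, as the aggregation rule is not stated above.

import numpy as np

def pixels_to_nodes(feature_image, block=5):
    """Aggregate per-pixel features over non-overlapping block x block squares,
    e.g. turning a 1000 x 1000 feature image into a 200 x 200 grid of node
    features. Mean aggregation is an assumption."""
    h, w, d = feature_image.shape
    assert h % block == 0 and w % block == 0
    nodes = feature_image.reshape(h // block, block, w // block, block, d)
    return nodes.mean(axis=(1, 3))       # shape: (h / block, w / block, d)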
The ground truth was generated by manually labelling the image
areas using altogether 14 classes (cf. Figure 2). We use the
ground truth for the algorithm's training phase and for
evaluating the classification accuracy. In order to have a
sufficient amount of training data, we had to use cross
validation in our evaluation procedure: in each experiment, all
images except one were used for training, and the remaining
image served as a test image; this procedure was repeated 55
times, each time using a different test image, so that in the end
each image was used as a test image once. In all experiments,
confusion matrices were determined from a comparison of the
test images with the ground truth, as well as the completeness
and the correctness of the results for each class and the overall
classification accuracy (Rutzinger et al., 2009).
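Completeness and correctness correspond to the per-class recall and precision derived from the confusion matrix; the sketch below shows how all three quality measures follow from it (the row/column orientation of the matrix is assumed here).

import numpy as np

def quality_measures(confusion):
    """Per-class completeness (recall), correctness (precision) and overall
    accuracy from a confusion matrix whose rows correspond to the reference
    (ground truth) classes and whose columns to the predicted classes."""
    confusion = confusion.astype(float)
    tp = np.diag(confusion)
    completeness = tp / confusion.sum(axis=1)    # TP / (TP + FN)
    correctness = tp / confusion.sum(axis=0)     # TP / (TP + FP)
    overall_accuracy = tp.sum() / confusion.sum()
    return completeness, correctness, overall_accuracy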
We carried out four different experiments. In the first two
experiments, we tried to separate all 14 classes; the only
difference is the number of features we used. In the first
experiment we used all features described in Section 3.5,
including the car feature, whereas the second experiment was
carried out without the car feature. In the third and the fourth
experiments we reduced the set of classes to eight by merging
classes having a similar appearance in the data. Again, the two
experiments differ by the use of the car feature.
4.2 Evaluation
The confusion matrices as well as the completeness and the
correctness of the results achieved in the first two experiments
(the ones using 14 classes) are shown in Tables 1 and 2; an
example for the classification result is shown in Fig. 2. The
overall accuracy of the classification was 63.5% if the car
feature was used and 63.3% if it was not used. Thus, the overall
accuracy, while being relatively poor in both cases, was hardly
improved by that feature. The relatively poor overall accuracy is
caused by the fact that some of the classes have a very similar
appearance in the data, e.g. sealed, road, sidewalk, and traffic
islands. Reasonable values of completeness and correctness
could be achieved for buildings (> 80%). For trees, the
completeness is also larger than 80%, but the correctness is
much lower (62%). For both buildings and trees, the main error
source was errors in the DSM caused by areas with hardly any
texture (buildings) or abrupt height changes (trees). One of the
problems was the information reduction caused by the
conversion of the images to 8 bit, but apparently the OpenCV
matcher also had problems with non-fronto-parallel surfaces
and with different illumination. The main impact of the car
confidence feature was a considerable reduction of the false
positive car detections, though the correctness of 28% achieved
with this feature is still not satisfactory.
The evaluation of the experiments carried out with the reduced
set of classes is presented in Tables 3 and 4. The overall
accuracy increased to about 75%, which indicates that our
classification scheme is reasonable, though there is room for
improvement. The main error source is the confusion between
trees, grass, and agriculture, again partly caused by DSM
errors. In this setting, the impact of the car confidence feature is
similar to its impact in the first group of experiments.