ISPRS Workshop on 3D Virtual City Modeling (VCM 2013)

for a sample with class k is also a vector that has +1 in its kth
component and -1 otherwise. Weak classifiers are of the form

$$\mathbf{h}(\mathbf{x}) = \mathbf{v}\,\eta(\mathbf{x}), \qquad (3)$$

where v is also a vector with K components, each +1 or -1, and
η is a binary classifier. The components of v signal whether the
binary classifier η is positively or negatively correlated with the
respective class label. In the multi-class case, each instance has
one weight per classifier component. The weak classifier is selected
by weighted averaging over the components, and the weights are
updated analogously to binary AdaBoost.
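To make the factorized weak classifier of Eq. (3) concrete, the following is a minimal sketch of AdaBoost.MH with a decision stump as the binary classifier η and a vote vector v. It is not the MultiBoost implementation: the exhaustive threshold search, the integer labels 0..K-1, the clipping constant, and all function names (`encode_labels`, `stump`, `best_stump`, `adaboost_mh`, `predict`) are illustrative assumptions.

```python
import numpy as np


def encode_labels(y, K):
    """N x K label matrix: +1 in column y_i, -1 in all other columns."""
    Y = -np.ones((len(y), K))
    Y[np.arange(len(y)), y] = 1.0
    return Y


def stump(X, j, t):
    """Binary classifier eta(x) = +1 if x_j > t, else -1."""
    return np.where(X[:, j] > t, 1.0, -1.0)


def best_stump(X, Y, W):
    """Exhaustive search for the stump and vote vector v with the largest edge."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            eta = stump(X, j, t)
            # Weighted correlation of eta with each class label (one per component).
            gamma = (W * Y * eta[:, None]).sum(axis=0)
            v = np.where(gamma >= 0, 1.0, -1.0)  # +1: positive, -1: negative correlation
            edge = (v * gamma).sum()
            if best is None or edge > best[0]:
                best = (edge, j, t, v)
    return best


def adaboost_mh(X, y, K, T):
    N = len(y)
    Y = encode_labels(y, K)
    W = np.full((N, K), 1.0 / (N * K))  # one weight per instance and class component
    model = []
    for _ in range(T):
        edge, j, t, v = best_stump(X, Y, W)
        edge = min(edge, 1.0 - 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * np.log((1.0 + edge) / (1.0 - edge))
        # AdaBoost-style update: up-weight (instance, class) pairs that h(x) gets wrong.
        W *= np.exp(-alpha * Y * v[None, :] * stump(X, j, t)[:, None])
        W /= W.sum()
        model.append((alpha, j, t, v))
    return model


def predict(model, X, K):
    F = np.zeros((X.shape[0], K))
    for alpha, j, t, v in model:
        F += alpha * stump(X, j, t)[:, None] * v[None, :]  # alpha * v * eta(x)
    return F.argmax(axis=1)
```

On a toy one-feature problem with `X = [[0], [1], [2], [3]]` and labels `[0, 0, 1, 1]`, the first round picks the threshold 1 with vote vector `[-1, +1]`, and the ensemble reproduces the training labels.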
3 EXPERIMENTS 
We compare classification results for the different feature sets for 
four classes: buildings, high vegetation (trees, bushes etc.), low 
vegetation (grassland, small bushes etc.), and streets. All ground 
truth was annotated manually. The ratio of training (25%) and 
testing pixels (75%) is kept constant across all four datasets. For 
training sample selection we simply take a strip on one side of
the image, making sure that each class is represented by a
reasonable number of pixels.
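The strip-based split can be sketched as follows. The text only says "a strip on one side of the image"; taking the leftmost 25% of the columns over the full image height, and the helper names `strip_split` and `class_counts`, are assumptions for illustration. Counting the labels inside the strip shows directly whether some class is missing from the training set.

```python
import numpy as np


def strip_split(labels, train_fraction=0.25):
    """Boolean train/test masks; the leftmost columns form the training strip."""
    h, w = labels.shape
    n_cols = int(round(train_fraction * w))
    train = np.zeros((h, w), dtype=bool)
    train[:, :n_cols] = True
    return train, ~train


def class_counts(labels, mask, n_classes):
    """Number of pixels per class inside the mask."""
    return np.bincount(labels[mask], minlength=n_classes)
```

If `class_counts(labels, train, n_classes)` contains a zero, the strip has to be moved or widened before training.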
We use 100 weak learners for training the MultiBoost classi- 
fier. The feature selection capability of boosting algorithms al- 
lows one to extract only the selected features during testing. This 
greatly reduces the computation time for testing the classifier. 
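The saving at test time comes from computing only those feature channels that some weak learner actually uses. A minimal sketch, assuming a hypothetical model representation as a list of `(alpha, feature_index, threshold, vote_vector)` stumps and a hypothetical callback `features_fn(j)` that computes feature j for all test pixels:

```python
import numpy as np


def selected_features(model):
    """Sorted indices of the features used by at least one weak learner."""
    return sorted({j for _, j, _, _ in model})


def predict_selected(model, features_fn, n_classes):
    """Compute only the selected feature columns, then evaluate the ensemble."""
    cols = {j: np.asarray(features_fn(j)) for j in selected_features(model)}
    n = len(next(iter(cols.values())))
    scores = np.zeros((n, n_classes))
    for alpha, j, t, v in model:
        eta = np.where(cols[j] > t, 1.0, -1.0)  # stump output on feature j
        scores += alpha * eta[:, None] * np.asarray(v, dtype=float)[None, :]
    return scores.argmax(axis=1)
```

With 100 weak learners, at most 100 distinct features are ever computed, however large the candidate feature set was during training.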
3.1 Datasets 
We evaluate the proposed method on four different VHR datasets 
(Fig. 3), three aerial photos and one satellite image. 
Image KLOTEN (Switzerland) was acquired with an analogue 
aerial camera Wild RC30 and scanned. It depicts a part of Kloten 
airport in the vicinity of Zurich, Switzerland. The image has three 
spectral bands: red, green, and near infrared. For evaluation we
only use a small subset of 1266 × 789 pixels at 8 cm GSD. Only a
single image is available, so neither a DTM nor a DSM can be
computed.
Test image GRAZ (Austria) is a subset of an RGB aerial image
of a large block acquired with a Microsoft Vexcel Ultracam D.
Its size is 800 × 800 pixels at a GSD of 25 cm. A digital surface
model (DSM) was computed via dense matching. Instead of gen- 
erating a true orthophoto from the aerial image, the DSM was 
transformed to the geometry of the aerial image because man- 
ually labeled ground truth had been acquired in this geometry. 
Finally, a normalized DSM (nDSM) was computed via standard 
filtering techniques. Since only RGB channels exist for GRAZ,
a pseudo-NDVI was computed in which the green channel replaces
the near infrared channel.
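The standard NDVI is (NIR - R) / (NIR + R); substituting the green channel for NIR gives the pseudo-NDVI (G - R) / (G + R) used for GRAZ. A minimal sketch; the `eps` guard against a zero denominator and the assumption of non-negative intensities are choices made here, not taken from the paper:

```python
import numpy as np


def normalized_difference(a, b, eps=1e-12):
    """(a - b) / (a + b) in floating point, so uint8 inputs cannot overflow."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return (a - b) / np.maximum(a + b, eps)  # assumes non-negative intensities


def ndvi(nir, red):
    return normalized_difference(nir, red)


def pseudo_ndvi(green, red):
    # Green replaces the near infrared channel when only RGB is available.
    return normalized_difference(green, red)
```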
VAIHINGEN (Germany) is a 1000 × 1000 pixel subset of a true
orthophoto mosaic with 8 cm GSD and red, green, and near infrared
channels, generated from an Intergraph DMC block. It is taken from
the publicly available benchmark data for urban object classification
and 3D building reconstruction (Cramer, 2010; Rottensteiner et
al., 2012). An nDSM was obtained by dense matching and subsequent
filtering.
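The paper does not specify which filtering turns the DSM into an nDSM. One simple stand-in, sketched below, approximates the terrain (DTM) by a grey-scale morphological opening and subtracts it from the DSM; this is a swapped-in technique for illustration only, the window size must exceed the largest building footprint, and operational pipelines typically use more robust ground filters. The function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _window_filter(img, window, reducer):
    """Apply a min or max reducer over a window x window neighbourhood."""
    pad = window // 2
    padded = np.pad(img, pad, mode="edge")  # replicate border values
    windows = sliding_window_view(padded, (window, window))
    return reducer(windows, axis=(-2, -1))


def grey_opening(img, window):
    """Grey-scale opening: erosion (min filter) followed by dilation (max filter)."""
    return _window_filter(_window_filter(img, window, np.min), window, np.max)


def ndsm_from_dsm(dsm, window):
    """nDSM = DSM - DTM, with the DTM approximated by a grey-scale opening."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    dsm = np.asarray(dsm, dtype=float)
    dtm = grey_opening(dsm, window)  # an opening never exceeds the input
    return dsm - dtm                 # heights above the estimated terrain
```

For a flat 100 m surface with a 3 × 3 pixel building of 110 m, a 5 × 5 window removes the building from the DTM, so the nDSM is 10 m on the roof and 0 elsewhere.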
The satellite test image is a 1000 × 1000 pixel part of a
WORLDVIEW-2 stereo scene acquired over Zurich (Switzerland). A
pan-sharpened image of 50 cm GSD with three channels (red, green,
and near infrared) was generated. The stereo configuration of the
imagery allowed extraction of the DSM from the pan-sharpened
channels, and the DSM was upsampled to the resolution of the
image. It should nonetheless be noted that the DSM quality is much
lower than for the aerial images (GRAZ and VAIHINGEN) because of
the lower resolution.
ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume II-3/W1, 2013
VCM 2013 - The ISPRS Workshop on 3D Virtual City Modeling, 28 May 2013, Regina, Canada
3.2 Results and discussion 
We present direct pixel-wise results of the boosting classifier
based on the different feature sets (Fig. 4, Tab. 1), without any
prior segmentation into superpixels and without posterior smoothing
via graph cuts or morphological cleaning. This isolates the effect
of the features from potential biases due to pre- or post-processing.
Classification results have been evaluated using two measures:
the overall classification accuracy and the kappa coefficient κ.
Overall accuracy measures the improvement over a 100% wrong
result, whereas κ measures the improvement over chance agreement
and therefore compensates for class-frequency biases². Table 1
summarizes results for all hand-crafted features and the proposed
quasi-exhaustive features.
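Both measures follow directly from the confusion matrix. With row sums r_i and column sums s_i, κ = (N Σ_i c_ii - Σ_i r_i s_i) / (N² - Σ_i r_i s_i). A minimal sketch; treating rows as reference and columns as prediction is an assumption here (κ is unchanged if the convention is swapped):

```python
import numpy as np


def overall_accuracy(C):
    """Fraction of correctly classified pixels: trace / total."""
    C = np.asarray(C, dtype=float)
    return np.trace(C) / C.sum()


def cohen_kappa(C):
    """kappa = (N * sum_i c_ii - sum_i r_i * s_i) / (N^2 - sum_i r_i * s_i)."""
    C = np.asarray(C, dtype=float)
    N = C.sum()
    chance = (C.sum(axis=1) * C.sum(axis=0)).sum()  # sum_i r_i * s_i
    return (N * np.trace(C) - chance) / (N**2 - chance)
```

For an image with 10% pixels of class A and 90% of class B, a classifier that always returns B gives `C = [[0, 10], [0, 90]]`: the overall accuracy is 0.9, but κ = (100 · 90 - 9000) / (10000 - 9000) = 0.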
In order to quantify how much of the improvement is due to the
nDSM, we perform two separate runs. The first uses all channels
except the nDSM and covers all four datasets. The second repeats
the evaluation with all channels plus the nDSM for the three
datasets VAIHINGEN, GRAZ, and WORLDVIEW-2. Recall that
no height information was available for KLOTEN. In general,
datasets augmented with relative height information achieve
classification accuracies up to 10 percentage points higher (Tab. 1).
The proposed quasi-exhaustive features outperform almost all
baselines in all tests. However, results are close to those of the
"Augmented 15 × 15 pixels neighborhood" baseline and, in the case
of "GRAZ without nDSM", worse. A closer inspection of this
particular result reveals that it is due to over-fitting, which causes
confusion between streets and roofs of the same color, as can be
seen in the center of the images displayed in the second row of Fig. 4.
Regarding the WORLDVIEW-2 dataset, our method performs on
the same level as the augmented features, most probably due to
less distinctive textural patterns in the pan-sharpened image and
the poor quality of the DSM.
We plot the classification accuracy versus the number of boosting
training iterations in Fig. 2 for test scene VAIHINGEN (without
nDSM). The red curve of the "Augmented 15 × 15 pixels
neighborhood" shows the steepest accuracy increase over the first
five to ten iterations because it immediately captures the NDVI
and, to a lesser extent, the PCA. For example, for this particular
run, shown as red curve in Fig. 1, NDVI features ranked 1st, 2nd,
7th, and 9th, while PCA features ranked 6th and 10th. Quasi-exhaustive
features show a less rapid increase, but outperform all
baselines after the 20th training iteration.
4 CONCLUSIONS AND OUTLOOK 
We have investigated the need for feature engineering when
classifying VHR remote sensing images from different sources
and depicting different scenes. We have demonstrated the power
of a simple strategy: rather than trying to determine (or guess)
the best feature set for a given classification problem, supply a
quasi-exhaustive feature set capturing image intensity and texture
at multiple scales over all channels, and let the classifier pick
²Formally,

$$\kappa = \frac{N\sum_i c_{ii} - \sum_i \big(\sum_j c_{ij}\big)\big(\sum_j c_{ji}\big)}{N^2 - \sum_i \big(\sum_j c_{ij}\big)\big(\sum_j c_{ji}\big)},$$

where the c_ij are the entries of the confusion matrix and N is the number of pixels.
Consider an image with 10% pixels of class A and 90% pixels of class
B. A classifier that always returns B will have 90% overall accuracy,
but κ = 0.
a suitable s 
efficiently c 
set we prop 
evaluate blc 
way approx 
In future wc 
our huge cz 
different da 
rently pre-d 
to extractin; 
ing to test v 
randomly c 
object detec 
REFERENCES
Bay, H., Ess
up robust
Benbouzid,
and Kégl,
age. Jour
Bovolo, F.,
nique for
tor machi
pp. 2983
Briem, G.,
ple Class
IEEE TG
Cramer, M.,
uation ov
erkundun
Dollár, P.,
channel fi