for a sample with class k is also a vector that has +1 in its kth component and −1 otherwise. Weak classifiers are of the form
$h(x) = \mathbf{v} \cdot \eta(x)$, (3)
where $\mathbf{v}$ is also a vector with K components with +1 or −1 in each component and $\eta$ is a binary classifier. The components of $\mathbf{v}$ signal whether the binary classifier $\eta$ has a positive or negative correlation with the respective class label. In the multi-class case, each instance has one weight per classifier component. The weak classifier is selected by weighted averaging over the components, and the weights are updated analogously to binary AdaBoost.
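The multi-class scheme described above can be illustrated with a minimal sketch. It assumes the standard discrete AdaBoost.MH formulation (per-class weights normalised to sum to one, coefficient computed from the weighted edge); the function names and the toy threshold classifier are hypothetical and are not taken from the MultiBoost package.

```python
import math

def encode_label(k, num_classes):
    """Label vector with +1 in component k and -1 elsewhere."""
    return [1 if c == k else -1 for c in range(num_classes)]

def weak_classifier(v, eta, x):
    """h(x) = v * eta(x): vote vector v times the binary output eta(x) in {-1, +1}."""
    s = eta(x)
    return [vc * s for vc in v]

def weighted_edge(samples, labels, weights, v, eta):
    """Weighted agreement between h and the labels, summed over instances and classes."""
    edge = 0.0
    for x, y, w in zip(samples, labels, weights):
        h = weak_classifier(v, eta, x)
        edge += sum(wc * hc * yc for wc, hc, yc in zip(w, h, y))
    return edge

def alpha_from_edge(edge):
    """Coefficient of a discrete weak classifier (assumed standard AdaBoost.MH form)."""
    return 0.5 * math.log((1.0 + edge) / (1.0 - edge))

def update_weights(samples, labels, weights, v, eta, alpha):
    """Scale each per-class weight by exp(-alpha * h_c(x) * y_c), then renormalise."""
    new = []
    for x, y, w in zip(samples, labels, weights):
        h = weak_classifier(v, eta, x)
        new.append([wc * math.exp(-alpha * hc * yc)
                    for wc, hc, yc in zip(w, h, y)])
    total = sum(sum(row) for row in new)
    return [[wc / total for wc in row] for row in new]
```

In a full training loop one would evaluate `weighted_edge` for every candidate pair of vote vector and binary classifier, keep the pair with the largest edge, and then call `update_weights`, so that wrongly classified instance and class pairs gain weight in the next iteration.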
3 EXPERIMENTS
We compare classification results for the different feature sets for four classes: buildings, high vegetation (trees, bushes etc.), low vegetation (grassland, small bushes etc.), and streets. All ground truth was annotated manually. The ratio of training (25%) to testing pixels (75%) is kept constant across all four datasets. For training sample selection we simply take a strip on one side of the image, making sure that each class is represented by a reasonable number of pixels.
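As a rough illustration of this split, the sketch below takes a vertical strip on the left side of the image as training area and counts the class labels inside it. The choice of the left side, the rounding of the strip width, and the function names are assumptions made for the example only.

```python
from collections import Counter

def strip_split(width, height, train_fraction=0.25):
    """Split pixel coordinates into a left-hand training strip and the remaining test area."""
    split_col = round(width * train_fraction)
    train = [(r, c) for r in range(height) for c in range(split_col)]
    test = [(r, c) for r in range(height) for c in range(split_col, width)]
    return train, test

def class_counts(label_img, pixels):
    """Count how many of the given pixels carry each ground-truth label."""
    return Counter(label_img[r][c] for r, c in pixels)
```

Inspecting `class_counts` for the training strip is one simple way to verify that no class is missing or badly under-represented before training.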
We use 100 weak learners for training the MultiBoost classi-
fier. The feature selection capability of boosting algorithms al-
lows one to extract only the selected features during testing. This
greatly reduces the computation time for testing the classifier.
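The saving at test time can be sketched as follows, under the assumption that each weak learner refers to exactly one feature by index (as a decision stump would). The dictionary layout of a weak learner and the function names are hypothetical.

```python
def selected_feature_indices(weak_learners):
    """Unique indices of the features actually used by the trained weak learners."""
    return sorted({wl["feature"] for wl in weak_learners})

def extract_selected(pixel, feature_fns, indices):
    """Evaluate only the selected feature functions for one pixel."""
    return {i: feature_fns[i](pixel) for i in indices}
```

With 100 weak learners, at most 100 distinct features need to be computed per test pixel, regardless of how large the candidate feature set was during training.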
3.1 Datasets
We evaluate the proposed method on four different VHR datasets (Fig. 3): three aerial photos and one satellite image.
Image KLOTEN (Switzerland) was acquired with an analogue
aerial camera Wild RC30 and scanned. It depicts a part of Kloten
airport in the vicinity of Zurich, Switzerland. The image has three
spectral bands: red, green, and near infrared. For evaluation we take only a small subset of the scene, 1266 × 789 pixels at 8 cm GSD. Only a single image is available, so neither a DTM nor a DSM can be computed.
Test image GRAZ (Austria) is a subset of an RGB aerial image of a large block acquired with a Microsoft Vexcel Ultracam D. Its size is 800 × 800 pixels at a GSD of 25 cm. A digital surface model (DSM) was computed via dense matching. Instead of generating a true orthophoto from the aerial image, the DSM was transformed to the geometry of the aerial image because manually labeled ground truth had been acquired in this geometry. Finally, a normalized DSM (nDSM) was computed via standard filtering techniques. Since only RGB channels exist for GRAZ, a pseudo-NDVI was computed in which the green channel replaces the near infrared channel.
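A minimal sketch of such a pseudo-NDVI is given below. It assumes the usual normalized difference form of the NDVI, (NIR − red) / (NIR + red), with green substituted for NIR. The small constant `eps` that guards against division by zero and the nested-list image layout are choices made for this example, not details from the paper.

```python
def normalized_difference(a, b, eps=1e-9):
    """(a - b) / (a + b), with a small eps to avoid division by zero on dark pixels."""
    return (a - b) / (a + b + eps)

def pseudo_ndvi(red, green):
    """Per-pixel pseudo-NDVI for two equally sized channels given as nested lists."""
    return [
        [normalized_difference(g, r) for r, g in zip(red_row, green_row)]
        for red_row, green_row in zip(red, green)
    ]
```

The same `normalized_difference` helper yields the regular NDVI when it is called with the near infrared channel in place of green.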
VAIHINGEN (Germany) is a 1000 × 1000 pixel subset of a true orthophoto mosaic generated from an Intergraph DMC block with 8 cm GSD and red, green, and near infrared channels, taken from publicly available benchmark data for urban object classification and 3D building reconstruction (Cramer, 2010; Rottensteiner et al., 2012). An nDSM was obtained by dense matching and subsequent filtering.
The satellite test image is a 1000 × 1000 pixel part of a stereo scene of WORLDVIEW-2 acquired over Zurich (Switzerland). A pan-sharpened image of 50 cm GSD with three channels (red, green, and near infrared) was generated. The stereo configuration of the imagery allowed extraction of the DSM from the pan-sharpened channels, and the DSM was upsampled to the resolution of the image. It should nonetheless be noted that the DSM quality is much lower than for the aerial images (GRAZ and VAIHINGEN) because of the lower resolution.
ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume II-3/W1, 2013
VCM 2013 - The ISPRS Workshop on 3D Virtual City Modeling, 28 May 2013, Regina, Canada
3.2 Results and discussion
We present direct pixel-wise results of the boosting classifier
based on the different feature sets (Fig. 4, Tab. 1) without any
prior segmentation into superpixels, posterior smoothing via graph
cuts or morphological cleaning, to compare only the effect of the
features, without potential biases due to pre- or post-processing.
Classification results have been evaluated using two measures: the overall classification accuracy and the kappa index $\kappa$. By measuring the improvement over chance agreement, rather than over a 100% wrong result as the overall accuracy does, $\kappa$ compensates for frequency biases². Table 1 summarizes results for all hand-crafted features and the proposed quasi-exhaustive features.
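Both measures can be computed from a confusion matrix, as in the following sketch. It assumes the standard definition of Cohen's kappa and a square confusion matrix with true classes in rows and predicted classes in columns; the function names are illustrative.

```python
def overall_accuracy(cm):
    """Fraction of correctly classified pixels (trace divided by total count)."""
    n = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / n

def kappa(cm):
    """Cohen's kappa: agreement beyond what the row and column marginals give by chance."""
    n = sum(sum(row) for row in cm)
    diag = sum(cm[i][i] for i in range(len(cm)))
    row_sums = [sum(row) for row in cm]
    col_sums = [sum(row[j] for row in cm) for j in range(len(cm))]
    chance = sum(r * c for r, c in zip(row_sums, col_sums))
    return (n * diag - chance) / (n * n - chance)
```

For the example in the footnote (10% class A, 90% class B, classifier always answers B), the confusion matrix `[[0, 10], [0, 90]]` gives an overall accuracy of 0.9 and a kappa of 0.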
In order to quantify how much improvement is due to the nDSM, we compute two separate runs. The first considers all channels except the nDSM and is evaluated on all four datasets. The second repeats the evaluation with all channels plus the nDSM for the three datasets VAIHINGEN, GRAZ, and WORLDVIEW-2. Recall that no height information was available for KLOTEN. In general, datasets augmented with relative height information achieve classification accuracies up to 10 percentage points higher (Tab. 1).
The proposed quasi-exhaustive features outperform almost all baselines in all tests. However, results are close to those of the "Augmented 15 × 15 pixels neighborhood", and in the case of "GRAZ without nDSM" they are worse. A closer inspection of this particular result reveals that it is due to over-fitting, which causes confusion between streets and roofs of the same color, as can be seen in the center of the images displayed in the second row of Fig. 4.
Regarding the WORLDVIEW-2 dataset, our method performs on the same level as the augmented features, which is most probably due to less distinctive textural patterns in the pan-sharpened image as well as the poor quality of the DSM.
We plot the classification accuracy versus the number of boosting training iterations in Fig. 2 for test scene VAIHINGEN (without nDSM). The red curve of the "Augmented 15 × 15 pixels neighborhood" shows the steepest accuracy increase over the first five to ten iterations because it immediately captures the NDVI and, less dominantly, the PCA. For example, for this particular run shown as red curve in Fig. 1, NDVI features ranked 1st, 2nd, 7th, and 9th, while PCA features ranked 6th and 10th. Quasi-exhaustive features show a less rapid increase, but outperform all baselines after the 20th training iteration.
4 CONCLUSIONS AND OUTLOOK
We have investigated the need for feature engineering when clas-
sifying VHR remote sensing images from different
sources and showing different scenes. We have demonstrated the
power of a simple strategy: rather than trying to determine/guess
the best feature set for a given classification problem, supply a
quasi-exhaustive feature set capturing image intensity and tex-
ture at multiple scales over all channels, and let the classifier pick
² Formally, $\kappa = \frac{N \sum_i c_{ii} - \sum_i \left( \sum_j c_{ij} \cdot \sum_j c_{ji} \right)}{N^2 - \sum_i \left( \sum_j c_{ij} \cdot \sum_j c_{ji} \right)}$, where the $c_{ij}$ are the entries of the confusion matrix and N is the number of pixels. Consider an image with 10% pixels of class A and 90% pixels of class B. A classifier which always returns B will have 90% overall accuracy, but $\kappa$ = 0%.
a suitable s
efficiently c
set we prop
evaluate blc
way approx
In future wc
our huge cz
different da
rently pre-d
to extractin;
ing to test v
randomly c
object detec
Bay, H., Es:
up robust
Benbouzid,
and Kgl,
age. Jour
Bovolo, E,
nique for
tor machi
pp. 2983
Briem, G.,
ple Class
IEEE TG
Cramer, M.,
uation ov
erkundun
Dollar, P, 1
channel fi