we perform Adaboost with only 30 iterations. We verified that the classification error on the training samples is more or less constant after a few iterations and decreases only very slowly in further iterations.
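As an illustration only (not the code used in the paper), this setup could be sketched with scikit-learn as follows; we assume a feature matrix X of D-dimensional feature vectors, labels y in {-1, +1}, and decision stumps as single-feature weak classifiers, and the names train_adaboost, X and y are ours.

import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

def train_adaboost(X, y, n_iter=30):
    # One-node decision trees (stumps) serve as single-feature weak classifiers.
    clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=n_iter)
    clf.fit(X, y)
    # Training error after each iteration t = 1..n_iter; it should level off
    # after a few iterations, as observed above.
    train_errors = [1.0 - s for s in clf.staged_score(X, y)]
    return clf, train_errors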
We are not only interested in reasonable classification results, but we also want to derive the most appropriate features. Therefore, we analyze the stability of the decisions of the weak classifiers. We repeated our tests V times and then checked which features the weak classifiers h_t use. To this end, we determined histograms over the features which are used by the t-th weak classifier, t = 1..30. Furthermore, we focused on two characteristics. First, we analyzed how many different features are used by the t-th weak classifier in V trials: c_1, 1 <= c_1 <= D. Secondly, we evaluated how often the best feature has been used by the t-th weak classifier in V trials: c_2, 1 <= c_2 <= V. Both characteristics can be directly derived from the histograms: c_1 is the number of histogram entries which are not 0, and the maximum frequency of a feature, c_2, is the value of the highest peak.
If the number of used features c_1 is relatively small, then the weak classifiers nearly always select the same features. We have found a set of appropriate features if c_1 consistently stays below a certain threshold.
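A minimal sketch of how c_1 and c_2 could be computed, assuming that the feature index selected by the t-th weak classifier has been recorded for each of the V trials in an array selected of shape (V, T); this array and the helper name are our assumptions, not part of the paper.

import numpy as np

def stability_statistics(selected, D):
    # selected[v, t] = index of the feature chosen by the t-th weak classifier
    # in trial v; D = number of features.
    V, T = selected.shape
    c1 = np.zeros(T, dtype=int)
    c2 = np.zeros(T, dtype=int)
    for t in range(T):
        hist = np.bincount(selected[:, t], minlength=D)  # histogram over the features
        c1[t] = np.count_nonzero(hist)  # number of different features used, 1 <= c_1 <= D
        c2[t] = hist.max()              # maximum frequency of a feature, 1 <= c_2 <= V
    return c1, c2

With decision stumps as weak classifiers, the feature selected by the t-th stump can, for instance, be read from clf.estimators_[t].tree_.feature[0] in scikit-learn.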
5.1 Experiments on annotated regions
We selected 62 facade images from Bonn, Germany, and Prague, Czech Republic. Together, there are 5284 annotated objects in these images. Over 70% of them are window panes, and over 20% of the annotated objects are windows.
5.1.1 Facade, Roof, Sky and Vegetation The first experiment included 35 annotated facades, 37 roofs, 47 times sky and 70 times vegetation, i. e. 189 samples in total. Since the data set is quite small, we were able to perform four leave-one-out tests, i. e. V = 189, with the samples of one class as foreground and the samples of the other three classes as background. The classification errors are shown in tab. 2.
Table 2: Error rates of manually annotated regions.

    facade    roof     sky     vegetation
    9.5%      48.1%    1.1%    11.1%
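A sketch of this leave-one-out protocol, assuming the training routine train_adaboost sketched above and arrays X (feature vectors) and labels (class names) for the 189 samples; all names are ours, not from the paper.

import numpy as np
from sklearn.model_selection import LeaveOneOut

def leave_one_out_error(X, labels, foreground, n_iter=30):
    # One class as foreground (+1), the remaining three classes as background (-1).
    y = np.where(labels == foreground, 1, -1)
    n_errors = 0
    for train_idx, test_idx in LeaveOneOut().split(X):
        clf, _ = train_adaboost(X[train_idx], y[train_idx], n_iter)
        n_errors += int(clf.predict(X[test_idx])[0] != y[test_idx][0])
    return n_errors / len(y)  # error rate over V = 189 trials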
In tab. 3 and fig. 3, we present the histograms over the features which are used by the t-th weak classifier regarding the classification of facades and roofs. With respect to the classification of facades, we notice that the first weak classifiers h_1 to h_4 distinguish facades from the background by using nearly the same features over all 189 trials. Thus, the set of appropriate features should certainly contain the features f_3, f_4, f_6 and f_163. Then, the number of used features, c_1, increases for the further weak classifiers. Finally, in the 30-th iteration c_1 reaches a value of 21, i. e. the weak classifiers h_30 of all 189 trials use 21 different features. In 91 cases f_3 was chosen, and f_1 was chosen again in 35 cases. All other features which are used by h_30 only play a minor role.
Table 3: Used features and maximum frequency of a feature with respect to the weak classifiers h_t for classifying facades in 189 trials.

    h_1          h_2          h_3            h_30
    f_? : 187    f_4 : 188    f_144 : 174    f_3 : 91
    f_3 : 2      f_1 : 1      f_141 : 8      f_1 : 35
    c_1 = 2      c_1 = 2      c_1 = 6        c_1 = 21
    c_2 = 187    c_2 = 188    c_2 = 174      c_2 = 91
The almost perfect classification of sky relies on the features f_9, f_151 and f_130. Here, even the last weak classifier h_30 chooses the feature f_129 in 164 or 87% of all cases; all other features which are used by h_30 can be neglected. Regarding the classification of vegetation, the first weak classifiers mainly use the features f_9, f_5 and f_19. The development of the histograms of the further weak classifiers is similar to the observation which we made for the facade classification.
The classification result for roofs is unacceptable, and here the selection of appropriate features is not as stable as in the other three tests. Although the first weak classifier always uses f_15 as the discriminative feature, all further weak classifiers vary much more with respect to the selected features.
We show the histograms of four weak classifiers in fig. 3.
Figure 3: Histograms over the features of each weak classifier h_t, the number of used features c_1 and the maximum frequency c_2 from classifying roofs (c_1 = 1, 4, 6, 19; c_2 = 189, 140, 123, 47).
Due to the limited space, we do not present additional histograms in detail. But the plots in fig. 4 show how the number of used features c_1 and the maximum frequency c_2 develop with respect to the classification target. Comparing the curves of c_2 for roof and sky, c_2 decreases much faster for the target with bad classification results.
5.1.2 Window and Window Panes In the other two experiments, we tested the classification of windows and window panes. Since we could use 5284 annotated regions, we performed a cross-validation test, where we used 10% of the regions for testing and the rest for training the classifiers. Then, we repeated all tests V = 20 times. The results are presented in tab. 4.
Table 4: Error rates of manually annotated regions.

    class          min      median    max      total
    window         15.4%    24.7%     75.3%    41.8%
    window pane     3.1%     6.0%     70.0%    10.1%
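A sketch of this repeated hold-out test, again assuming the training routine train_adaboost sketched above and binary labels y for the respective foreground class; the names and setup are our assumptions.

import numpy as np
from sklearn.model_selection import ShuffleSplit

def repeated_holdout_errors(X, y, V=20, n_iter=30):
    errors = []
    # V random splits: 10% of the regions for testing, the rest for training.
    for train_idx, test_idx in ShuffleSplit(n_splits=V, test_size=0.1).split(X):
        clf, _ = train_adaboost(X[train_idx], y[train_idx], n_iter)
        errors.append(np.mean(clf.predict(X[test_idx]) != y[test_idx]))
    return errors  # min, median and max over the V tests as reported in tab. 4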
The classification results for the windows and window panes make us very optimistic that we might derive reasonable sets of appropriate features. Disturbingly, however, the classification errors vary too much between the 20 different tests. Furthermore, we cannot find any correlation between the classification results and the stability of the weak classifiers' choice