For all of these found image patches we repeat this procedure recursively.
The depth ’rec’ of the recursion is a parameter, which we choose by
default as rec = 3 but which can be set by the user. The procedure is
summarised in Alg. 1.
Algorithm 1 Recursive buildup of similarity graph
Require: rec, T1, I
Ensure: G(V, E, S)
1: R = [ρ(x0, x(i, j))] ∀ i, j
2: X = non-maximum-suppression(R, T1)
3: add x0 and X to V and establish E and S = {ρ(xi, xj)}
4: depth of recursion = 1
5: while depth of recursion < rec do
6:   for k = 1, ..., length(X) do
7:     R = [ρ(xk, x(i, j))] ∀ i, j
8:     X = non-maximum-suppression(R, T1)
9:     add X to V and update E and S = {ρ(xi, xj)}
10:   end for
11:   increase depth of recursion by 1
12: end while
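As an illustration, the following sketch implements the recursive buildup along the lines of Alg. 1, assuming OpenCV's normalised cross-correlation (cv2.matchTemplate with TM_CCOEFF_NORMED) as the similarity ρ, a simple greedy non-maximum suppression, and a NetworkX graph for G(V, E, S); the function names and data layout are our own illustrative choices, not the original implementation.

import cv2
import networkx as nx
import numpy as np

def non_maximum_suppression(R, T1, patch_shape):
    """Greedily accept local maxima of the correlation map R above T1 and
    suppress all positions within half a patch size around each maximum."""
    h, w = patch_shape
    R = R.copy()
    peaks = []
    while True:
        r, c = np.unravel_index(np.argmax(R), R.shape)
        if R[r, c] < T1:
            break
        peaks.append(((r, c), float(R[r, c])))
        R[max(0, r - h // 2):r + h // 2 + 1,
          max(0, c - w // 2):c + w // 2 + 1] = -1.0
    return peaks

def build_similarity_graph(image, x0_box, T1=0.6, rec=3):
    """Recursive buildup of the similarity graph G(V, E, S) as in Alg. 1.
    x0_box = (row, col, height, width) of the manually chosen example."""
    G = nx.Graph()
    G.add_node(0, box=x0_box)
    frontier = [0]
    for _ in range(rec):                      # depth of recursion
        new_frontier = []
        for k in frontier:
            rk, ck, hk, wk = G.nodes[k]["box"]
            template = image[rk:rk + hk, ck:ck + wk]
            # R = [rho(x_k, x(i, j))] for all image positions (i, j)
            R = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
            for (ri, ci), rho in non_maximum_suppression(R, T1, (hk, wk)):
                v = G.number_of_nodes()
                G.add_node(v, box=(ri, ci, hk, wk))
                G.add_edge(k, v, weight=rho)  # edge weights form S
                new_frontier.append(v)
        frontier = new_frontier
    return G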
Unsupervised clustering using a similarity graph Finding
clusters is based on a reduced similarity graph. Buhmann and
Hofmann (1994) study the pairwise clustering problem in the
framework of maximum entropy estimation. They assume only
pairwise dissimilarity values are available and achieve grouping
the data into clusters by minimising the sum of dissimilarities between
data of the same cluster. In (Hofmann and Buhmann, 1997)
they upgrade their method by a deterministic annealing approach
and use pairwise data clustering to segment textured images. In a first
instance we perform the clustering by simple thresholding, which is
certainly suboptimal but may be replaced by a more rigorous procedure.
To this end, we cut all edges with weights ρ < T2 and derive the
connected components of the graph. The vertices vi ∈ V of the reduced
similarity graph G'(V, E, S) thus represent instances xi of a class c ∈ C.
Two vertices vi and vj are connected if the similarity s(xi, xj) = ρ(xi, xj)
is larger than the threshold T2; the existence of an edge eij therefore
indicates a high similarity. As this threshold T2 is chosen to be larger
than T1, not all vertices in G are connected. We assume instances in
one connected component to belong to the same subclass.
By default we choose the correlation coefficients T1 = 0.6 and
T2 = 0.8 as thresholds. We are currently investigating the adequate
choice of these parameters.
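A minimal sketch of this thresholded clustering, assuming the graph produced by the previous sketch; the helper name is illustrative only.

import networkx as nx

def cluster_by_thresholding(G, T2=0.8):
    """Cut all edges with weight rho < T2 and return the connected components
    of the reduced similarity graph; each component is a candidate subclass."""
    G_reduced = G.copy()
    weak_edges = [(u, v) for u, v, d in G_reduced.edges(data=True)
                  if d["weight"] < T2]
    G_reduced.remove_edges_from(weak_edges)
    return [sorted(component) for component in nx.connected_components(G_reduced)]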
Supervised labelling of found clusters The identification of
the subclasses is based on the user’s judgement. For each cluster
the user has to specify whether it represents a subclass of the class
envisaged, another class not yet envisaged, or the rejection class.
The user thus establishes a simple class hierarchy with multiple
classes Ck, including the rejection class, with the subclasses Ckl.
E. g. we may take the class window as the envisaged class, manually
identify a window by its bounding box and - as an example
- may have found four clusters, which turn out to be of different
type, e. g. windows with and without crossbar and with a round
top, and another cluster of balconies, see Fig. 2(a).
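Such a two-level hierarchy can be recorded, for instance, as a simple mapping from the user-given class labels to their subclasses and cluster indices; the label names and cluster numbers below are purely hypothetical.

# Illustrative record of the user's labelling for an example like Fig. 2(a);
# cluster indices and label names are hypothetical.
hierarchy = {
    "window":    {"window plain": [0], "window with crossbar": [1],
                  "window with round top": [3]},
    "balcony":   {"balcony": [4]},
    "rejection": {"cornice (background)": [2]},
}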
Representation of prototypes We assume that our clustering
and labelling process generates well closed clusters for each class.
Thus we may assume the p. d. f. to have one mode, or at most a few.
We might use the mean feature vector as prototype and characterize it
by its covariance matrix. The chosen algorithms for finding prototypes
need to be seen within the complete scheme of incremental learning,
which poses restrictions on the representation: it needs to be updatable
incrementally.
We determine a prototype by finding a representative instance as
a function of the found instances. In our first realization we use
the mean images as prototypes for each class. So we define the
prototype p_c of each class c ∈ C as
p_c = \frac{1}{N_c} \sum_{n=1}^{N_c} {}^{c}x_n

where {}^{c}x_n are the feature vectors of the instances of class c and
N_c is their number.
To achieve invariance to different object sizes we scale our prototypes
to a given fixed size.
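A sketch of this prototype computation, assuming grey-value image patches as NumPy arrays and cv2.resize for the rescaling to a fixed size; the (width, height) of 50 x 70 follows the experiment below, and the running-mean update is one possible way to keep the prototype incrementally updatable, as required above.

import cv2
import numpy as np

def class_prototype(patches, size=(50, 70)):
    """Prototype p_c = (1 / N_c) * sum_n x_n of one class: the mean image of
    all instance patches, each rescaled to a fixed size (width, height)."""
    rescaled = [cv2.resize(p, size).astype(np.float64) for p in patches]
    return np.mean(rescaled, axis=0)

def update_prototype(prototype, n_seen, new_patch, size=(50, 70)):
    """Running-mean update, so the prototype can be refined incrementally
    without storing all previously seen instances."""
    x = cv2.resize(new_patch, size).astype(np.float64)
    return prototype + (x - prototype) / (n_seen + 1), n_seen + 1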
Experiments with getting prototypes Fig. 2(a) shows a facade
with four floors and five window columns. We observe
two classes of facade elements: windows and balconies. The
windows can be partitioned into four subclasses. We add the
cornices to the windows as a characteristic feature. We chose
a bounding box around the second window on the third floor to
define a single example. To reduce the computational costs we resized
the image, so that the templates have a size of 50 x 70 pixels. We
took our default parameter set with recursion depth of three and
T1 = 0.6 and T2 = 0.8 as thresholds for the correlation coefficients.
Fig. 2(b) shows the mean images of the found clusters together with
the labels given manually by the user; see the figure caption for an
explanation of the numbers. Obviously, the joint p. d. f. of the features
has such a form that the chaining effect allows a drift of the found
image patches to different clusters. The third found cluster (third image
in the first row) shows that the cornices were identified as a cluster
of multiply occurring elements. Indeed, this element is the most
frequent element in the image. But as we are not interested in such
objects we chose this cluster to be background. As a further consequence
of the chaining effect we achieved the desired drift not only to
different types of the same super class but also to a new super
class. Hence, the process identified two new window subclasses.
And as the balconies are very similar to the windows, the process
also identified them as a related class.
Fig. 2(c) shows found image patches marked with their label.
Nearly all interesting instances were found, except three windows
on the upper floor. The rectangular windows without cornice
could not be distinguished from those with cornices. But all
others were correctly found and matched to their class. Finally,
Fig. 2(d) shows the corresponding object hierarchy with classes
represented by their prototypes.
4.2 Classification
Models for classification can be generative or discriminative.
Generative models capture the essence of the class explicitly and
allow sampling of instances, e. g. feature vectors, from their compactly
represented distribution. Incremental learning of generative models
means incrementally changing the class description. Discriminative
models explicitly represent differences between two or more classes.
They make the decision process explicit. Incremental learning here
means changing the decision rules, e. g. the decision function.
Incremental learning is much easier in generative models, while the
efficiency of discriminative models is in general much higher. This is
the reason why both representations should be integrated, cf. (Skocaj
et al., 2006). In a first step towards incremental learning we follow
the approach of Fidler (2006) and use augmented PCA as the generative
model for our classes and LDA for representing the decision functions.
This procedure is called LDA on augmented PCA (LDAaPCA), and as a
result we obtain feature vectors y for every image patch in the final
classification subspace, which is of dimension C - 1.
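As an illustration of the general LDA-on-PCA idea (not the authors' exact augmented PCA), the following sketch chains scikit-learn's PCA and LinearDiscriminantAnalysis; LDA yields at most C - 1 discriminant directions for C classes, which corresponds to the final classification subspace mentioned above. The function name and parameter values are assumptions.

from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

def fit_lda_on_pca(X, y, n_pca=20):
    """X: one flattened, rescaled image patch per row; y: class labels.
    The PCA step gives the compact (generative) representation, the LDA step
    the discriminative subspace of at most C - 1 dimensions for C classes."""
    model = make_pipeline(PCA(n_components=n_pca),
                          LinearDiscriminantAnalysis())
    model.fit(X, y)
    return model

# Feature vectors y in the final classification subspace:
#   Y = model.transform(X)   # shape (n_samples, C - 1) when n_pca >= C - 1
#   labels = model.predict(X)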