Learning from few examples, usually known as one shot-learning
is well investigated, cf. e.g. (Fei-Fei et al., 2004) and (Li et al.,
2007). Fei-Fei et al. present a method for learning object cat
egories from just a few training examples and obtain the prior
knowledge from object categories which were previously learnt.
In contrast to their approach, we explicitly use the prior knowl
edge that the object appears repeatedly to find new instances and
learn the class appearance from these instances. Given the knowl
edge of existence of similar but different object classes we make
hypothesis of new object classes or subclasses which might be
accepted from the user or not.
The idea of using similarities between neighbouring objects is
not totally new. Sivic and Zisserman (2006) aim at finding key
persons in a video. They propose to initiate the learning of the
appearance model by identifying the person in a single image and
use a tracking procedure to generate a large set of training sam
ples. Thus the inherent similarity graph is built up using pairwise
similarities and explore the chaining effect to obtain dissimilar
instances of the person.
Van Gool et al. (2007) and his group aim at estimating vanishing
points in strong perspective facade images. They search for rows
and columns of similar image patches. They first establish a sim
ilarity graph between interesting image features, from which they
derive a similarity chain using a maximum spanning tree (MSP).
Consecutive nodes in the MSP are then taken to be image features
which are arranged in rows or columns, then allowing to estimate
the vanishing points.
Our method of finding prototypes is comparable to that of Sivic
and Zisserman (2006): As our goal is to automatically derive a
hierarchy for appearance classes, we provide one or several in
stances of the class to be modelled and let the system find as
many prototypes for that class.
4 A CONCEPT FOR INCREMENTAL LEARNING
In this section we want to describe our concept for incremental
learning in more detail. We start with the determination of proto
types.
A prototype is a representative instance of its class. An instance
is representative if it may be taken as surrogate for the probabil
ity density function (p. d. f.) of the class during classification,
as e. g. in case based classification. Therefore one might need
several prototypes being representative for a class. For choosing
prototypes one could use characteristic points of the features joint
p. d. f., like the mode or the mean.
Thus the goal of the following algorithm is to automatically es
tablish a large set X of instances ^ of the envisaged super class
and by clustering identify appearance subclasses c, which then
are the basis to determine a prototype p c for each cluster.
We assume the class to be representable by the multidimensional
p. d. f. of the feature vector describing each instance of the class.
As we want to initiate the prototype detection with as few training
data as possible we assume the density values of the p. d. f. be
tween neighbouring nodes to be large enough to produce a chain
ing effect. The chaining effect thus is exploited to minimize the
required user guidance. Otherwise, multiple instances need to be
given as training samples.
Our method assumes the image to contain multiple instances of
the envisaged concept. In addition, we assume to have a similar
ity measure s(^, %j) for two different instances, e. g. replacing
the probability density p(xi — Xj) of the difference of the corre
spondent feature vectors.
For the moment we do not exploit the spatial configuration of the
instances.
Finding a large subset X of X° from a given instance Start
ing from an initial instance ?&(xo) with feature vector xq we
search for all instances ^ in the set X° of all candidate instances
having a similarity s(a?o, Xi) > T\ better than a given low thresh
old T\. Starting from these instances we again search for all in
stances with a similarity better than T\. This recursion allows
to reach all instances bridged by the chaining effect, but possible
also instances not belonging to the envisaged class.
At the moment we represent each instance ^ by the complete
intensity matrix, thus Xi contains all intensity values of We
work on rectified metric facade images so we need not deal with
perspective effects. We choose the normalised cross-correlation
coefficient p as our similarity measure, as it can deal with in
tensity changes (e. g. due to shadows) and we need no rotation
invariance. So s (as, \j) = p (xi, Xj).
Given an image / and a single example 1 of size a x b we
scan the whole image for similar objects. This is realized by the
cross-correlation function p with p(r, c) = [p(xo, x(r, c))] be
tween given template tco and image patches of size a x b within
the whole image I. Thus we formally take all positions as can
didate set X°. In the first iteration we select all local maxima in
p with p(i,j) > Xj, cf. the function non-maximum-suppression
in line 1.2 and 1.8 of Alg. 1. The given example defines the
seed node zb of a graph ^(X', £; S) and image patches ^ of size
a x b around positions of found local maxima define the first se
quence of additional vertices &(%). All of them are connected by
an edge dj = (^, Xj) to the seed node together with their corre
lation coefficient p (xq, Xi) as similarity measure Sij. For all of
1 The enlargement to more examples is straightforward. The example
could be given by the user or by an appropriate detector.
400