Proceedings; XXI International Congress for Photogrammetry and Remote Sensing: Proceedings

chen, jun
Learning from few examples, usually known as one shot-learning 
is well investigated, cf. e.g. (Fei-Fei et al., 2004) and (Li et al., 
2007). Fei-Fei et al. present a method for learning object cat 
egories from just a few training examples and obtain the prior 
knowledge from object categories which were previously learnt. 
In contrast to their approach, we explicitly use the prior knowl 
edge that the object appears repeatedly to find new instances and 
learn the class appearance from these instances. Given the knowl 
edge of existence of similar but different object classes we make 
hypothesis of new object classes or subclasses which might be 
accepted from the user or not. 
The idea of using similarities between neighbouring objects is 
not totally new. Sivic and Zisserman (2006) aim at finding key 
persons in a video. They propose to initiate the learning of the 
appearance model by identifying the person in a single image and 
use a tracking procedure to generate a large set of training sam 
ples. Thus the inherent similarity graph is built up using pairwise 
similarities and explore the chaining effect to obtain dissimilar 
instances of the person. 
Van Gool et al. (2007) and his group aim at estimating vanishing 
points in strong perspective facade images. They search for rows 
and columns of similar image patches. They first establish a sim 
ilarity graph between interesting image features, from which they 
derive a similarity chain using a maximum spanning tree (MSP). 
Consecutive nodes in the MSP are then taken to be image features 
which are arranged in rows or columns, then allowing to estimate 
the vanishing points. 
Our method of finding prototypes is comparable to that of Sivic 
and Zisserman (2006): As our goal is to automatically derive a 
hierarchy for appearance classes, we provide one or several in 
stances of the class to be modelled and let the system find as 
many prototypes for that class. 
4 A CONCEPT FOR INCREMENTAL LEARNING 
In this section we want to describe our concept for incremental 
learning in more detail. We start with the determination of proto 
types. 
A prototype is a representative instance of its class. An instance 
is representative if it may be taken as surrogate for the probabil 
ity density function (p. d. f.) of the class during classification, 
as e. g. in case based classification. Therefore one might need 
several prototypes being representative for a class. For choosing 
prototypes one could use characteristic points of the features joint 
p. d. f., like the mode or the mean. 
Thus the goal of the following algorithm is to automatically es 
tablish a large set X of instances ^ of the envisaged super class 
and by clustering identify appearance subclasses c, which then 
are the basis to determine a prototype p c for each cluster. 
We assume the class to be representable by the multidimensional 
p. d. f. of the feature vector describing each instance of the class. 
As we want to initiate the prototype detection with as few training 
data as possible we assume the density values of the p. d. f. be 
tween neighbouring nodes to be large enough to produce a chain 
ing effect. The chaining effect thus is exploited to minimize the 
required user guidance. Otherwise, multiple instances need to be 
given as training samples. 
Our method assumes the image to contain multiple instances of 
the envisaged concept. In addition, we assume to have a similar 
ity measure s(^, %j) for two different instances, e. g. replacing 
the probability density p(xi — Xj) of the difference of the corre 
spondent feature vectors. 
For the moment we do not exploit the spatial configuration of the 
instances. 
Finding a large subset X of X° from a given instance Start 
ing from an initial instance ?&(xo) with feature vector xq we 
search for all instances ^ in the set X° of all candidate instances 
having a similarity s(a?o, Xi) > T\ better than a given low thresh 
old T\. Starting from these instances we again search for all in 
stances with a similarity better than T\. This recursion allows 
to reach all instances bridged by the chaining effect, but possible 
also instances not belonging to the envisaged class. 
At the moment we represent each instance ^ by the complete 
intensity matrix, thus Xi contains all intensity values of We 
work on rectified metric facade images so we need not deal with 
perspective effects. We choose the normalised cross-correlation 
coefficient p as our similarity measure, as it can deal with in 
tensity changes (e. g. due to shadows) and we need no rotation 
invariance. So s (as, \j) = p (xi, Xj). 
Given an image / and a single example 1 of size a x b we 
scan the whole image for similar objects. This is realized by the 
cross-correlation function p with p(r, c) = [p(xo, x(r, c))] be 
tween given template tco and image patches of size a x b within 
the whole image I. Thus we formally take all positions as can 
didate set X°. In the first iteration we select all local maxima in 
p with p(i,j) > Xj, cf. the function non-maximum-suppression 
in line 1.2 and 1.8 of Alg. 1. The given example defines the 
seed node zb of a graph ^(X', £; S) and image patches ^ of size 
a x b around positions of found local maxima define the first se 
quence of additional vertices &(%). All of them are connected by 
an edge dj = (^, Xj) to the seed node together with their corre 
lation coefficient p (xq, Xi) as similarity measure Sij. For all of 
1 The enlargement to more examples is straightforward. The example 
could be given by the user or by an appropriate detector. 
400
1
2
...
27
28
29
30
31
...
396
397
Full text: Proceedings (Part B3b-2)

Access restriction

Copyright

Note to user