ISPRS Commission III, Vol. 34, Part 3A, "Photogrammetric Computer Vision", Graz, 2002
Figure 2: ‘invariant neighbourhoods’ that were extracted
for the images in fig. 1. Only regions are shown for which a
corresponding partner in the other image has been found,
but the regions in the two images have been extracted with-
out knowledge about the other image.
irrespective of these changes and that are extracted independently,
every step in their construction ought to be invariant under both the
geometric and photometric transformations just described. A detailed
description of these construction methods is beyond the scope of this
paper, and the interested reader is referred to specialised papers on
the subject (Tuytelaars 1999, Tuytelaars 2000). As mentioned before,
these constructions allow the computer to extract the regions in the
two views completely independently. After they have been constructed,
they can be matched efficiently on the basis of features that are
extracted from the colour patterns that they enclose. These features
are again invariant under both the geometric and the photometric
transformations considered. More precisely, a feature vector of moment
invariants is used. Fig. 2 shows some of the regions that have been
extracted for fig. 1. We refer to the regions as ‘invariant
neighbourhoods’. Recently, several additional construction methods
have been proposed by other researchers (Baumberg 2000, Matas 2001).
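
The matching step described above, comparing regions through their invariant feature vectors, can be illustrated with a minimal sketch. It assumes that each region has already been reduced to a fixed-length vector of moment invariants, and it uses plain Euclidean distance with a mutual nearest-neighbour check. The function name `match_regions`, the threshold `max_dist` and the toy vectors are all hypothetical; the distance measure and acceptance rule actually used by the cited methods may differ.

```python
import math

def match_regions(features_a, features_b, max_dist=0.5):
    """Pair regions of two views by mutual nearest neighbour in feature space.

    features_a, features_b: lists of equal-length feature vectors
    (assumed here to be moment invariants computed per region).
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    def nearest(vec, candidates):
        # Index and distance of the candidate closest to vec.
        dists = [math.dist(vec, c) for c in candidates]
        j = min(range(len(dists)), key=dists.__getitem__)
        return j, dists[j]

    matches = []
    if not features_a or not features_b:
        return matches
    for i, fa in enumerate(features_a):
        j, d = nearest(fa, features_b)
        i_back, _ = nearest(features_b[j], features_a)
        # Accept only if the choice is mutual and the vectors are close.
        if i_back == i and d <= max_dist:
            matches.append((i, j))
    return matches

# Toy feature vectors (hypothetical values, for illustration only).
view1 = [(0.10, 0.90, 0.30), (0.80, 0.20, 0.50), (0.40, 0.40, 0.40)]
view2 = [(0.82, 0.18, 0.52), (0.12, 0.88, 0.31), (0.95, 0.95, 0.95)]
matches = match_regions(view1, view2, max_dist=0.1)
print(matches)
```

Because the feature vectors are invariant, the two lists can be computed independently per view, and no knowledge of the relative viewpoint is needed at matching time. In this toy example the third region of each view stays unmatched.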
Figure 3: Top row: views 1 and 2 of a bookshelf scene, with the 47
invariant neighbourhoods that have been matched indicated. Bottom row:
the 41 matched invariant neighbourhoods for views 1 and 3 of the same
scene.

Also under the wide-baseline version of shape-from-video, perhaps
better referred to as ‘shape-from-stills’, one is interested in
finding correspondences between more than two images. The previously
described wide-baseline stereo matching approach is well suited for
producing many feature matches between pairs of views that may be
quite different. In practice, however, it is far from certain that the
corresponding feature in another view will also be constructed by the
system. Hence, the probability of extracting all correspondences for a
feature in all views of an image set quickly decreases with the number
of views. Moreover, there is a chance of matching wrong features. For
instance, let us suppose we are given 3 views v1, v2 and v3.
Although the method may find matches between the view
pair (1, 2) and also between the view pair (1, 3), these two
sets of matches will often substantially differ and a small
number of common features between all three views may
result. Figure 3 shows 3 views and the matches found be-
tween the pairs (1, 2) and (1, 3). Fig. 4 shows the matches
that these pairs have in common. Whereas more than 40
matches were found between the pairs of fig. 3, the number
of matches between all three views has dropped sharply, to
only 16. When we consider 4 or 5 views, the situation can
deteriorate further, and only a few, if any, features may be
put in correspondence among all the views (even though
there may be sufficient overlap between all the views).
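
The drop from pairwise matches to three-view matches can be made concrete with a small sketch. It assumes that each pairwise match set is stored as a list of index pairs whose first element refers to a feature in view 1, and it keeps only those view-1 features that appear in both sets. The function name `common_tracks` and the index values are hypothetical and serve only as an illustration.

```python
def common_tracks(matches_12, matches_13):
    """Return (i1, i2, i3) triples for view-1 features matched in both pairs.

    matches_12: list of (index_in_view1, index_in_view2)
    matches_13: list of (index_in_view1, index_in_view3)
    Assumes each view-1 feature occurs at most once per match set.
    """
    to_v2 = dict(matches_12)
    to_v3 = dict(matches_13)
    shared = sorted(to_v2.keys() & to_v3.keys())
    return [(i1, to_v2[i1], to_v3[i1]) for i1 in shared]

# Toy match sets (hypothetical indices).
matches_12 = [(0, 4), (1, 7), (2, 2), (5, 9)]
matches_13 = [(1, 3), (2, 8), (6, 0)]
tracks = common_tracks(matches_12, matches_13)
print(tracks)
```

Here four matches for the pair (1, 2) and three for the pair (1, 3) leave only two features tracked through all three views. Since every further view adds another set to intersect, the surviving tracks can only become fewer.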
Our most recent developments are devoted to counteracting this
problem. The approach is founded on two main ideas. Firstly, it is
possible to exploit the information supplied by a correct match in
order to generate many other correct matches. Suppose there is a
feature A1 in view v1 which is matched to its corresponding feature A2
in view v2, and a feature B1 in v1 which could not be matched to its
corresponding feature in v2 (e.g. the corresponding invariant
neighbourhood B2 has not been extracted, or maybe it has been
extracted but the matching failed). If B1 and A1 are spatially close
and lie on the same physical surface, then they will probably be
mapped to v2 by similar affine transformations. Hence, we can project
B1 into v2 via the affine