ISPRS Commission III, Vol. 34, Part 3A, "Photogrammetric Computer Vision", Graz, 2002
Figure 1: Two images of the same scene, but taken from
very different viewing directions.
of the excavations, as the image acquisition takes too much
time, even when the images are taken in the form of a video
sequence. It would be very advantageous if the number of
images could be limited to about 10. These images
would still cover the whole scene, but would be taken from
substantially different viewpoints. Such ‘wide baseline’
images could also be taken with a digital photo camera
rather than a video camera, leading to higher resolution
imagery.
In summary, extending the shape-from-video technique
to wide baseline conditions implies that both the sparse
and the dense correspondence search have to be success-
ful on images taken from very different viewpoints. The
self-calibration procedure itself remains essentially iden-
tical. In our system, this is primarily based on the abso-
lute quadric approach proposed by Triggs (Triggs 1997).
Next, we describe the adapted versions of the correspon-
dence steps.
2.2 Approach for sparse correspondence search
Consider the wide baseline image pair of fig. 1. The two
images have been taken from very different viewing direc-
tions. Stereo and shape-from-video systems will most of-
ten not even get started in such cases, as correspondences
are difficult to find.
As already mentioned, the shape-from-video approach
splits the correspondence problem into two stages. The
first stage determines correspondences for a relatively
sparse set of features, usually corners. In the
shape-from-video technique, corners are matched by searching
for corners within a region around the same position in the
other image and selecting among them on the basis of a
normalised cross-correlation of the surrounding intensity
patterns. Both parts of this strategy will fail under the
intended wide baseline conditions. The corresponding point
may lie basically anywhere in the other image, and will not
be found close to its original position. Simple
cross-correlation will not suffice to cope with the change in
corner patterns due to stronger changes in viewpoint and
illumination. The next paragraphs describe an alternative
strategy that is better suited.
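For reference, the narrow-baseline matching step described above can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names (ncc, patch, match_corner), the square search window, the patch size and the score threshold are all assumptions made for the example. Images are assumed to be lists of rows of grey values, and corners are assumed to lie far enough from the image border for a full patch to be extracted.

```python
import math

def ncc(a, b):
    """Normalised cross-correlation of two equally sized, flattened patches."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # flat patch: correlation undefined, treated as "no match"
    return sum(x * y for x, y in zip(da, db)) / denom

def patch(img, row, col, half):
    """Flattened (2*half+1) x (2*half+1) patch centred on (row, col)."""
    return [img[r][c]
            for r in range(row - half, row + half + 1)
            for c in range(col - half, col + half + 1)]

def match_corner(img1, corner, img2, corners2, radius, half=1, min_score=0.8):
    """Best-scoring corner of img2 within `radius` pixels of `corner`, or None.

    Only candidates near the same image position are considered, which is
    exactly the assumption that breaks down under wide baseline conditions.
    """
    ref = patch(img1, corner[0], corner[1], half)
    best, best_score = None, min_score  # hypothetical acceptance threshold
    for cand in corners2:
        if abs(cand[0] - corner[0]) > radius or abs(cand[1] - corner[1]) > radius:
            continue  # outside the search window
        score = ncc(ref, patch(img2, cand[0], cand[1], half))
        if score > best_score:
            best, best_score = cand, score
    return best
```

With a small search radius, a corner that has moved far across the image is never even considered as a candidate, and the fixed square patch does not follow the deformation of the surrounding pattern. These are the two failure modes named in the text.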
When looking for initial features to match, we should fo-
cus on local structures. Otherwise, occlusions and chang-
ing backgrounds will cause problems, certainly under wide
baseline conditions. Here, we look at small regions, con-
structed around or near interest points. If these regions
are to be matched, they ought to cover the same part of
the scene in the different views. Hence, they have to take
on different shapes in the different images. The most im-
portant aspect of the strategy proposed here is that the re-
gion extraction works on the basis of individual images, i.e.
without any knowledge about the other images. This property
is key to avoiding a slow, combinatorial search for
matches. In the proposed scheme, regions are constructed
in one go based on a single image, instead of by selecting
a region in one image and then trying to find a match by
deforming and relocating a region in the other image un-
til some matching score surpasses a threshold. Here, the
corresponding region in the second image is extracted in-
dependently, before one even attempts to match regions.
The crux of the matter is that every step in the region ex-
traction is invariant under the image variations one wants
to be robust against. This is discussed in more detail next.
On the one hand, the viewpoint may change strongly.
Hence, the extraction has to survive affine deformations
of the regions, not just in-plane rotations and translations.
In fact, even affine transformations do not fully cover the
observed changes. This model will only suffice for regions
that are sufficiently small and planar. We assume that a
reasonable number of such regions will be found, an ex-
pectation borne out in practice. On the other hand, strong
changes in illumination conditions may occur between the
views. The chance of this happening will actually grow
with the angle over which the camera rotates. The relative
contributions of light sources will change more than in the
frame-to-frame changes in a video. We model the effects
of changing illumination by scaling the three colour bands
(R, G, B) with different scale factors and by adding different
offsets. Our local feature extraction should also be
immune to such photometric changes.
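The photometric model above (an independent scale factor and offset per colour band) can be illustrated with a short sketch. The normalisation shown here, subtracting the band mean and dividing by the band standard deviation, is a standard way to cancel such a change. It is used purely as an assumption for illustration and is not claimed to be the invariant the authors use. The function names are hypothetical.

```python
import math

def apply_photometric_change(band, scale, offset):
    """Model of an illumination change on one colour band: I' = scale * I + offset."""
    return [scale * v + offset for v in band]

def normalise_band(band):
    """Zero-mean, unit-variance version of one colour band of a region.

    For any positive scale and any offset, the result is unchanged, so a
    measurement computed from it is immune to the modelled change.
    """
    n = len(band)
    mean = sum(band) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in band) / n)
    if std == 0.0:
        return [0.0] * n  # uniform band carries no pattern information
    return [(v - mean) / std for v in band]

# Example: the same region seen under two illuminations, with a different
# (made-up) scale and offset for each of the three bands.
region = {"R": [10, 40, 25, 90], "G": [5, 60, 30, 20], "B": [80, 15, 45, 70]}
change = {"R": (1.7, -12.0), "G": (0.6, 8.0), "B": (1.1, 3.0)}
relit = {k: apply_photometric_change(region[k], *change[k]) for k in region}
```

Normalising each band of region and of relit gives the same values up to rounding, even though the raw intensities differ.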
If we want to construct regions that are in correspondence