Full text: Papers accepted on the basis of peer-review full manuscripts (Part A)

ISPRS Commission III, Vol. 34, Part 3A "Photogrammetric Computer Vision", Graz, 2002
Figure 1: Two images of the same scene, but taken from 
very different viewing directions. 
of the excavations, as the image acquisition takes too much 
time, even when the images are taken in the form of a video 
sequence. It would be very advantageous if the number of
images could be limited to about 10. These images
would still cover the whole scene, but would be taken from 
substantially different viewpoints. Such ‘wide baseline’ 
images could also be taken with a digital photo camera 
rather than a video camera, leading to higher resolution 
In summary, extending the shape-from-video technique 
to wide baseline conditions implies that both the sparse 
and the dense correspondence search have to be success- 
ful on images taken from very different viewpoints. The 
self-calibration procedure itself remains essentially iden- 
tical. In our system, this is primarily based on the abso- 
lute quadric approach proposed by Triggs (Triggs 1997). 
Next, we describe the adapted versions of the correspon- 
dence steps. 
2.2 Approach for sparse correspondence search 
Consider the wide baseline image pair of fig. 1. The two 
images have been taken from very different viewing direc- 
tions. Stereo and shape-from-video systems will most of- 
ten not even get started in such cases, as correspondences 
are difficult to find. 
As already mentioned, the shape-from-video approach 
splits the correspondence problem into two stages. The 
first stage determines correspondences for a relatively 
sparse set of features, usually corners. In the shape-from- 
video technique, the matching of corners is based on look- 
ing for corners within a region around the same position 
in the other image, and a selection on the basis of a
normalised cross-correlation of the surrounding intensity
patterns. Both parts of this strategy will fail under the
intended wide baseline conditions. The corresponding point
may basically lie anywhere in the other image, and will not 
be found close to its original position. The use of simple 
cross-correlation will not suffice to cater for the change in 
corner patterns due to stronger changes in viewpoint and 
illumination. The next paragraphs describe an alternative
strategy that is better suited to wide baseline conditions.
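
The narrow-baseline corner matching described above can be sketched as
follows. This is a minimal illustration, not the implementation used in
the system: the function names, the search radius and the patch size are
assumptions chosen for the example, and corners are assumed to lie far
enough from the image border for a full patch to be cut out.

```python
import numpy as np

def ncc(patch_a, patch_b):
    """Normalised cross-correlation of two equally sized grey-value patches."""
    a = patch_a.astype(float) - patch_a.mean()
    b = patch_b.astype(float) - patch_b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def match_corner(img_a, img_b, corner, candidates, radius=20, half=3):
    """Pick the corner in img_b that best matches `corner` from img_a.

    Only candidates within `radius` pixels of the same image position
    are considered (the narrow-baseline assumption). `radius` and
    `half` (half patch size) are illustrative values.
    """
    r, c = corner
    ref = img_a[r - half:r + half + 1, c - half:c + half + 1]
    best, best_score = None, -1.0
    for (r2, c2) in candidates:
        # Stage 1: restrict the search to a window around the old position.
        if abs(r2 - r) > radius or abs(c2 - c) > radius:
            continue
        cand = img_b[r2 - half:r2 + half + 1, c2 - half:c2 + half + 1]
        if cand.shape != ref.shape:
            continue  # candidate too close to the image border
        # Stage 2: rank the remaining candidates by correlation score.
        score = ncc(ref, cand)
        if score > best_score:
            best, best_score = (r2, c2), score
    return best, best_score
```

Both limitations named in the text are visible here: a true match outside
the search window is never examined, and the score compares fixed square
patches, so it degrades once the viewpoint change deforms the pattern.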
When looking for initial features to match, we should fo- 
cus on local structures. Otherwise, occlusions and chang- 
ing backgrounds will cause problems, certainly under wide 
baseline conditions. Here, we look at small regions, con- 
structed around or near interest points. If these regions 
are to be matched, they ought to cover the same part of 
the scene in the different views. Hence, they have to take 
on different shapes in the different images. The most im- 
portant aspect of the strategy proposed here is that the re- 
gion extraction works on the basis of individual images, i.e. 
without any knowledge about the other images. This property
is key to avoiding a slow, combinatorial search for
matches. In the proposed scheme, regions are constructed
in one go based on a single image, instead of by selecting 
a region in one image and then trying to find a match by 
deforming and relocating a region in the other image un- 
til some matching score surpasses a threshold. Here, the 
corresponding region in the second image is extracted in- 
dependently, before one even attempts to match regions. 
The crux of the matter is that every step in the region ex- 
traction is invariant under the image variations one wants 
to be robust against. This is discussed in more detail next. 
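
To illustrate why independent extraction avoids a deform-and-relocate
search, the sketch below matches regions purely by comparing
per-region descriptor vectors. The text at this point does not specify
how regions are compared, so the descriptors, the Euclidean distance and
the mutual-nearest-neighbour rule are all assumptions made for the
example.

```python
import numpy as np

def mutual_nearest_matches(desc_a, desc_b):
    """Match regions that were extracted independently in two images.

    desc_a: (n, d) array, one hypothetical invariant descriptor per
            region of image A.
    desc_b: (m, d) array, the same for image B.
    Returns the (i, j) pairs that are each other's nearest neighbour.
    """
    # Pairwise Euclidean distances between all descriptors.
    dist = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    nn_ab = dist.argmin(axis=1)  # best region in B for each region in A
    nn_ba = dist.argmin(axis=0)  # best region in A for each region in B
    return [(i, int(j)) for i, j in enumerate(nn_ab) if nn_ba[j] == i]
```

Because each region is described once per image, the cost is a single
table of descriptor comparisons rather than an iterative search over
region shapes and positions for every candidate pair.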
On the one hand the viewpoint may strongly change. 
Hence, the extraction has to survive affine deformations 
of the regions, not just in-plane rotations and translations. 
In fact, even affine transformations do not fully cover the
observed changes. This model will only suffice for regions
that are sufficiently small and planar. We assume that a 
reasonable number of such regions will be found, an ex- 
pectation borne out in practice. On the other hand, strong 
changes in illumination conditions may occur between the 
views. The chance of this happening will actually grow 
with the angle over which the camera rotates. The relative 
contributions of light sources will change more than in the 
frame-to-frame changes in a video. We model the effects 
of changing illumination by scaling the three colour bands 
(R, G, B) with different scale factors and by adding different
offsets. Our local feature extraction should also be
immune to such photometric changes.
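
The photometric model above (an independent scale factor and offset per
colour band) can be written out as a short sketch. The per-band
normalisation shown is one simple way to cancel such a change and is an
assumption for illustration, not necessarily the mechanism used in the
paper; the function names are hypothetical, and positive scale factors
are assumed.

```python
import numpy as np

def apply_photometric_change(region, scales, offsets):
    """Apply the model: each band is scaled and offset independently.

    region:  (h, w, 3) float array holding the R, G, B bands.
    scales:  three positive scale factors, one per band.
    offsets: three offsets, one per band.
    """
    return region * np.asarray(scales, float) + np.asarray(offsets, float)

def normalise_bands(region):
    """Per band, subtract the mean and divide by the standard deviation.

    The result is unchanged by any positive scaling and any offset
    applied to a band, i.e. by the photometric model above.
    """
    flat = region.reshape(-1, 3).astype(float)
    mean = flat.mean(axis=0)
    std = flat.std(axis=0)
    std[std == 0] = 1.0  # guard against constant bands
    return ((flat - mean) / std).reshape(region.shape)
```

Comparing `normalise_bands(region)` with
`normalise_bands(apply_photometric_change(region, s, o))` for positive
`s` gives the same values up to rounding, which is the kind of immunity
the feature extraction is required to have.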
If we want to construct regions that are in correspondence 
