ISPRS Commission III, Vol.34, Part 3A "Photogrammetric Computer Vision", Graz, 2002
3D MODELING AND REGISTRATION UNDER WIDE BASELINE CONDITIONS
L. Van Gool¹,², T. Tuytelaars¹, V. Ferrari², C. Strecha¹, J. Vanden Wyngaerd¹, and M. Vergauwen¹
¹ ESAT/PSI/Visics, KULeuven, Belgium
² D-ITET/BIWI, ETH Zurich, Switzerland
KEY WORDS: wide baseline, 3D reconstruction, 3D registration, invariant neighbourhoods
ABSTRACT
During the 90s, important progress was made in the area of structure-from-motion. From a series of closely spaced
images a 3D model of the observed scene can now be reconstructed, without knowledge about the subsequent camera
positions or settings. From nothing but a video, the camera trajectory and scene shape are extracted. Progress has also
been important in the area of structured light techniques. Rather than having to use slow and/or bulky laser scanners,
compact one-shot systems have been developed. Upon projection of a pattern onto the scene, its 3D shape and texture can
be extracted from a single image. This paper presents recent extensions of both strands that share a common theme: how
to cope with large baseline conditions. In the case of shape-from-video we discuss ways to find correspondences and,
hence, extract 3D shapes even when the images are taken far apart. In the case of structured light, the problem solved is
how to combine partial 3D patches into complete models, without a good initialisation of their relative poses.
1 INTRODUCTION
During the last few years, low-cost and user-friendly so-
lutions for 3D modeling have become available. Shape-
from-video (Armstrong 1994, Heyden 1997, Pollefeys
1998, Hartley 2000) extracts 3D shapes and their textures
from video sequences as the only input. One-shot struc-
tured light techniques (Vuylsteke 1990, Proesmans 1996,
Chia 1996, Eyetronics www) get such information from a
single image, but need the projection of a special pattern.
These techniques have the advantage that they are cheaper
than traditional solutions like dedicated multi-camera rigs
or laser scanners, as they only require off-the-shelf hard-
ware. Moreover, they offer more flexibility in terms of
portability and the range of object sizes they can handle.
This paper presents ongoing work on two different, but
strongly related extensions of such systems.
Wide-baseline image matching: Shape-from-video
requires large overlap between subsequent frames.
Often, one would like to reconstruct from a small
number of stills, taken from very different view-
points. Based on local, viewpoint invariant features,
wide-baseline matching is made possible, and hence
the viewpoints can be farther apart.
Crude registration of 3D patches: Automatic registra-
tion algorithms for 3D patches such as ICP require
good initial, relative positions and orientations of the
patches to work. Completely automatic solutions to
the 3D puzzle of putting together a set of unstructured
3D patches require that a first, crude registration
also take place automatically.
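The dependence of ICP on a good starting pose can be illustrated with a minimal point-to-point sketch. This is not the registration method of this paper: the function names (`best_rigid_transform`, `nearest_neighbours`, `icp`), the brute-force nearest-neighbour search and the synthetic grid data are all assumptions made for illustration. Each iteration pairs every point with its nearest neighbour in the other patch and then solves for the least-squares rigid motion with the SVD-based (Kabsch) method. The example only converges to the right pose because the initial misalignment is small compared to the point spacing, so the nearest neighbours are the true correspondences.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares R, t such that dst ~ src @ R.T + t (Kabsch / SVD method)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

def nearest_neighbours(src, dst):
    """Brute-force nearest neighbour in dst for every point of src."""
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
    idx = d2.argmin(axis=1)
    return idx, np.sqrt(d2[np.arange(len(src)), idx])

def icp(src, dst, iterations=10):
    """Point-to-point ICP; returns the accumulated R, t and the moved points."""
    R_total, t_total = np.eye(3), np.zeros(3)
    cur = src.copy()
    for _ in range(iterations):
        idx, _ = nearest_neighbours(cur, dst)
        R, t = best_rigid_transform(cur, dst[idx])
        cur = cur @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total, cur

# Synthetic patch (assumption): a 3x3x3 grid with unit spacing.
r = np.arange(3.0)
grid = np.stack(np.meshgrid(r, r, r), axis=-1).reshape(-1, 3)

# Small misalignment: 2 degrees about the z-axis plus a small shift.
a = np.deg2rad(2.0)
Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
               [np.sin(a),  np.cos(a), 0.0],
               [0.0,        0.0,       1.0]])
moved = grid @ Rz.T + np.array([0.05, -0.03, 0.02])

rms_before = np.sqrt((nearest_neighbours(moved, grid)[1] ** 2).mean())
_, _, aligned = icp(moved, grid)
rms_after = np.sqrt((nearest_neighbours(aligned, grid)[1] ** 2).mean())
```

With a much larger initial rotation or shift, the nearest-neighbour pairing would no longer match the true correspondences and the same loop could settle in a wrong pose, which is why a crude registration is needed first.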
2 WIDE-BASELINE IMAGE MATCHING
2.1 Task description
The 90s have witnessed the appearance of self-calibration
techniques in structure-from-motion. A series of images is
the only input such systems need to determine the camera
motion and the evolution of the camera settings, as well
as the 3D shape (up to an unknown scale) of the scene. By
now, several approaches for such self-calibration have been
developed and several systems have been proposed (Arm-
strong 1994, Heyden 1997, Hartley 2000, Pollefeys 1998).
They start with the tracking of interest points through a
sequence of views. The consistency of their image pro-
jections with a rigid 3D structure imposes constraints that
allow the extraction of the cameras and the 3D shape of the cloud
of interest points. The matching of these initial interest
points will be referred to as sparse correspondence search.
After the matching of the interest points, and the self-
calibration, strong multi-view constraints between the im-
ages are available. These ease the search for many more
correspondences. For one thing, a further search can be re-
stricted to epipolar lines. In our approach (Pollefeys 1998),
we go after pixelwise matches. This stage is referred to as
dense correspondence search. These additional matches
result in a detailed reconstruction of the 3D shape.
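The restriction of the search to epipolar lines can be sketched as follows. This is an illustrative example under stated assumptions, not the implementation of (Pollefeys 1998): the calibration matrix `K`, the relative pose `R`, `t`, the helper names and the test point are all made up. For a point `x1` in the first image, its match in the second image must lie on the line `F @ x1`, where `F` is the fundamental matrix of the image pair, so a candidate match can be rejected when it lies too far from that line.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_from_pose(K1, K2, R, t):
    """F with x2^T F x1 = 0 for cameras related by X2 = R @ X1 + t."""
    E = skew(t) @ R                                  # essential matrix
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)

def epipolar_line(F, x1):
    """Line (a, b, c) in image 2, scaled so that a^2 + b^2 = 1."""
    l = F @ np.array([x1[0], x1[1], 1.0])
    return l / np.hypot(l[0], l[1])

def distance_to_line(l, x2):
    """Distance in pixels from point x2 to the normalised line l."""
    return abs(l @ np.array([x2[0], x2[1], 1.0]))

def project(K, X):
    """Pinhole projection of a 3D point given in camera coordinates."""
    x = K @ X
    return x[:2] / x[2]

# Hypothetical set-up: identical cameras, pure sideways translation.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([1.0, 0.0, 0.0])
F = fundamental_from_pose(K, K, R, t)

X = np.array([0.2, 0.1, 4.0])          # scene point in camera-1 coordinates
x1 = project(K, X)
x2 = project(K, R @ X + t)             # its true match in image 2

line = epipolar_line(F, x1)
dist_true = distance_to_line(line, x2)                         # on the line
dist_off = distance_to_line(line, x2 + np.array([0.0, 5.0]))   # 5 px off
```

In a dense search, only pixels whose distance to the epipolar line falls below a small threshold would be considered, which reduces the two-dimensional search to an essentially one-dimensional one.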
Although 3D reconstructions can in principle be made
from a limited number of stills, these systems tend to only
work effectively if the images have much overlap and are
offered in the order of a continuous camera motion. This is
underlined by the name 'shape-from-video'. For instance,
we have tested our system (Pollefeys 1998) to make 3D
records of archaeological, stratigraphic layers during ex-
cavations. A large part of the scene consists of sand and
there is a general lack of points of interest. When walk-
ing around the dig, it proved necessary to take images less
than 5° apart. In such applications, this is not always possible
due to obstacles, and it disturbs the normal progress