• Far away scenes, such as landscapes, are completed with
image-based rendering (IBR) or panoramas. This serves
mainly to present the monument in its natural setting.
This combination of techniques will satisfy most requirements
except that, at least for now, the cost is not as low as that of a
fully image-based system.
The remainder of the paper is organized as follows. In section 2,
an overview of 3D reconstruction techniques is presented. This
leads to the conclusion, in section 3, that a combination
of techniques is the logical answer to acquiring all the necessary
details. This is followed by the proposed approach in section 4.
Section 5 describes the modeling of the Abbey of Pomposa
using this multi-technique approach. The paper concludes with
a short discussion in section 6.
2. OVERVIEW OF 3D RECONSTRUCTION TECHNIQUES
A standard approach to creating a model is to build it from scratch
using tools, such as CAD software, that offer building blocks in
the form of primitive 3D shapes. Some surveying data, or
measurements from drawings and maps, will also be required.
This geometry-based modeling technique is time-consuming and
labor-intensive, which makes it impractical and costly for
large-scale projects. The resulting model also has a
computer-generated rather than realistic look and does not include
fine details or irregular and sculpted surfaces. Current efforts are
directed towards increasing the level of automation and realism by
starting with actual images of the object or by directly digitizing
it with a laser scanner. The following is a summary of recent
techniques.
2.1 Image-Based Modeling
Image-based modeling requires only widely available hardware, and
potentially the same system can be used for a wide range of
objects and scenes. Image-based methods are also capable of
producing realistic-looking models, and those based on
photogrammetry have high geometric accuracy. Three-dimensional
measurement from images naturally requires that interest points or
edges be visible in the image. This is often not possible, either
because a region is hidden or occluded behind an object or a
surface, or because there is no mark, edge, or visual feature to
extract. For objects such as monuments in their normal settings, we
also face restrictions on the locations from which the images
can be taken, as well as the presence of other objects, shadows,
and illumination conditions.
The ultimate goal of all 3D reconstruction methods is to satisfy
the eight requirements listed in the previous section. Since this
is not easy, they focus on some of the tasks at the expense of the
others. Efforts to increase the level of automation have become
essential in order to widen the use of the technology. However,
efforts to completely automate the process, from taking images
to the output of a 3D model, while promising, are thus far not
always successful. The automation of camera pose estimation
and of the computation of pixel 3D coordinates is summarized next.
This procedure, which is now widely used in computer vision
[e.g. Faugeras et al, 1998; Fitzgibbon et al, 1998; Pollefeys et
al, 1999; Liebowitz et al, 1999], starts with a sequence of
images taken by an uncalibrated camera. The system extracts
interest points, such as corners, sequentially matches them across
views, then computes camera parameters and 3D coordinates of
the matched points using robust techniques. The first two
images are usually used to initialize the sequence. It is
important that the points are tracked over a long sequence to
reduce error propagation. This is all done in a projective
geometry framework and is usually followed by a bundle adjustment,
also in projective space. Self-calibration to compute the
intrinsic camera parameters, usually only the focal length,
follows in order to obtain a metric reconstruction, up to scale,
from the projective one [Pollefeys et al, 1999]. Again, bundle
adjustment is usually applied to the metric reconstruction to
optimize the solution. The next step, the creation of the 3D
model, is more difficult to automate and is usually done
interactively to define the topology and to edit or post-process the
output. For large structures and scenes, since the technique may
require a large number of images, the creation of the model
requires significant human interaction even though
image registration and a large number of 3D points were
computed fully automatically.
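As a rough illustration of the last computational step of this pipeline, obtaining 3D coordinates for a matched point, the sketch below triangulates one point from two views by taking the midpoint of the shortest segment between the two viewing rays. It is a simplified, calibrated two-view stand-in and not the projective, robust, multi-view procedure of the cited systems. It assumes the camera centers and ray directions are already known in a common frame, and the function name `triangulate_midpoint` is hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(c1: Vec3, d1: Vec3, c2: Vec3, d2: Vec3) -> Vec3:
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2.

    Assumption (for illustration only): camera centers c1, c2 and
    viewing-ray directions d1, d2 are known in one common frame.
    """
    w = tuple(a - b for a, b in zip(c1, c2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b  # zero when the two rays are parallel
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; depth is undetermined")
    # Ray parameters that minimize the distance between the two rays.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = tuple(ci + s * di for ci, di in zip(c1, d1))
    p2 = tuple(ci + t * di for ci, di in zip(c2, d2))
    return tuple((u + v) / 2.0 for u, v in zip(p1, p2))
```

With noise-free matches the two rays intersect and the midpoint is the point itself. With noisy matches the rays are skew, and the midpoint is only a simple initial estimate of the kind that a bundle adjustment would normally refine.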
The most impressive results remain those achieved with
highly interactive approaches. Rather than full automation, an
easy-to-use hybrid system known as Façade has been developed
[Debevec et al, 1996]. The method’s main goal is the realistic
creation of 3D models of architecture from a small number of
photographs. The basic geometric shape of the structure is first
recovered interactively using models of polyhedral elements. In
this step, the actual size of the elements and camera pose are
captured assuming that the camera intrinsic parameters are
known. The second step is an automated matching procedure,
constrained by the now known basic model, to add geometric
details. The approach proved to be effective in creating
geometrically accurate and realistic models of architecture.
The drawbacks are the high level of interaction and the
restriction to certain shapes. Also, since the assumed shapes
determine all 3D points and camera poses, the results are only as
accurate as the assumption that the structure elements match
those shapes. Our method, although similar in philosophy,
replaces basic shapes with a small number of seed points to
achieve more flexibility and a higher level of detail. In addition,
the camera poses and 3D coordinates are determined without
any assumption about the shapes, but instead by a full bundle
adjustment, with or without self-calibration depending on the
given configuration. This achieves higher geometric accuracy
independent of the shape of the object.
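To make the role of bundle adjustment concrete, the sketch below evaluates the quantity it typically minimizes: the sum of squared differences between observed image points and the projections of the estimated 3D points. This is only the cost function for one camera under an assumed ideal pinhole model (focal length `f`, rotation `R`, translation `t`, principal point at the image origin, no lens distortion). The optimizer itself is omitted, and all names are illustrative rather than taken from the paper.

```python
from typing import Sequence, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]


def project(X: Point3, R: Matrix3, t: Point3, f: float) -> Point2:
    """Ideal pinhole projection: image point = f * (R X + t)[:2] / depth."""
    xc, yc, zc = (sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3))
    if zc <= 0:
        raise ValueError("point is not in front of the camera")
    return (f * xc / zc, f * yc / zc)


def reprojection_cost(
    points: Sequence[Point3],
    observations: Sequence[Point2],
    R: Matrix3,
    t: Point3,
    f: float,
) -> float:
    """Sum of squared image residuals for one camera.

    observations[i] is the measured image position of points[i].
    """
    cost = 0.0
    for X, (u_obs, v_obs) in zip(points, observations):
        u, v = project(X, R, t, f)
        cost += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return cost
```

A full bundle adjustment would sum this cost over all cameras and adjust the camera parameters and point coordinates together, usually with a nonlinear least-squares solver. Treating `f` as one of the unknowns corresponds to the self-calibrating case.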
The Façade approach has inspired several research activities to
automate it. Werner and Zisserman, 2002, proposed a fully
automated Façade-like approach. Instead of the basic shapes,
the principal planes of the scene are created automatically to
assemble a coarse model. Like Façade, the coarse model guides
a more refined polyhedral model of details such as windows,
doors, and wedge blocks. Since this is a fully automated
approach, it requires feature detection, matching across closely
spaced images, and camera pose estimation using projective
geometry. Dick et al, 2001, proposed another automated
Façade-like approach. It uses model-based recognition to
extract high-level models in a single image, then projects them
into other images for verification. The method requires
parameterized building blocks with an a priori distribution defined
by the building style. The scene is modeled as base planes
corresponding to walls or roofs; each may contain offset 3D
shapes that model common architectural elements such as
windows and columns. Again, full automation necessitates
feature detection and a projective geometry approach.
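The base-plane-plus-offset representation can be illustrated with a small geometric sketch: a plane is built from three non-collinear points on a wall, and the signed distance of another point from that plane gives its offset (for example, how far a window is recessed). This illustrates only the representation, not the recognition method of Dick et al, and the function names are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]
Plane = Tuple[Vec3, float]  # (unit normal n, offset d) with n . x = d


def plane_from_points(p0: Vec3, p1: Vec3, p2: Vec3) -> Plane:
    """Plane through three non-collinear points, as (unit normal, offset)."""
    u = tuple(b - a for a, b in zip(p0, p1))
    v = tuple(b - a for a, b in zip(p0, p2))
    # Normal is the cross product of the two in-plane edge vectors.
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-12:
        raise ValueError("points are collinear; no unique plane")
    n = tuple(c / norm for c in n)
    return n, sum(a * b for a, b in zip(n, p0))


def signed_offset(x: Vec3, plane: Plane) -> float:
    """Signed distance of x from the plane, positive along the normal."""
    n, d = plane
    return sum(a * b for a, b in zip(n, x)) - d
```

For a wall lying in the plane z = 0, a point at z = -0.2 has a signed offset of -0.2, which is how a recessed element would appear relative to its base plane in this simplified picture.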
2.2 Range-Based Modeling
As mentioned above, three-dimensional measurement from
images requires that interest points or edges be visible in the
image, which is not always possible. They are also affected by