Proceedings of the CIPA WG 6 International Workshop on Scanning for Cultural Heritage Recording

Böhler, Wolfgang
• Faraway scenes, such as landscapes, are completed with 
image-based rendering (IBR) or panoramas. This serves 
mainly to present the monument in its natural setting. 
This combination of techniques will satisfy most requirements 
except that, at least for now, the cost is not as low as that of a 
fully image-based system. 
The remainder of the paper is organized as follows. In section 2, 
an overview of 3D reconstruction techniques is presented. This 
leads to the conclusion in section 3 that a combination 
of techniques is the logical answer to acquiring all the necessary 
details. This is followed by the proposed approach in section 4. 
Section 5 describes the modeling of the Abbey of Pomposa 
using this multi-technique approach. The paper concludes with 
a short discussion in section 6. 
2. OVERVIEW OF 3D CONSTRUCTION TECHNIQUES 
A standard approach to creating a model is to build it from scratch 
using tools, such as CAD software, that offer building blocks in 
the form of primitive 3D shapes. Some surveying data, or 
measurements from drawings and maps, will also be required. 
This geometry-based modeling technique is time-consuming and 
labor-intensive, which makes it impractical and costly for 
large-scale projects. The created model also has a computer-generated 
look rather than a realistic one and does not include fine details or 
irregular and sculpted surfaces. Current efforts are directed 
towards increasing the level of automation and realism by 
starting with actual images of the object or directly digitizing it 
with a laser scanner. Here is a summary of recent techniques. 
2.1 Image-Based Modeling 
Image-based modeling relies on widely available hardware, and 
potentially the same system can be used for a wide range of 
objects and scenes. Image-based methods are also capable of 
producing realistic models, and those based on photogrammetry have high 
geometric accuracy. Three-dimensional measurement from 
images naturally requires that interest points or edges be visible 
in the image. This is often not possible either because a region 
is hidden or occluded behind an object or a surface, or because 
there is no mark, edge, or visual feature to extract. For objects 
such as monuments in their normal settings, we also face 
a limited choice of locations from which the images 
can be taken, as well as the presence of other objects, shadows, 
and illumination conditions. 
The ultimate goal of all 3D reconstruction methods is to satisfy 
the eight requirements listed in the previous section. Since this 
is not easy, the methods focus on some of the tasks at the expense 
of the others. Efforts to increase the level of automation have become 
essential in order to widen the use of the technology. However, 
efforts to completely automate the process from taking images 
to the output of a 3D model, while promising, are thus far not 
always successful. The automation of camera pose estimation 
and the computation of pixel 3D coordinates is summarized next. 
This procedure, which is now widely used in computer vision 
[e.g. Faugeras et al, 1998, Fitzgibbon et al, 1998, Pollefeys et 
al, 1999, Liebowitz et al, 1999], starts with a sequence of 
images taken by an uncalibrated camera. The system extracts 
interest points, such as corners, sequentially matches them across 
views, then computes camera parameters and 3D coordinates of 
the matched points using robust techniques. The first two 
images are usually used to initialize the sequence. It is 
important that the points are tracked over a long sequence to 
reduce the error propagation. This is all done in a projective 
geometry basis and is usually followed by a bundle adjustment, 
also in the projective space. Self-calibration to compute the 
intrinsic camera parameters, usually only the focal length, 
follows in order to obtain metric reconstruction, up to scale, 
from the projective one [Pollefeys et al, 1999]. Again, bundle 
adjustment is usually applied to the metric reconstruction to 
optimize the solution. The next step, the creation of the 3D 
model, is more difficult to automate and is usually done 
interactively to define the topology and to edit or post-process the 
output. For large structures and scenes, the technique may 
require a large number of images, so the creation of the model 
requires significant human interaction even though 
image registration and a large number of 3D points are 
computed fully automatically. 
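
The last geometric step of the pipeline above, computing the 3D coordinates of a matched point, can be illustrated with a minimal sketch. It is not the projective, robust formulation used in the cited works: it assumes the camera centres and the viewing-ray directions through the two matched pixels are already known in a common metric frame, and it uses the simple midpoint method. The function name `triangulate_midpoint` and its parameters are hypothetical, chosen here for illustration only.

```python
import math


def dot(u, v):
    """Dot product of two 3-vectors given as sequences."""
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(c1, d1, c2, d2, eps=1e-12):
    """Midpoint triangulation of one matched point (illustrative sketch).

    c1, c2: camera centres; d1, d2: directions of the viewing rays through
    the matched pixels (assumed known, not necessarily unit length).
    Returns (point, gap): the midpoint of the shortest segment joining the
    two rays, and the length of that segment.
    """
    w = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b  # zero when the rays are parallel
    if denom <= eps * a * c:
        raise ValueError("rays are (nearly) parallel; depth is undetermined")
    s = (b * e - c * d) / denom  # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom  # parameter of the closest point on ray 2
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    point = [(u + v) / 2 for u, v in zip(p1, p2)]
    return point, math.dist(p1, p2)


# Two cameras one unit apart, both observing the same (made-up) point.
point, gap = triangulate_midpoint(
    [0.0, 0.0, 0.0], [0.5, 0.2, 4.0],
    [1.0, 0.0, 0.0], [-0.5, 0.2, 4.0],
)
```

With noisy matches the two rays no longer intersect, and the returned gap gives a rough indication of how inconsistent a match is. A full system would refine such initial points in the bundle adjustment described above.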
The most impressive results remain those achieved with 
highly interactive approaches. Rather than full automation, an 
easy-to-use hybrid system known as Façade has been developed 
[Debevec et al, 1996]. The method’s main goal is the realistic 
creation of 3D models of architecture from a small number of 
photographs. The basic geometric shape of the structure is first 
recovered interactively using models of polyhedral elements. In 
this step, the actual size of the elements and camera pose are 
captured assuming that the camera intrinsic parameters are 
known. The second step is an automated matching procedure, 
constrained by the now known basic model, to add geometric 
details. The approach proved to be effective in creating 
geometrically accurate and realistic models of architecture. 
The drawbacks are the high level of interaction and the 
restriction to certain shapes. Also, since the assumed shapes 
determine all 3D points and camera poses, the results are only as 
accurate as the assumption that the structure elements match 
those shapes. Our method, although similar in philosophy, 
replaces basic shapes with a small number of seed points to 
achieve more flexibility and a higher level of detail. In addition, 
the camera poses and 3D coordinates are determined without 
any assumption about the shapes, by a full bundle 
adjustment, with or without self-calibration depending on the 
given configuration. This achieves higher geometric accuracy, 
independent of the shape of the object. 
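
For reference, a bundle adjustment of this kind is commonly written as the minimization of the total reprojection error. The form below is the generic textbook expression, not necessarily the exact formulation used in this paper, and all symbols are assumptions introduced here for illustration.

```latex
\min_{\{P_i\},\,\{X_j\}} \;
\sum_{i=1}^{m} \sum_{j=1}^{n} v_{ij}\,
\bigl\| x_{ij} - \pi(P_i, X_j) \bigr\|^{2}
```

Here $P_i$ denotes the parameters of image $i$ (its pose and, when self-calibration is included, intrinsic parameters such as the focal length), $X_j$ is the $j$-th 3D point, $x_{ij}$ is the measured image position of point $j$ in image $i$, $\pi$ is the projection function, and $v_{ij}$ is 1 if point $j$ is observed in image $i$ and 0 otherwise. No shape model appears in this expression, which is consistent with the claim that accuracy does not depend on the shape of the object.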
The Façade approach has inspired several efforts to 
automate it. Werner and Zisserman, 2002, proposed a fully 
automated Façade-like approach. Instead of the basic shapes, 
the principal planes of the scene are created automatically to 
assemble a coarse model. Like Façade, the coarse model guides 
a more refined polyhedral model of details such as windows, 
doors, and wedge blocks. Since this is a fully automated 
approach, it requires feature detection, matching across closely 
spaced images, and camera pose estimation using projective 
geometry. Dick et al, 2001, proposed another automated 
Façade-like approach. It uses model-based recognition to 
extract high-level models in a single image and then projects them 
into other images for verification. The method requires 
parameterized building blocks with an a priori distribution defined 
by the building style. The scene is modeled as base planes 
corresponding to walls or roofs; each may contain offset 3D 
shapes that model common architectural elements such as 
windows and columns. Again, full automation necessitates 
feature detection and a projective geometry approach. 
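
The scene representation described above, base planes carrying offset 3D shapes, can be pictured with a small data-structure sketch. This is only an illustration under our own assumptions, not the representation used by Dick et al: the class names `BasePlane` and `OffsetShape`, their fields, and the restriction to rectangular shapes are all hypothetical.

```python
from dataclasses import dataclass, field


def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


@dataclass
class OffsetShape:
    """A rectangular element (e.g. a window) attached to a base plane."""
    name: str
    x: float       # lower-left corner along the plane's u axis
    y: float       # lower-left corner along the plane's v axis
    width: float
    height: float
    offset: float  # signed distance from the base plane along its normal


@dataclass
class BasePlane:
    """A wall or roof plane with orthonormal in-plane axes u and v."""
    origin: tuple
    u_axis: tuple
    v_axis: tuple
    shapes: list = field(default_factory=list)

    def normal(self):
        return cross(self.u_axis, self.v_axis)

    def shape_corners(self, shape):
        """3D corners of the shape's face, displaced along the normal."""
        n = self.normal()
        corners = []
        for du, dv in ((0, 0), (shape.width, 0),
                       (shape.width, shape.height), (0, shape.height)):
            a, b = shape.x + du, shape.y + dv
            corners.append(tuple(
                o + a * u + b * v + shape.offset * k
                for o, u, v, k in zip(self.origin, self.u_axis, self.v_axis, n)
            ))
        return corners


# A vertical wall in the x-z plane with one window offset from it.
wall = BasePlane(origin=(0.0, 0.0, 0.0), u_axis=(1.0, 0.0, 0.0), v_axis=(0.0, 0.0, 1.0))
window = OffsetShape("window", x=2.0, y=1.0, width=1.5, height=1.0, offset=-0.2)
wall.shapes.append(window)
corners = wall.shape_corners(window)
```

Parameterizing each element by a few numbers relative to its plane is what makes it possible to attach a prior distribution to those parameters, as the text describes.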
2.2 Range-Based Modeling 
As mentioned above, three-dimensional measurement from 
images requires that interest points or edges be visible in the 
image, which is not always possible. They are also affected by