Full text: Proceedings of the CIPA WG 6 International Workshop on Scanning for Cultural Heritage Recording

4.1 Image-Based Modeling 
The approach is designed mainly for man-made objects such as 
classical architectures, which are designed within constraints of 
proportion and configurations. Classical buildings are divided 
into architectural elements. These elements are logically 
organized in space to produce a coherent work. There is a 
logical hierarchical relation among building parts and between 
parts and whole. The most common scheme divides the 
building into two sets of lines forming a rectangular grid 
[Tzonis and Lefaivre, 1986]. The distances between the grid 
lines are often equal or, when they vary, they vary regularly. The 
grid lines are then turned into planes that partition the space and 
control the placement of the architectural elements. The 
automation of 3D reconstruction is better achieved when such 
understanding is taken into account. We reconstruct the 
architectural elements from a minimum number of points and put 
them together using the planes of a regular grid. Other schemes, 
such as a polar grid, also exist but the basic idea can be applied 
there too. Classical architecture can be reconstructed, knowing 
its components, even if only a fragment survives or is seen in the 
images. For example, a columnar element consists of: 1) the 
capital, a horizontal member on top, 2) the column itself, a long 
vertical tapered cylinder, 3) a pedestal or a base on which the 
column rests. Each of those can be further divided into smaller 
elements. In addition to columns, other elements include pillars, 
pilasters, banisters, windows, doors, arches, and niches. Each 
can be reconstructed with a few seed points from which the rest 
of the element is built. 
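
The role of the regular grid in placing elements can be illustrated with a minimal sketch. It is not the implementation used here: the `Column` record, the `grid_lines` and `place_columns` helpers, and all dimensions are hypothetical, and the sketch assumes equally spaced grid lines and a column that has already been reconstructed once from its seed points.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical columnar element: a position plus three stacked parts."""
    x: float
    y: float
    base_height: float
    shaft_height: float
    capital_height: float

    @property
    def total_height(self) -> float:
        return self.base_height + self.shaft_height + self.capital_height


def grid_lines(start, spacing, count):
    """Positions of `count` equally spaced grid lines along one axis."""
    return [start + i * spacing for i in range(count)]


def place_columns(template, xs, ys):
    """Copy one reconstructed column to every intersection of the grid lines."""
    return [Column(x, y, template.base_height, template.shaft_height,
                   template.capital_height)
            for y in ys for x in xs]


# One column reconstructed from seed points (dimensions are made up).
template = Column(0.0, 0.0, base_height=0.5, shaft_height=4.0, capital_height=0.5)
columns = place_columns(template, grid_lines(0.0, 3.0, 4), grid_lines(0.0, 6.0, 2))
print(len(columns), columns[-1].x, columns[-1].y)
```

When the spacing varies regularly rather than being constant, only `grid_lines` would need to change; the placement step stays the same.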
Our approach is photogrammetry-based. It aims neither to be 
fully automated nor to rely completely on a human operator. It 
provides enough automation to assist the operator without 
sacrificing accuracy or level of detail. Figure 2 summarizes the 
procedure and indicates which steps are interactive and which 
are automatic (interactive operations are light gray). The figure 
also shows an option of taking a closely spaced sequence of 
images, if conditions allow, to increase the level of 
automation. Here, we will discuss only the option of 
widely separated views. Images are taken, all with the same 
camera setup, from positions where the object is suitably 
visible. There should be a reasonable distance, or baseline, 
between the images. Several features appearing in multiple 
images are interactively extracted, usually 12-15 per image. The 
user points to a corner and labels it with a unique number, and 
the system accurately extracts the corner point. The Harris 
operator is used [Harris, 1998] for its simplicity and efficiency. 
Image registration and 3D coordinate computation are based on 
photogrammetric bundle adjustment for its accuracy, flexibility, 
and effectiveness compared to other structure-from-motion 
techniques [Triggs et al, 2000]. Advances in bundle adjustment 
have eliminated the need for control points or for manually 
entering initial approximate coordinates. Many other aspects 
required for high accuracy, such as camera calibration with full 
distortion corrections, have long been solved problems in photogrammetry 
and will not be discussed in this paper. 
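
The interactive step, in which the operator points near a corner and the system extracts it precisely, can be illustrated with a minimal sketch. It is not the implementation used here: the function names `harris_response` and `refine_click`, the unweighted 3x3 window, the constant k = 0.04, and the search radius are all assumptions. The result is only pixel-accurate; a practical system would add sub-pixel refinement.

```python
def harris_response(img, k=0.04, half=1):
    """Harris response R = det(M) - k * trace(M)^2 for each pixel.

    img is a list of rows of grey values. M is the gradient structure
    tensor summed over a (2*half+1)^2 window (unweighted, for brevity).
    """
    h, w = len(img), len(img[0])
    # Central-difference gradients; border pixels keep a zero gradient.
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            ix[r][c] = (img[r][c + 1] - img[r][c - 1]) / 2.0
            iy[r][c] = (img[r + 1][c] - img[r - 1][c]) / 2.0
    resp = [[0.0] * w for _ in range(h)]
    for r in range(half, h - half):
        for c in range(half, w - half):
            sxx = sxy = syy = 0.0
            for dr in range(-half, half + 1):
                for dc in range(-half, half + 1):
                    gx, gy = ix[r + dr][c + dc], iy[r + dr][c + dc]
                    sxx += gx * gx
                    sxy += gx * gy
                    syy += gy * gy
            resp[r][c] = sxx * syy - sxy * sxy - k * (sxx + syy) ** 2
    return resp


def refine_click(img, click, search=3):
    """Snap an approximate (row, col) click to the strongest response nearby."""
    resp = harris_response(img)
    h, w = len(img), len(img[0])
    r0, c0 = click
    candidates = [(resp[r][c], r, c)
                  for r in range(max(0, r0 - search), min(h, r0 + search + 1))
                  for c in range(max(0, c0 - search), min(w, c0 + search + 1))]
    _, r, c = max(candidates)
    return r, c


# Synthetic 12x12 image: a bright square whose corner pixel is at (6, 6).
img = [[1.0 if r >= 6 and c >= 6 else 0.0 for c in range(12)] for r in range(12)]
corner = refine_click(img, (4, 8))  # the operator clicks a few pixels off
print(corner)
```

Restricting the search to a small window around the click keeps the operator in control of which corner is selected, while the operator response supplies the precise location.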
We now have all camera coordinates and orientations and the 
3D coordinates of a set of initial points, all registered in the 
same global coordinate system. The next interactive operation 
is to divide the scene into connected segments to define the 
surface topology. This is followed by an automatic corner 
extractor, again the Harris operator, and a matching procedure 
across the images to add more points into each of the segmented 
regions. The matching is constrained, within a segment, by the 
epipolar condition and a disparity range set up from the 3D 
coordinates of the initial points. The bundle adjustment is 
repeated with the newly added points to improve on previous 
results and re-compute the 3D coordinates of all points. 
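
The constrained matching can be sketched as a filter on candidate corners. The sketch below is an illustration under stated assumptions, not the procedure used here: it assumes that a fundamental matrix `F` relating the two images has already been derived from the bundle-adjusted camera parameters, that the disparity range is supplied by the caller, and that disparity is measured as the shift along the x axis. All function names and the distance tolerance are hypothetical.

```python
import math


def epipolar_line(F, p):
    """Line (a, b, c) in image 2, a*x + b*y + c = 0, for pixel p in image 1."""
    x, y = p
    return tuple(F[i][0] * x + F[i][1] * y + F[i][2] for i in range(3))


def point_line_distance(line, q):
    """Perpendicular distance in pixels from point q to the line."""
    a, b, c = line
    return abs(a * q[0] + b * q[1] + c) / math.hypot(a, b)


def candidate_matches(F, p, corners, disparity_range, max_dist=1.5):
    """Corners of image 2 near p's epipolar line and inside the disparity range."""
    line = epipolar_line(F, p)
    d_min, d_max = disparity_range
    keep = []
    for q in corners:
        disparity = p[0] - q[0]  # assumption: shift measured along x only
        if point_line_distance(line, q) <= max_dist and d_min <= disparity <= d_max:
            keep.append(q)
    return keep


# Fundamental matrix of a purely horizontal camera shift (epipolar lines y' = y).
F = [[0, 0, 0],
     [0, 0, -1],
     [0, 1, 0]]
corners = [(100, 80.5), (60, 80), (105, 95), (112, 79)]
matches = candidate_matches(F, (120, 80), corners, disparity_range=(5, 30))
print(matches)
```

The surviving candidates would then be compared by image similarity; the two geometric tests only shrink the search space.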
Figure 2. General procedure for image-based modeling 
(diagram labels: Seed Points, Element Properties) 
An approach to obtain 3D coordinates from a single image is 
essential to cope with occlusions and lack of features. Several 
approaches are available [e.g. van den Heuvel, 1998, Liebowitz 
et al, 1999]. Our approach uses several types of constraints for 
surface shapes such as planes and quadrics, and surface 
relations such as perpendicularity and symmetry. The equations 
of some of the planes can be determined from seed points 
previously measured. The equations of the remaining planes are 
determined using the knowledge that they are either 
perpendicular or parallel to the planes already determined. With 
little effort, the equations of the main planes on the structure, 
particularly those to which structural elements are attached, can 
be computed. 
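
The plane constraints can be sketched in a few lines. The sketch is an assumption-laden illustration rather than the method used here: it takes a plane exactly through three seed points (a practical system would more likely fit many points by least squares), derives parallel and perpendicular planes from it, and intersects a viewing ray with a plane to obtain a 3D point from a single image. The helper names are hypothetical, and the ray direction is assumed to have been computed already from the camera parameters.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plane_from_points(p1, p2, p3):
    """Plane (n, d), with n . X = d, through three non-collinear seed points."""
    n = cross(sub(p2, p1), sub(p3, p1))
    return n, dot(n, p1)


def parallel_plane(plane, point):
    """Plane parallel to `plane` that passes through one measured point."""
    n, _ = plane
    return n, dot(n, point)


def perpendicular_plane(plane, p1, p2):
    """Plane perpendicular to `plane` that contains the points p1 and p2."""
    n, _ = plane
    m = cross(n, sub(p2, p1))
    return m, dot(m, p1)


def intersect_ray(plane, centre, direction):
    """3D point where the ray centre + t * direction meets the plane."""
    n, d = plane
    denom = dot(n, direction)
    if abs(denom) < 1e-12:
        raise ValueError("viewing ray is parallel to the plane")
    t = (d - dot(n, centre)) / denom
    return tuple(c + t * v for c, v in zip(centre, direction))


# A wall measured from three seed points, then two planes derived from it.
wall = plane_from_points((0, 0, 5), (1, 0, 5), (0, 1, 5))
back = parallel_plane(wall, (0, 0, 7))
side = perpendicular_plane(wall, (0, 0, 5), (0, 1, 5))
# An unmarked pixel: its viewing ray from a camera at the origin hits the wall.
point = intersect_ray(wall, (0.0, 0.0, 0.0), (0.2, -0.1, 1.0))
print(wall, back, side, point)
```

Because the intersection needs only one ray, a point on an unmarked or partly occluded surface can be positioned as soon as the plane it lies on is known.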
Figure 3. Main steps of constructing architectural elements 
semi-automatically (column and window examples): 1. Extract, 
match, and compute 3D coordinates of seed points. 2. In 3D 
space, reconstruct the object from the seed points. 3. Project 
new points into the images. 4. Model and texture map the object. 
From these equations and the known camera parameters for 
each image, we can determine 3D coordinates of any point or 
pixel from a single image even if there is no marking on the 
surface. When some plane boundaries are not visible, they can 
be computed by plane intersections. This can also be applied to 
surfaces like quadrics or cylinders whose equations can be 
computed from existing points. Other constraints, such as 
symmetry and points with the same depth or same height are 
also used. The general rule for adding points on structural

