rapidly from a small number of images taken by tourists. The
paper concludes with a short discussion in section 7.
2. SYNOPSIS OF 3D RECONSTRUCTION TECHNIQUES
The ultimate goal of all 3D reconstruction methods is to satisfy the eight requirements listed in the previous section. Since this is not an easy task, most methods focus on some of the requirements at the expense of the others. We use this trade-off to distinguish between methods and to compare them. A method may:
1- Focus on accuracy without any automation.
2- Focus on full automation.
3- Try to reach a balance between all requirements.
The most widely used method remains the first, which is the traditional approach. This is a labor-intensive endeavor in which engineering plans or drawings, surveying, and/or standard photogrammetry techniques are employed, and the resulting measurements are then imported into a CAD system to create a 3D model. The results are often unsatisfactory in appearance and look computer-generated. Increasing the level of automation became essential in order to meet the growing demand for 3D models. However, efforts to completely automate the process, from taking images to the output of a 3D model, while promising, have thus far not always been successful. We summarize here the automation of camera pose estimation, self-calibration, and computation of 3D coordinates of pixels. This procedure, which is now widely used in computer vision [e.g. Faugeras et al, 1998; Fitzgibbon et al, 1998; Pollefeys et al, 1999; Liebowitz et al, 1999], starts with a sequence of images taken by an uncalibrated camera. The system automatically extracts interest points, such as corners, sequentially matches them across views, and then computes camera parameters and 3D coordinates of the matched points using robust techniques. The key to the success of this fully automatic procedure is that successive images must not vary significantly, so the images have to be taken at short intervals. The first two images are
usually used to initialize the sequence. It is important that the
points are tracked over a long sequence or in every image where
they appear in order to reduce error propagation. This is all done within a projective geometry framework and is usually followed by a bundle adjustment, also in projective space. Self-calibration to compute the intrinsic camera parameters, usually only the focal length, follows in order to obtain a metric reconstruction, up to scale, from the projective one [Pollefeys et al, 1999]. Again, a bundle adjustment is usually applied to the metric reconstruction to optimize the solution. The next step, the creation of the 3D
model, is more difficult to automate and is usually done interactively to define the topology and to edit or post-process the model. A model based only on the measured points will usually consist of irregular and overlapping surface boundaries that need some assumptions, for example planes and plane intersections, in order to be corrected. For large structures and scenes, since the technique may require a large number of images, creating the model requires significant human interaction, even though the image registration and a large number of 3D points were computed fully automatically.
The degree of modeling automation increases when certain assumptions about the object, for example that it is an architectural structure, can be made. Since automated image-based methods rely on features that can be extracted from the scene, occlusions and untextured surfaces are problematic. We often end up with areas that have too many features, not all of which are needed for modeling, and areas with few or no features, from which a complete model cannot be produced.
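To make the sequential procedure concrete, the sketch below shows one plausible implementation of its core steps for a single image pair: corner-like interest point extraction, matching, robust estimation of the relative camera geometry, and triangulation of the matched points. It uses OpenCV and, for brevity, assumes approximate camera intrinsics K are available, whereas the procedure described above works projectively and recovers the calibration later by self-calibration; all names and parameter values are illustrative only.

import cv2
import numpy as np

def relative_reconstruction(img1, img2, K):
    # 1. Extract interest points (corner-like features) in both images.
    detector = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = detector.detectAndCompute(img1, None)
    kp2, des2 = detector.detectAndCompute(img2, None)

    # 2. Match the features between the two closely spaced views.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # 3. Robustly estimate the epipolar geometry; RANSAC rejects mismatches.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

    # 4. Triangulate the surviving matches to get 3D coordinates (up to scale).
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    good = mask.ravel() > 0
    X_h = cv2.triangulatePoints(P1, P2, pts1[good].T, pts2[good].T)
    return (X_h[:3] / X_h[3]).T, R, t

In a full sequence, each new image would be matched against the previous one, its pose estimated from the already reconstructed points, and the growing solution periodically refined by bundle adjustment.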
The most impressive results remain those achieved with interactive approaches. Rather than aiming for full automation, a hybrid, easy-to-use system named Façade has been developed [Debevec et al, 1996]. The method's main goal is the realistic creation of 3D models of architecture from a small number of photographs. The basic geometric shape of a structure is first recovered using models of polyhedral elements. In this interactive step, the actual sizes of the elements and the camera poses are recovered, assuming that the camera intrinsic parameters are known. The second step is an automated matching procedure, constrained by the now-known basic model, which adds geometric details. The approach proved to be effective in creating geometrically accurate and realistic models. The drawbacks are the high level of interaction and the restriction to certain shapes. Also, since the assumed shapes determine all 3D points and camera poses, the results are only as accurate as the assumption that the structure elements match those shapes. Our method, although similar in philosophy, replaces the basic shapes with a small number of seed points in multiple images to achieve more flexibility and more levels of detail. In addition, the camera poses and 3D coordinates are determined without any assumption about the shapes, but instead by a full bundle adjustment, with or without self-calibration depending on the given configuration. This achieves higher geometric accuracy, independent of the shape of the object.
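To illustrate the bundle adjustment at the core of this step, the sketch below sets up the joint minimization of reprojection error over all camera poses and 3D point coordinates using SciPy. It is a generic sketch rather than our actual implementation; the simple pinhole model with a single known focal length and no distortion, as well as all names, are simplifying assumptions, and self-calibration would add the intrinsic parameters to the unknowns.

import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(points3d, rvec, tvec, f, cx, cy):
    # Project 3D points into one image with a simple pinhole model.
    cam = Rotation.from_rotvec(rvec).apply(points3d) + tvec
    return np.column_stack([f * cam[:, 0] / cam[:, 2] + cx,
                            f * cam[:, 1] / cam[:, 2] + cy])

def residuals(params, n_cams, n_pts, cam_idx, pt_idx, observed_xy, f, cx, cy):
    # Stacked reprojection residuals over every image observation.
    poses = params[:n_cams * 6].reshape(n_cams, 6)     # rotation vector | translation
    points3d = params[n_cams * 6:].reshape(n_pts, 3)
    proj = np.empty_like(observed_xy)
    for c in range(n_cams):
        sel = cam_idx == c
        proj[sel] = project(points3d[pt_idx[sel]],
                            poses[c, :3], poses[c, 3:], f, cx, cy)
    return (proj - observed_xy).ravel()

def bundle_adjust(initial_params, n_cams, n_pts, cam_idx, pt_idx,
                  observed_xy, f, cx, cy):
    # A robust loss limits the influence of remaining mismatches.
    result = least_squares(residuals, initial_params, loss='huber', f_scale=1.0,
                           args=(n_cams, n_pts, cam_idx, pt_idx,
                                 observed_xy, f, cx, cy))
    return result.x

No assumption about the underlying surface shape enters this solution; only the measured image coordinates of the seed points do.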
The Façade approach has inspired several research efforts to automate it. Werner and Zisserman, 2002, proposed a fully automated Façade-like approach. Instead of the basic shapes, the principal planes of the scene are extracted automatically to assemble a coarse model. These planes follow three dominant directions that are assumed to be mutually perpendicular. Like Façade, the coarse model guides a more refined polyhedral model of details such as windows, doors, and wedge blocks. Since this is a fully automated approach, it requires feature detection and closely spaced images for the automatic matching and camera pose estimation using projective geometry. Dick et al, 2001,
proposed another automated Façade-like approach. It employs a model-based recognition technique to extract high-level models from a single image and then uses their projections into other images for verification. The method requires parameterized building blocks with an a priori distribution defined by the building style. The scene is modeled as a set of base planes corresponding to walls or roofs, each of which may contain offset 3D shapes that model common architectural elements such as windows and columns. Again, full automation necessitates feature detection and a projective geometry approach; however, the technique used here also employs planar constraints and perpendicularity between planes to improve the matching process. Another approach [Tao et al, 2001] to improve the automatic matching and scene segmentation for modeling, after image registration, applies depth smoothness constraints on surfaces combined with color similarity constraints.
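As an illustration of the plane-based reasoning these automated approaches share, the following generic RANSAC plane fit shows how a dominant scene plane (a wall or roof) can be extracted from a sparse 3D point cloud. It is a stand-in for, not a reproduction of, the plane extraction in the cited systems; the threshold and iteration count are arbitrary.

import numpy as np

def fit_dominant_plane(points, n_iters=1000, threshold=0.02, seed=None):
    # Return (normal, d) of the plane n.x + d = 0 supported by the most points.
    rng = np.random.default_rng(seed)
    best_plane, best_inliers = None, None
    for _ in range(n_iters):
        # Three random points define a candidate plane.
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-12:                     # degenerate (collinear) sample
            continue
        normal /= norm
        d = -normal @ sample[0]
        # Points within `threshold` of the plane count as inliers.
        inliers = np.abs(points @ normal + d) < threshold
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_plane, best_inliers = (normal, d), inliers
    return best_plane, best_inliers

Removing the inliers and repeating the search yields further planes, whose normals can then be checked or constrained for perpendicularity, in the spirit of the approaches above.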
The presence of noise, which results from extracting features from images, makes the choice of camera positions, or more precisely the camera motion relative to the object distance, critical for a correct reconstruction. This has been studied widely in photogrammetry [e.g. Fraser, 1994]. It has lately been recognized in computer vision that photogrammetric bundle adjustment provides the optimum solution for image-based modeling [Triggs et al, 2000]. This has resulted in the inclusion of a bundle adjustment following the sequential techniques mentioned above.
Critical analyses of automated techniques that use projective geometry have been undertaken [Oliensis, 2000; Bougnoux, 1998]. Configurations that lead to an ambiguous projective reconstruction have also been identified.
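The effect of camera motion versus object distance noted above can be quantified with the standard normal-case stereo relation, in which depth precision degrades roughly as sigma_Z = Z^2 / (B * f) * sigma_px for object distance Z, baseline B, focal length f in pixels, and image measurement noise sigma_px. The short numeric example below uses purely hypothetical values.

def depth_precision(Z, B, f_px, sigma_px=0.5):
    # Approximate 1-sigma depth error for object distance Z and baseline B.
    return Z**2 / (B * f_px) * sigma_px

f_px = 3000.0                 # focal length in pixels (hypothetical)
Z = 20.0                      # object distance in metres (hypothetical)
for B in (0.5, 2.0, 5.0):     # short to wide baselines, in metres
    print(f"B = {B:4.1f} m  ->  sigma_Z ~ {depth_precision(Z, B, f_px) * 1000:.0f} mm")

A short baseline, i.e. a small motion between closely spaced images, gives markedly noisier depth, which is why well-configured camera stations and a rigorous bundle adjustment matter.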