Figure 3: The pose estimation of a new view uses inferred
structure-to-image matches.
third view can then be used to determine the pose of this
view in the reference frame defined by the two first views.
The initial reconstruction is then refined and extended. By
sequentially applying the same procedure the structure and
motion of the whole sequence can be computed. The pose
estimation procedure is illustrated in Figure 3. These results
can be refined through a global least-squares minimization of
all reprojection errors. Efficient bundle adjustment techniques
(Triggs et al., 2000) have been developed for this. The
ambiguity on the reconstruction is then restricted to a metric
one through self-calibration (Pollefeys et al., 1999a). Finally, a second
bundle adjustment is carried out that takes the camera cali-
bration into account to obtain an optimal estimation of the
metric structure and motion.
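The cost minimized by the bundle adjustment can be made concrete with a small sketch. It assumes 3x4 projection matrices and a pinhole projection of homogeneous points; the function names `project` and `reprojection_error` and the layout of the observation list are illustrative assumptions, not part of the system described here. The sketch only evaluates the cost. An actual bundle adjustment would minimize it over all cameras and points with a sparse non-linear least-squares solver.

```python
def project(P, X):
    """Project 3D point X = (x, y, z) with a 3x4 camera matrix P (list of rows)."""
    Xh = (X[0], X[1], X[2], 1.0)
    u, v, w = (sum(p * x for p, x in zip(row, Xh)) for row in P)
    return (u / w, v / w)


def reprojection_error(cameras, points, observations):
    """Sum of squared image distances over all observations.

    observations: list of (camera_index, point_index, (u, v)) tuples
    (hypothetical layout chosen for this example).
    """
    total = 0.0
    for cam_idx, pt_idx, (u_obs, v_obs) in observations:
        u, v = project(cameras[cam_idx], points[pt_idx])
        total += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return total
```

For noise-free observations the cost is zero; every pixel of measurement error contributes its squared length to the total.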
If in some views all tracked features are located on a plane,
the approach explained above would fail. This problem
can be detected and solved by using the approach proposed
in (Pollefeys et al., 2002). A statistical information cri-
terion is used to detect the images that only observe pla-
nar features and for these views the pose of the camera is
only computed after the intrinsic camera parameters have
been obtained through self-calibration (assuming they are
all kept constant). In this way problems of ambiguities are
avoided.
2.3 Dense surface estimation
To obtain a more detailed model of the observed surface a
dense matching technique is used. The structure and mo-
tion obtained in the previous steps can be used to constrain
the correspondence search. Since the calibration between
successive image pairs was computed, the epipolar con-
straint that restricts the correspondence search to a 1-D
search range can be exploited. Image pairs are warped so
that epipolar lines coincide with the image scan lines. For
this purpose the rectification scheme proposed in (Pollefeys
et al., 1999b) is used. This approach can deal with arbitrary
relative camera motion, unlike standard homography-based
approaches, which fail when the epipole is contained in the
image. The approach proposed in (Pollefeys et al., 1999b) also
guarantees minimal
image size. The correspondence search is then reduced to a
matching of the image points along each image scan-line.
This results in a dramatic increase of the computational
efficiency of the algorithms by enabling several optimizations
in the computations. An example of a rectified stereo pair is
given in Figure 4. Note that all corresponding points are
located on the same horizontal scan-line in both images.
Figure 4: Example of a rectified stereo pair.
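The effect of rectification on the epipolar constraint can be checked numerically. The sketch below assumes an ideally rectified pair, for which the fundamental matrix is, up to scale, the skew-symmetric matrix of the direction (1, 0, 0); the names `F_RECTIFIED` and `epipolar_residual` are chosen for this example only. With this matrix the constraint reduces to a difference of row coordinates, which is why the search becomes one-dimensional along the scan-line.

```python
# Fundamental matrix of an ideally rectified pair (epipole at infinity along x).
F_RECTIFIED = [[0.0, 0.0, 0.0],
               [0.0, 0.0, -1.0],
               [0.0, 1.0, 0.0]]


def epipolar_residual(F, x_first, x_second):
    """Return x_second^T F x_first for pixel coordinates given as (x, y)."""
    a = (x_first[0], x_first[1], 1.0)
    b = (x_second[0], x_second[1], 1.0)
    Fa = [sum(f * c for f, c in zip(row, a)) for row in F]
    return sum(p * q for p, q in zip(b, Fa))
```

For `F_RECTIFIED` the residual equals the row of the first point minus the row of the second point, so it vanishes exactly when both points lie on the same scan-line, whatever their horizontal positions.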
In addition to the epipolar geometry other constraints like
preserving the order of neighboring pixels, bidirectional
uniqueness of the match, and detection of occlusions can
be exploited. These constraints are used to guide the corre-
spondence towards the most probable scan-line match us-
ing a dynamic programming scheme (Van Meerbergen et
al., 2002). The matcher searches at each pixel in one image
for maximum normalized cross correlation in the other im-
age by shifting a small measurement window along the cor-
responding scan line. The algorithm employs a pyramidal
estimation scheme to reliably deal with very large disparity
ranges of over 50% of image size. The disparity search
range is limited based on the disparities that were observed
for the features in the previous reconstruction stage.
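The correlation search along a scan-line can be sketched as follows. This is a deliberately simplified, winner-take-all version: it uses a 1-D window on a single row and ignores the ordering, uniqueness and occlusion constraints as well as the dynamic programming and pyramid stages described above. The function names, the sign convention (a pixel at column x in the first row matches column x - d in the second) and the parameters `half_window`, `d_min` and `d_max` are assumptions made for this example.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length intensity windows."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0.0:
        return 0.0  # textureless window: correlation undefined, treated as no match
    return sum(p * q for p, q in zip(da, db)) / denom


def best_disparity(left_row, right_row, x, half_window, d_min, d_max):
    """Find the disparity in [d_min, d_max] that maximizes the NCC score.

    The reference window is centred at left_row[x] and must fit in the row.
    Candidates whose window leaves right_row are skipped.
    Returns (disparity, score), or (None, -2.0) if no candidate was valid.
    """
    ref = left_row[x - half_window : x + half_window + 1]
    best = (None, -2.0)
    for d in range(d_min, d_max + 1):
        lo = x - d - half_window
        hi = x - d + half_window + 1
        if lo < 0 or hi > len(right_row):
            continue
        score = ncc(ref, right_row[lo:hi])
        if score > best[1]:
            best = (d, score)
    return best
```

Restricting `d_min` and `d_max` to the range observed for the features of the previous stage shortens the loop and removes many false matches before any further constraint is applied.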
The pairwise disparity estimation makes it possible to compute
image-to-image correspondences between adjacent rectified image
pairs and independent depth estimates for each camera
viewpoint. An optimal joint estimate is achieved by
fusing all independent estimates into a common 3D model
using a Kalman filter. The fusion can be performed in an
economical way through controlled correspondence linking and
is discussed in more detail in (Koch et al., 1998).
This approach combines the advantages of small baseline
and wide baseline stereo. It can provide a very dense depth
map by avoiding most occlusions. The depth resolution is
increased through the combination of multiple viewpoints
and large global baseline while the matching is simplified
through the small local baselines.
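The fusion of independent depth estimates can be illustrated for a single pixel. The sketch below assumes a static scalar state (the depth along one viewing ray) and that each linked measurement comes with a variance; both are simplifying assumptions for this example, and the function name `fuse_depths` is hypothetical. It is not the implementation of (Koch et al., 1998), only the scalar Kalman update that such a scheme builds on.

```python
def fuse_depths(measurements):
    """Fuse (depth, variance) pairs for one pixel with a scalar Kalman update.

    Returns (fused_depth, fused_variance).
    """
    it = iter(measurements)
    depth, var = next(it)              # initialise with the first estimate
    for z, r in it:
        gain = var / (var + r)         # Kalman gain for a static scalar state
        depth = depth + gain * (z - depth)
        var = (1.0 - gain) * var       # variance shrinks with every measurement
    return depth, var
```

Each additional view lowers the variance of the fused depth, which is one way to read the statement that combining multiple viewpoints increases the depth resolution.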
2.4 Building virtual models
In the previous sections a dense structure and motion re-
covery approach was explained. This yields all the nec-
essary information to build photo-realistic virtual models.
The 3D surface is approximated by a triangular mesh to
reduce geometric complexity and to tailor the model to the
requirements of computer graphics visualization systems.
A simple approach consists of overlaying a 2D triangular
mesh on top of one of the images and then building a
corresponding 3D mesh by placing the vertices of the triangles
in 3D space according to the values found in the corresponding
depth map. The image itself is used as texture
map. If no depth value is available or the confidence is
too low, the corresponding triangles are not reconstructed.
The same happens when triangles are placed over discon-
tinuities. This approach works well on dense depth maps
obtained from multiple stereo pairs.
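The simple meshing approach described above can be sketched as follows. The sketch assumes a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, a regular grid with spacing `step` pixels, and a depth map in which missing or low-confidence values are stored as `None`; the discontinuity test (a threshold `max_jump` on the depth spread within a triangle) is one possible criterion, not necessarily the one used by the authors. All names are chosen for this example.

```python
def depth_map_to_mesh(depth, step, fx, fy, cx, cy, max_jump):
    """Build a triangle mesh from a depth map sampled on a regular pixel grid.

    depth: 2D list of depths (None where no reliable value is available).
    Returns (vertices, triangles): 3D points and index triples into vertices.
    """
    rows = range(0, len(depth), step)
    cols = range(0, len(depth[0]), step)
    index = {}
    vertices = []
    for v in rows:
        for u in cols:
            z = depth[v][u]
            if z is None:
                continue
            index[(u, v)] = len(vertices)
            # Back-project pixel (u, v) with depth z through the pinhole model.
            vertices.append(((u - cx) * z / fx, (v - cy) * z / fy, z))

    def keep(corners):
        if any(c not in index for c in corners):
            return False                       # missing depth: skip the triangle
        zs = [vertices[index[c]][2] for c in corners]
        return max(zs) - min(zs) <= max_jump   # skip depth discontinuities

    triangles = []
    for v in rows:
        for u in cols:
            a, b = (u, v), (u + step, v)
            c, d = (u, v + step), (u + step, v + step)
            # Split each grid cell into two triangles.
            for tri in ((a, b, c), (b, d, c)):
                if keep(tri):
                    triangles.append(tuple(index[p] for p in tri))
    return vertices, triangles
```

Because every vertex originates from a pixel (u, v) of the reference image, texture coordinates follow directly by dividing u and v by the image width and height.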