Close-range imaging, long-range vision

  
  
  
Figure 3: The pose estimation of a new view uses inferred 
structure-to-image matches. 
third view can then be used to determine the pose of this 
view in the reference frame defined by the two first views. 
The initial reconstruction is then refined and extended. By 
sequentially applying the same procedure the structure and 
motion of the whole sequence can be computed. The pose 
estimation procedure is illustrated in Figure 3. These results
can be refined through a global least-squares minimization of
all reprojection errors. Efficient bundle adjustment techniques
(Triggs et al., 2000) have been developed for this. The ambiguity
of the reconstruction is then restricted to metric through
self-calibration (Pollefeys et al., 1999a). Finally, a second 
bundle adjustment is carried out that takes the camera cali- 
bration into account to obtain an optimal estimation of the 
metric structure and motion. 
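
As a minimal sketch of the quantity that the global least-squares refinement minimizes, the following computes the sum of squared reprojection errors for a set of cameras and 3D points. The function names, the representation of cameras as 3x4 matrices, and the toy data are assumptions made for illustration; an actual bundle adjustment would additionally optimize the camera and point parameters, which is not shown here.

```python
def project(P, X):
    """Project a 3D point X = (x, y, z) with a 3x4 camera matrix P (list of rows)."""
    Xh = (X[0], X[1], X[2], 1.0)
    u, v, w = (sum(p * x for p, x in zip(row, Xh)) for row in P)
    return (u / w, v / w)


def total_reprojection_error(cameras, points, observations):
    """Sum of squared image residuals over all (camera, point) observations.

    observations maps (camera_index, point_index) -> measured (u, v).
    """
    total = 0.0
    for (i, j), (u_obs, v_obs) in observations.items():
        u, v = project(cameras[i], points[j])
        total += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return total


# Two hypothetical cameras: identity pose and a unit shift along x.
P0 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
P1 = [[1, 0, 0, -1], [0, 1, 0, 0], [0, 0, 1, 0]]
points = [(0.0, 0.0, 2.0), (1.0, 1.0, 4.0)]
observations = {
    (0, 0): (0.0, 0.0),
    (0, 1): (0.25, 0.25),
    (1, 0): (-0.5, 0.0),
    (1, 1): (0.0, 0.35),  # deliberately 0.1 off in v
}
error = total_reprojection_error([P0, P1], points, observations)
```

Only the last observation deviates from its exact projection, so the total error comes from that single residual.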
If in some views all tracked features are located on a plane,
the approach explained above fails. This problem
can be detected and solved by using the approach proposed 
in (Pollefeys et al., 2002). A statistical information cri- 
terion is used to detect the images that only observe pla- 
nar features and for these views the pose of the camera is 
only computed after the intrinsic camera parameters have 
been obtained through self-calibration (assuming they are 
all kept constant). In this way problems of ambiguities are 
avoided. 
2.3 Dense surface estimation 
To obtain a more detailed model of the observed surface, a
dense matching technique is used. The structure and mo- 
tion obtained in the previous steps can be used to constrain 
the correspondence search. Since the calibration between 
successive image pairs was computed, the epipolar con- 
straint that restricts the correspondence search to a 1-D 
search range can be exploited. Image pairs are warped so 
that epipolar lines coincide with the image scan lines. For 
this purpose the rectification scheme proposed in (Pollefeys
et al., 1999b) is used. This approach can deal with arbitrary
relative camera motion, unlike standard homography-based
approaches, which fail when the epipole is contained in the
image. It also guarantees minimal
image size. The correspondence search is then reduced to a 
matching of the image points along each image scan-line. 
This results in a dramatic increase of the computational 
efficiency of the algorithms by enabling several optimizations
in the computations. An example of a rectified stereo pair is
given in Figure 4. Note that all corresponding points are located
on the same horizontal scan-line in both images.

Figure 4: Example of a rectified stereo pair.
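
The scan-line property can be illustrated for the idealized case of a planar-rectified pair, whose fundamental matrix takes a fixed form. This is only a sketch under that assumption: the rectification scheme cited above handles more general motion and is not reproduced here, and the matrix, function name, and pixel coordinates are illustrative.

```python
# Fundamental matrix of an ideally rectified pair (epipole at infinity along x).
F_RECTIFIED = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]


def epipolar_residual(F, x_left, x_right):
    """Return x_right^T F x_left for pixel coordinates given as (x, y)."""
    xl = (x_left[0], x_left[1], 1.0)
    xr = (x_right[0], x_right[1], 1.0)
    Fx = [sum(f * c for f, c in zip(row, xl)) for row in F]
    return sum(a * b for a, b in zip(xr, Fx))


# Same scan line: the epipolar constraint is satisfied for any horizontal shift.
same_row = epipolar_residual(F_RECTIFIED, (120.0, 45.0), (97.0, 45.0))
# Different scan line: the residual equals the row difference.
other_row = epipolar_residual(F_RECTIFIED, (120.0, 45.0), (97.0, 48.0))
```

With this matrix the residual reduces to the difference of the two row coordinates, which is why the search for a match can be limited to one scan line.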
In addition to the epipolar geometry other constraints like 
preserving the order of neighboring pixels, bidirectional 
uniqueness of the match, and detection of occlusions can 
be exploited. These constraints are used to guide the corre- 
spondence towards the most probable scan-line match us- 
ing a dynamic programming scheme (Van Meerbergen et 
al., 2002). The matcher searches at each pixel in one image 
for maximum normalized cross correlation in the other im- 
age by shifting a small measurement window along the cor- 
responding scan line. The algorithm employs a pyramidal 
estimation scheme to reliably deal with very large disparity
ranges of over 50% of image size. The disparity search
range is limited based on the disparities that were observed 
for the features in the previous reconstruction stage. 
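
The correlation search along a scan line can be sketched as follows. This is a plain winner-take-all search over a bounded disparity range, not the dynamic programming scheme of (Van Meerbergen et al., 2002), and it omits the pyramidal estimation. The function names, window size, and intensity values are assumptions chosen for illustration.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length intensity windows."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0.0:
        return 0.0  # flat window: correlation undefined, treat as no match
    return sum(p * q for p, q in zip(da, db)) / denom


def best_disparity(left_row, right_row, x, half_window, d_min, d_max):
    """Search disparities d in [d_min, d_max] so that left x matches right x - d."""
    ref = left_row[x - half_window : x + half_window + 1]
    best_d, best_score = None, -2.0
    for d in range(d_min, d_max + 1):
        lo = x - d - half_window
        hi = x - d + half_window + 1
        if lo < 0 or hi > len(right_row):
            continue  # window would leave the image
        score = ncc(ref, right_row[lo:hi])
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score


right_row = [10, 10, 12, 40, 90, 35, 11, 10, 10, 10, 10, 10]
left_row = [10, 10, 10, 10, 10, 12, 40, 90, 35, 11, 10, 10]  # shifted right by 3
d, score = best_disparity(left_row, right_row, x=7, half_window=2, d_min=0, d_max=5)
```

The bounds `d_min` and `d_max` play the role of the disparity search range that is limited from the feature disparities of the previous stage.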
The pairwise disparity estimation makes it possible to compute
image-to-image correspondences between adjacent rectified image
pairs and independent depth estimates for each camera
viewpoint. An optimal joint estimate is achieved by
fusing all independent estimates into a common 3D model 
using a Kalman filter. The fusion can be performed in an 
economical way through controlled correspondence linking
and is discussed in more detail in (Koch et al., 1998).
This approach combines the advantages of small baseline 
and wide baseline stereo. It can provide a very dense depth 
map by avoiding most occlusions. The depth resolution is 
increased through the combination of multiple viewpoints 
and large global baseline while the matching is simplified 
through the small local baselines. 
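
The effect of fusing independent estimates can be sketched with a scalar Kalman update for a single pixel. This is a simplified illustration, assuming a static depth value and known measurement variances; the correspondence linking of (Koch et al., 1998) that gathers the estimates is not shown, and the function name and numbers are hypothetical.

```python
def fuse_depth(estimates):
    """Sequentially fuse (depth, variance) pairs with a scalar Kalman update."""
    depth, var = estimates[0]
    for z, r in estimates[1:]:
        gain = var / (var + r)             # weight given to the new measurement
        depth = depth + gain * (z - depth)
        var = (1.0 - gain) * var           # uncertainty shrinks with every view
    return depth, var


# Three hypothetical depth estimates of the same surface point.
fused_depth, fused_var = fuse_depth([(2.0, 0.04), (2.2, 0.04), (2.1, 0.02)])
```

The fused variance is smaller than that of any single estimate, which reflects the gain in depth resolution from combining multiple viewpoints.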
2.4 Building virtual models 
In the previous sections a dense structure and motion re- 
covery approach was explained. This yields all the nec- 
essary information to build photo-realistic virtual models. 
The 3D surface is approximated by a triangular mesh to 
reduce geometric complexity and to tailor the model to the 
requirements of computer graphics visualization systems. 
A simple approach consists of overlaying a 2D triangular 
mesh on top of one of the images and then building a
corresponding 3D mesh by placing the vertices of the triangles
in 3D space according to the values found in the corre- 
sponding depth map. The image itself is used as texture 
map. If no depth value is available or the confidence is 
too low, the corresponding triangles are not reconstructed.
The same happens when triangles are placed over discon- 
tinuities. This approach works well on dense depth maps 
obtained from multiple stereo pairs. 
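
The simple meshing approach described above can be sketched as follows. The grid spacing `step`, the threshold `max_jump` used to detect discontinuities, and the use of `None` for missing or low-confidence depth are assumptions made for this example. Vertices are kept in image coordinates with their depth; back-projection into 3D space with the camera calibration is omitted.

```python
def depth_map_to_mesh(depth, step, max_jump):
    """Triangulate a depth map sampled every `step` pixels.

    depth is a list of rows; None marks pixels without a reliable depth.
    Returns (vertices, triangles): vertices are (x, y, depth) tuples in image
    coordinates, triangles are index triples into the vertex list.
    """
    rows = range(0, len(depth), step)
    cols = range(0, len(depth[0]), step)
    index = {}
    vertices = []
    for r in rows:
        for c in cols:
            if depth[r][c] is not None:
                index[(r, c)] = len(vertices)
                vertices.append((float(c), float(r), depth[r][c]))

    def usable(corners):
        if any(p not in index for p in corners):
            return False  # missing depth at a vertex
        zs = [depth[r][c] for r, c in corners]
        return max(zs) - min(zs) <= max_jump  # reject triangles over a discontinuity

    triangles = []
    for r in rows[:-1]:
        for c in cols[:-1]:
            a, b = (r, c), (r, c + step)
            d, e = (r + step, c), (r + step, c + step)
            for tri in ((a, b, d), (b, e, d)):
                if usable(tri):
                    triangles.append(tuple(index[p] for p in tri))
    return vertices, triangles


depth = [
    [1.0, 1.0, 5.0],
    [1.0, 1.1, 5.0],
    [1.0, None, 5.0],
]
vertices, triangles = depth_map_to_mesh(depth, step=1, max_jump=0.5)
```

In this toy depth map the triangles touching the missing value or spanning the jump from about 1 to 5 are dropped. The (x, y) part of each vertex can serve directly as the texture coordinate in the source image.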
  