first the reconstruction contains a projective skew (i.e. par-
allel lines are not parallel, angles are not correct, distances
are too long or too short, etc.). This is due to the absence
of a priori calibration. Using a self-calibration algorithm
(Pollefeys et al., 1999a) this distortion can be removed,
yielding a reconstruction equivalent to the original up to
a global scale factor. This uncalibrated approach to 3D
reconstruction allows much more flexibility in the acquisi-
tion process since the focal length and other intrinsic cam-
era parameters do not have to be measured —calibrated—
beforehand and are allowed to change during the acquisition.
The reconstruction obtained as described in the previous
paragraph only contains a sparse set of 3D points. Al-
though interpolation might be a solution, this yields mod-
els with poor visual quality. Therefore, the next step attempts
to match every pixel of an image with pixels in neighboring
images, so that these points too
can be reconstructed. This task is greatly facilitated by the
knowledge of all the camera parameters which we have
obtained in the previous stage. Since a pixel in the im-
age corresponds to a ray in space and the projection of this
ray in other images can be predicted from the recovered
pose and calibration, the search for a corresponding pixel in
other images can be restricted to a single line. Additional
constraints such as the assumption of a piecewise continu-
ous 3D surface are also employed to further constrain the
search. It is possible to warp the images so that the search
range coincides with the horizontal scan-lines. An algo-
rithm that can achieve this for arbitrary camera motion is
described in (Pollefeys et al., 1999b). This allows us to
use an efficient stereo algorithm that computes an optimal
match for the whole scan-line at once (Van Meerbergen et
al., 2002). Thus, we can obtain a depth estimate (i.e. the
distance from the camera to the object surface) for almost
every pixel of an image. By fusing the results of all the
images together a complete dense 3D surface model is ob-
tained. The images used for the reconstruction can also
be used for texture mapping so that a final photo-realistic
result is achieved. The different steps of the process are
illustrated in Figure 1. In the following paragraphs the dif-
ferent steps are described in some more detail.
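Since the recovered pose and calibration reduce the correspondence search to a single line, this prediction can be made concrete with the fundamental matrix: a pixel x in one image must have its correspondence on the epipolar line l' = F x in the other. The following minimal sketch illustrates this; the matrix values and function names are illustrative, not part of the described system.

```python
import numpy as np

def epipolar_line(F, x):
    """Epipolar line l' = F @ x in the second image for pixel x in the first.

    F is the 3x3 fundamental matrix; x is in homogeneous coordinates.
    The returned line (a, b, c) satisfies a*u + b*v + c = 0, normalized so
    that |a*u + b*v + c| is the point-to-line distance in pixels.
    """
    l = F @ x
    return l / np.linalg.norm(l[:2])

# Illustrative F for a pure horizontal translation (a rectified pair)
F = np.array([[0.0, 0.0,  0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0,  0.0]])
x = np.array([100.0, 50.0, 1.0])     # pixel (100, 50) in the first image
line = epipolar_line(F, x)
# Here the line is (0, -1, 50), i.e. the horizontal scan-line v = 50,
# matching the warped-image case where the search range coincides
# with scan-lines.
```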
2.1 Relating images
Starting from a collection of images or a video sequence
the first step consists of relating the different images to
each other. This is not an easy problem. A restricted num-
ber of corresponding points is sufficient to determine the
geometric relationship or multi-view constraints between
the images. Since not all points are equally suited for
matching or tracking (e.g. a pixel in a homogeneous re-
gion), feature points need to be selected (Harris and Stephens,
1988, Shi and Tomasi, 1994). Depending on the type of
image data (i.e. video or still pictures) the feature points
are tracked or matched and a number of potential corre-
spondences are obtained. From these the multi-view con-
straints can be computed. However, since the correspon-
dence problem is an ill-posed problem, the set of corre-
sponding points can (and almost certainly will) be con-
taminated with a significant number of wrong matches or
Figure 2: Comparison of epipolar geometry (F-GRIC/solid
line) and homography model (H-GRIC/dashed line) for
tracked features. The lowest GRIC value yields the best
model.
outliers. A traditional least-squares approach will fail and
therefore a robust method is used (Torr, 1995, Fischler and
Bolles, 1981). Once the multi-view constraints have been
obtained they can be used to guide the search for additional
correspondences. These can then be employed to further
refine the results for the multi-view constraints.
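The robust estimation step can be illustrated with a minimal RANSAC sketch in the spirit of Fischler and Bolles (1981). For brevity a 2D line stands in for the multi-view constraint; the function names, thresholds, and data are illustrative only.

```python
import numpy as np

def ransac_line(points, n_iter=200, threshold=0.5, rng=None):
    """Find the consensus set of a line fit to outlier-contaminated 2D points.

    Repeatedly fits a line to a minimal sample (2 points), counts the points
    within `threshold` of it, and keeps the hypothesis with the largest
    support. Returns the inlier mask of the best hypothesis.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    best_inliers = None
    for _ in range(n_iter):
        i, j = rng.choice(len(points), size=2, replace=False)
        p, q = points[i], points[j]
        d = q - p
        norm = np.linalg.norm(d)
        if norm < 1e-9:
            continue                          # degenerate sample
        n = np.array([-d[1], d[0]]) / norm    # unit normal of the line
        dist = np.abs((points - p) @ n)       # point-to-line distances
        inliers = dist < threshold
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers

# 80 points on y = 2x + 1 plus 20 gross outliers; least squares would be
# pulled off the line, while the consensus set recovers the 80 good points.
rng = np.random.default_rng(1)
xs = rng.uniform(0, 10, 80)
good = np.column_stack([xs, 2 * xs + 1])
bad = rng.uniform(-50, 50, (20, 2))
pts = np.vstack([good, bad])
mask = ransac_line(pts)
```

In the actual system the minimal sample is a set of point correspondences and the hypothesis is a fundamental matrix or homography, but the hypothesize-and-verify structure is the same.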
In the case of video, the epipolar geometry between two
consecutive views is often not well determined. In fact, as
long as the camera has not sufficiently moved, the mo-
tion of the features can just as well be explained by a ho-
mography. The Geometric Robust Information Criterion
(GRIC) proposed by Torr (Torr et al., 1999) makes it possible
to evaluate which of the two models, epipolar geometry (F)
or homography (H), is best suited to explain the data. Typically,
for very small baselines the homography model is
always selected; as the baseline gets larger, both models
become equivalent, and eventually the epipolar geometry
model outperforms the homography-based one. This can
be seen in Figure 2. One can reliably compute the epipolar
geometry from the moment that the F-GRIC value drops
below the H-GRIC value. We then select as a new key-
frame the last frame for which the number of tracked fea-
tures is above 90% of the number of features tracked at the
F-GRIC/H-GRIC intersection.
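The model comparison can be sketched as follows. This uses one common form of Torr's GRIC; the constants and dimensions below follow typical usage and may differ from the exact settings in the cited work, so treat them as illustrative.

```python
import numpy as np

def gric(residuals, sigma, model):
    """Geometric Robust Information Criterion (one common form).

    model = 'F': epipolar geometry (manifold dimension d=3, k=7 parameters);
    model = 'H': homography (d=2, k=8). r=4 is the dimension of a point
    correspondence (two image points). Lower GRIC indicates the better
    model. Constants follow common usage of Torr's criterion and are
    assumptions, not the paper's exact settings.
    """
    n = len(residuals)
    r = 4
    d, k = (3, 7) if model == 'F' else (2, 8)
    lam3 = 2.0
    # Robust residual: capped so a single outlier cannot dominate the score
    rho = np.minimum(residuals**2 / sigma**2, lam3 * (r - d))
    return rho.sum() + np.log(r) * d * n + np.log(r * n) * k

# Hypothetical tracking residuals (pixels): when both models fit the data
# equally well (small baseline), the penalty terms favor the simpler H model.
res = np.full(200, 0.3)
better = 'F' if gric(res, 1.0, 'F') < gric(res, 1.0, 'H') else 'H'
```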
2.2 Structure and motion recovery
The relation between the views and the correspondences
between the features, retrieved as explained in the previ-
ous section, will be used to retrieve the structure of the
scene and the motion of the camera. Our approach is fully
projective so that it does not depend on the initialization.
This is achieved by strictly carrying out all measurements
in the images, i.e. using reprojection errors instead of 3D
errors.
At first two images are selected and an initial projective re-
construction frame is set up (Faugeras, 1992, Hartley et al.,
1992). Matching feature points are reconstructed through
triangulation. Feature points that are also observed in a