The first step consists of finding corresponding image
features between these images (i.e. image points that
originate from the same 3D feature). In the case of video
data, features are tracked until the disparities become
sufficiently large for an accurate estimation of the
epipolar geometry.
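This tracking strategy can be sketched with OpenCV's KLT tracker; the median-disparity threshold below is an illustrative assumption, a simple stand-in for the model-selection test described in Section 2.1.

```python
import cv2
import numpy as np

def track_until_baseline(frames, min_median_disparity=30.0):
    """Track feature points through a video until the median image-space
    displacement is large enough for a stable epipolar-geometry estimate.
    The 30-pixel threshold is an illustrative choice, not a tuned value."""
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    p0 = cv2.goodFeaturesToTrack(prev, maxCorners=500,
                                 qualityLevel=0.01, minDistance=7)
    pts = p0
    for i, frame in enumerate(frames[1:], start=1):
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        pts, status, _ = cv2.calcOpticalFlowPyrLK(prev, gray, pts, None)
        ok = status.ravel() == 1
        p0, pts = p0[ok], pts[ok]              # keep surviving tracks only
        disparity = np.median(np.linalg.norm(
            (pts - p0).reshape(-1, 2), axis=1))
        if disparity > min_median_disparity:   # baseline large enough
            return i, p0.reshape(-1, 2), pts.reshape(-1, 2)
        prev = gray
    return None                                # never reached a wide baseline
```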
The next step consists of recovering the motion and cali-
bration of the camera and the 3D structure of the tracked
or matched features. This process is done in two phases. At
first the reconstruction contains a projective skew (i.e. par-
allel lines are not parallel, angles are not correct, distances
are too long or too short, etc.). This is due to the absence
of a priori calibration. Using a self-calibration algorithm
(Pollefeys et al., 1999a) this distortion can be removed,
yielding a reconstruction equivalent to the original up to
a global scale factor. This uncalibrated approach to 3D
reconstruction allows much more flexibility in the acqui-
sition process, since the focal length and other intrinsic
camera parameters do not have to be measured (calibrated)
beforehand and are even allowed to change during the
acquisition.
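The projective skew can be made explicit with a short numpy sketch: transforming the structure by an arbitrary invertible 4x4 matrix H while compensating the cameras with its inverse reproduces exactly the same image points, so image measurements alone cannot resolve the ambiguity and a self-calibration step is required.

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary camera matrix and homogeneous 3D points (illustrative values).
P = rng.standard_normal((3, 4))
X = np.vstack([rng.standard_normal((3, 10)), np.ones((1, 10))])

# Any invertible 4x4 matrix H yields an equally valid reconstruction:
# cameras P H^-1 with points H X project to the same image points as P, X.
H = rng.standard_normal((4, 4))
assert np.allclose(P @ X, (P @ np.linalg.inv(H)) @ (H @ X))
```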
The reconstruction obtained as described in the previous
paragraph only contains a sparse set of 3D points. Al-
though interpolation might be a solution, this yields mod-
els with poor visual quality. Therefore, the next step
attempts to match every pixel of an image with pixels in
neighboring images, so that these points too can be
reconstructed. This task is greatly facilitated by the
camera parameters recovered in the previous stage. Since a pixel in the im-
age corresponds to a ray in space and the projection of this
ray in other images can be predicted from the recovered
pose and calibration, the search for a corresponding pixel in
other images can be restricted to a single line. Additional
constraints such as the assumption of a piecewise continu-
ous 3D surface are also employed to further constrain the
search. It is possible to warp the images so that the search
range coincides with the horizontal scan-lines. An algo-
rithm that can achieve this for arbitrary camera motion is
described in (Pollefeys et al., 1999b). This allows us to
use an efficient stereo algorithm that computes an optimal
match for the whole scan-line at once (Van Meerbergen et
al., 2002). Thus, we can obtain a depth estimate (i.e. the
distance from the camera to the object surface) for almost
every pixel of an image. By fusing the results of all the
images together a complete dense 3D surface model is ob-
tained. The images used for the reconstruction can also
be used for texture mapping so that a final photo-realistic
result is achieved. The different steps of the process are
illustrated in Figure 1. In the following paragraphs the
different steps are described in more detail.
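The following sketch illustrates this scan-line-restricted search with off-the-shelf OpenCV components; cv2.stereoRectifyUncalibrated performs a planar rectification which, unlike the polar method of (Pollefeys et al., 1999b), breaks down when the epipole lies inside the image, and semi-global matching stands in for the scan-line optimizer of (Van Meerbergen et al., 2002).

```python
import cv2
import numpy as np

def dense_depth_sketch(img1, img2, pts1, pts2, F):
    """Warp two views so that corresponding points share a horizontal
    scan-line, then search for matches along those lines with a stereo
    matcher. Planar rectification and SGBM are stand-ins for the cited
    polar rectification and scan-line optimization algorithms."""
    h, w = img1.shape[:2]
    # Rectifying homographies from the point matches and fundamental matrix.
    _, H1, H2 = cv2.stereoRectifyUncalibrated(pts1, pts2, F, (w, h))
    r1 = cv2.warpPerspective(img1, H1, (w, h))
    r2 = cv2.warpPerspective(img2, H2, (w, h))
    # Correspondence search is now restricted to horizontal scan-lines.
    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128,
                                 blockSize=5)
    disp = sgbm.compute(cv2.cvtColor(r1, cv2.COLOR_BGR2GRAY),
                        cv2.cvtColor(r2, cv2.COLOR_BGR2GRAY))
    return disp.astype(np.float32) / 16.0  # SGBM returns fixed-point values
```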
2.1 Relating images
Starting from a collection of images or a video sequence
the first step consists of relating the different images to
each other. This is not an easy problem. A restricted num-
ber of corresponding points is sufficient to determine the
geometric relationship or multi-view constraints between
the images.

[Figure 1: Overview of our image-based 3D recording approach. Pipeline stages: Relating images -> Structure & Motion recovery -> Dense Matching -> 3D Model Building -> 3D surface model.]

Since not all points are equally suited for matching
or tracking (e.g. a pixel in a homogeneous region),
feature points need to be selected (Harris and Stephens,
1988, Shi and Tomasi, 1994). Depending on the type of
image data (i.e. video or still pictures) the feature points
are tracked or matched and a number of potential corre-
spondences are obtained. From these the multi-view con-
straints can be computed. However, since the correspon-
dence problem is an ill-posed problem, the set of corre-
sponding points can (and almost certainly will) be con-
taminated with a significant number of wrong matches or
outliers. A traditional least-squares approach would fail, and
therefore a robust method is used (Torr, 1995, Fischler and
Bolles, 1981). Once the multi-view constraints have been
obtained they can be used to guide the search for additional
correspondences. These can then be employed to further
refine the results for the multi-view constraints.
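A minimal sketch of this robust estimation step, with OpenCV's RANSAC-based estimator standing in for the methods of (Torr, 1995, Fischler and Bolles, 1981); the pixel threshold and confidence level are illustrative assumptions.

```python
import cv2
import numpy as np

def robust_epipolar_geometry(pts1, pts2, thresh=1.0):
    """Estimate the fundamental matrix from putative matches that are
    contaminated with outliers. RANSAC fits the model to minimal samples
    and keeps the hypothesis with the most inliers, where a single
    least-squares fit over all matches would be corrupted."""
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC,
                                     ransacReprojThreshold=thresh,
                                     confidence=0.999)
    inliers = mask.ravel().astype(bool)
    # The inlier set can now guide the search for further correspondences
    # and a subsequent refinement of the multi-view constraints.
    return F, pts1[inliers], pts2[inliers]
```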
In the case of video, the epipolar geometry between two
consecutive views is not well determined. In fact, as long
as the camera has not moved sufficiently, the motion of
the features can just as well be explained by a homogra-
phy. The Geometric Robust Information Criterion (GRIC)
proposed by Torr (Torr et al., 1999) makes it possible to
evaluate which of the two models, epipolar geometry (F)
or homography (H), is best suited to explain the data. Typ-
ically, for very small baselines the homography model is
always selected; as the baseline gets larger, both models
become equivalent, and eventually the epipolar geometry
model outperforms the homography-based one. One can
reliably compute the epipolar geometry from the moment
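The GRIC comparison can be sketched as follows, using the formulation commonly quoted for the two-view case; the constants and parameter counts follow (Torr et al., 1999), and the noise level sigma is an assumed input.

```python
import numpy as np

def gric(residuals, sigma, n, model):
    """Geometric Robust Information Criterion for two-view model
    selection. residuals: geometric errors of the n matches under the
    model; model: 'F' (epipolar geometry) or 'H' (homography).
    r = 4 (a correspondence lives in R^4), d = dimension of the model
    manifold, k = number of model parameters; lower GRIC is better."""
    r = 4.0
    d, k = (3, 7) if model == 'F' else (2, 8)
    lam1, lam2, lam3 = np.log(r), np.log(r * n), 2.0
    rho = np.minimum(residuals ** 2 / sigma ** 2, lam3 * (r - d))
    return rho.sum() + lam1 * d * n + lam2 * k

# Select the epipolar-geometry model once its score drops below that of
# the homography, e.g.:
#   use_F = gric(res_F, 1.0, n, 'F') < gric(res_H, 1.0, n, 'H')
```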