ISPRS Commission III, Vol. 34, Part 3A, "Photogrammetric Computer Vision", Graz, 2002
Highly precise conjugate points are obtained from a least-squares
matching of points extracted with the sub-pixel Förstner operator
(Förstner and Gülch, 1987). On the highest level of the pyramids,
which usually consists of about 100 x 100 pixels, no reduction of
the search space, e.g., by means of epipolar lines, is yet available.
To reduce the complexity of the matching, several measures are
taken. First, before the actual least-squares matching, we sort out
many points and calculate a first approximation by thresholding and
maximizing, respectively, the correlation score among image windows.
Furthermore, we restrict ourselves in the first image to only a few
hundred of the globally strongest points, plus some additional points
which are strongest regionally on an evenly spaced grid.
Section 4 made clear that for point transfer the trifocal tensor is
superior to the intersection of epipolar lines. Yet, before an
approximation of the trifocal tensor is available, it is a good idea
to compute fundamental matrices to narrow down the search space
(cf. Section 5.2). We do this in two ways: On the highest pyramid
level we compute the fundamental matrices and from them the epipolar
lines from the first to the second and to the third image before we
actually search for image triplets. After we have obtained an
approximation of the trifocal tensor, we compute from it the
fundamental matrix from the first to the second image on the lower
levels. For the points found by matching on the epipolar line in the
second image, we predict their positions in the third image. Figure 4
shows the first image with an extracted point on the left side. On
the epipolar line in the second image two points were found. These
two in turn lead to the prediction of two points in the third image,
of which one is obviously wrong and would not be matched.
Figure 4: Prediction: From the point in a) the epipolar line in b)
arises from the fundamental matrix. The two points found in b)
uniquely determine the two points in c) via the trifocal tensor.
The linear solution for the trifocal tensor is based on RANSAC
(cf. Section 5.2). To improve the results, the parameters of the
second and third cameras are finally optimized by the non-linear
Levenberg-Marquardt algorithm (Hartley and Zisserman, 2000).
The approach was implemented in C++ making use of the public domain
linear algebra package LAPACK, interfaced via the Template Numerical
Toolkit (TNT; math.nist.gov/tnt). Results are shown in Figures 5
and 6. Note that by default we use a square search space, which in
the case of both figures covered 50% of the image area.
6 DEPTH ESTIMATION
Our depth estimation, which can deal with strong occlusions and
large disparity ranges also for details, i.e., very tiny structures in the
image, is built upon the approach proposed in (Zitnick and Kanade,
2000). It employs epipolar resampled imagery. The basic idea is to
calculate matching scores for a disparity range and store this infor-
mation in a 3D array made up of image width and height as well as
disparity range. This array is then filtered with a 3D box-filter to
obtain the local support for a match from all close-by matches.
Conversely, it is assumed that along each viewing ray only one point
is visible. This implies an inhibition, which is realized by
down-weighting all scores besides the strongest. Support and
inhibition are iterated; in this way, the information is propagated
more globally. To avoid hallucination, the original matching scores
are taken into account after each iteration. Finally, occlusion
regions are marked by thresholding the matching scores. We have
extended this approach with the following features:
• In addition to normalized cross-correlation, we employ absolute
differences, as proposed in (Scharstein and Szeliski, 2002), for the
matching scores.
• We automatically estimate the disparity range from a number of
sample lines. This works relatively robustly and considerably
reduces the search space and therefore the computation time.
• By separating the 3D box-filter into 2D planes and 1D sticks, we
have sped up the computation of this part by a factor of 5.
• We determine the convergence of the algorithm automatically by
calculating a difference image and thresholding its mean and
variance.
• The smoothness of the output is improved by a sub-pixel disparity
estimation in the original matching scores.
7 VIEW SYNTHESIS WITH THE TRIFOCAL TENSOR
We use the view synthesis scheme proposed in (Avidan and Shashua,
1998). The basic idea is to use calibrated imagery together with a
depth map. With the latter, points homologous to points in the first
image are obtained for the second image.
At least a weak calibration is necessary to make navigation through
the images meaningful for the user, as only then are rotation
matrices and translation vectors defined in a Euclidean sense. If
calibration information is available, we compute the essential
matrix E from it. E can be separated into a rotation matrix R and a
translation vector t via SVD. Together they make up the projection
matrix P' = [R|t] if we assume P = [I|0]. To improve the results,
the three parameters of the rotation matrix and the two parameters
of the translation vector are optimized by Levenberg-Marquardt. If
no calibration information is available, we assume that both images
are taken with the same camera without modifications. We further
assume that the principal point coincides with the image center,
that there is no skew, i.e., s = 0, that the focal length is one,
and that the pixels are square. We provisionally calibrate the
fundamental matrix under these assumptions and, when optimizing by
Levenberg-Marquardt, vary two additional parameters describing the
focal length c as well as the ratio of the width to the height of a
pixel.
The trifocal tensor is initially instantiated from the fundamental
matrix as

T_i^{jk} = ε^{ljk} F_{li} ,

where ε^{ljk} is the cross-product tensor and F_{li} is F in tensor
notation.
Then, the view synthesis is accomplished by modifying the trifocal
tensor by rotation matrices R (R_l^k in tensor notation) and
translation vectors t given by the user. The modified tensor is

G_i^{jk} = R_l^k T_i^{jl} + t^k a_i^j ,

where a_i^j is the rotation part of the second camera's projection
matrix in tensor notation.