ISPRS Commission III, Vol. 34, Part 3A, "Photogrammetric Computer Vision", Graz, 2002
Highly precise conjugate points are obtained from a least-squares
matching of points extracted with the sub-pixel Förstner operator
(Förstner and Gülch, 1987). On the highest level of the pyramids,
which usually consists of about 100 x 100 pixels, no reduction of
the search space, e.g., by means of epipolar lines, is yet available.
To reduce the complexity of the matching, several measures are
taken. First, before the actual least-squares matching, we sort out
many points and calculate a first approximation by thresholding and
maximizing, respectively, the correlation score among image windows.
Furthermore, we restrict ourselves in the first image to only a few
hundred of the globally strongest points, plus some additional points
which are strongest regionally on an evenly spaced grid.
Section 4 made clear that for point transfer the trifocal tensor is
superior to the intersection of epipolar lines. Yet, before an
approximation of the trifocal tensor is available, it is a good idea
to compute fundamental matrices to narrow down the search space
(cf. Section 5.2). We do this in two ways: On the highest pyramid
level we compute the fundamental matrices and from them the epipolar
lines from the first to the second and to the third image before we
actually search for image triplets. After we have obtained an
approximation of the trifocal tensor, we compute from it the
fundamental matrix from the first to the second image on the lower
levels. For the points found by matching on the epipolar line in the
second image, we predict their positions in the third image. Figure 4
shows the first image with an extracted point on the left side. On
the epipolar line in the second image two points were found. These
two in turn lead to the prediction of two points in the third image,
of which one is obviously wrong and would not be matched.
Figure 4: Prediction: From the point in a) the epipolar line in b)
arises from the fundamental matrix. The two points found in b)
uniquely determine the two points in c) via the trifocal tensor.
The linear solution for the trifocal tensor is based on RANSAC
(cf. Section 5.2). To improve the results, the parameters of the
second and third cameras are finally optimized by the non-linear
Levenberg-Marquardt algorithm (Hartley and Zisserman, 2000).
The approach was implemented in C++ making use of the public domain
linear algebra package LAPACK, interfaced via the Template Numerical
Toolkit (TNT; math.nist.gov/tnt). Results are shown in Figures 5
and 6. Note that by default we use a square search space, which in
the case of both figures covered 50% of the image area.
6 DEPTH ESTIMATION
Our depth estimation, which can deal with strong occlusions and
large disparity ranges also for details, i.e., very tiny structures in the
image, is built upon the approach proposed in (Zitnick and Kanade,
2000). It employs epipolar resampled imagery. The basic idea is to
calculate matching scores for a disparity range and store this infor-
mation in a 3D array made up of image width and height as well as
disparity range. This array is then filtered with a 3D box-filter to
obtain the local support for a match from all close-by matches.
Conversely, it is assumed that along each viewing ray only one point
is visible. This implies an inhibition, which is realized by
down-weighting all scores besides the strongest. Support and
inhibition are iterated; in this way, the information is propagated
more globally. To avoid hallucination, the original matching scores
are taken into account after each iteration. Finally, occlusion
regions are marked by thresholding the matching scores. We have
extended this approach with the following features:
• In addition to normalized cross-correlation, we employ absolute
differences, as proposed in (Scharstein and Szeliski, 2002), for the
matching scores.
• We automatically estimate the disparity range from a number of
sample lines. This works relatively robustly and considerably
reduces the search space and therefore the computation time.
• By separating the 3D box-filter into 2D planes and 1D sticks, we
have sped up the computation of this part by a factor of 5.
• We determine the convergence of the algorithm automatically by
calculating a difference image and thresholding its mean and
variance.
• The smoothness of the output is improved by a sub-pixel disparity
estimation in the original matching scores.
7 VIEW SYNTHESIS WITH THE TRIFOCAL TENSOR
We use the view synthesis scheme proposed in (Avidan and Shashua,
1998). The basic idea is to use calibrated imagery together with a
depth map. With the latter, points homologous to points in the first
image are obtained for the second image.
At least a weak calibration is necessary to make navigation through
the images meaningful for the user, as only then are rotation
matrices and translation vectors defined in a Euclidean sense. If
calibration information is available, we compute the essential
matrix E from it. E can be separated into a rotation matrix R and a
translation vector t via SVD. Together they make up the projection
matrix P' = [R|t] if we assume P = [I|0]. To improve the results,
the three parameters of the rotation matrix and the two parameters
of the translation vector are optimized by Levenberg-Marquardt. If
no calibration information is available, we assume that both images
are taken with the same camera without modifications. We further
assume that the principal point coincides with the image center,
that there is no skew, i.e., s = 0, that the focal length is one,
and that the pixels are square. We provisionally calibrate the
fundamental matrix under these assumptions and, when optimizing by
Levenberg-Marquardt, vary two additional parameters describing the
focal length c as well as the ratio of the width to the height of a
pixel.
The trifocal tensor is initially instantiated from the fundamental
matrix as

T_i^{jk} = ε^{ljk} F_{li} ,

where ε^{ljk} is the cross-product tensor and F_{li} is F in tensor
notation.
Then, the view synthesis is accomplished by modifying the trifocal
tensor by rotation matrices R (R_l^k in tensor notation) and
translation vectors t given by the user. The modified tensor is

G_i^{jk} = R_l^k T_i^{jl} + t^k a_i^j ,

where a_i^j is the rotation part of the second camera's projection
matrix in tensor notation.