fundamental
= 0 with A
- vector cor-
9-matrix A,
The SVD of
and diagonal
ned equation
he condition
1e column of
ilar value, of
the system is
solution for
‘2. Using the
. This results
'eal solutions
he Carlsson-
1995). This
s. The basic
being viewed
is means, the
be employed
pyramids we
the efficiency
least-squares
stner operator
of the match-
iere no reduc-
s, is yet avail-
al least-square
ipproximation
relation score
irselves in the
'est points and
n even-spaced
ental matrices.
by taking into
| matrices. For
cal tensor. On
n the epipolar
ositions in the
rified by least
Section 4.1). If
e solution non-
ley and Zisser-
1ize the twelve
and the 24 pa-
cal tensor. On
e, We compute
a rotation ma-
ey make up the
[0]. Finally,
to improve the results, the three parameters of the rotation matrix
and the two parameters of the translation vector are optimized by the
non-linear Levenberg-Marquardt algorithm. If no calibration infor-
mation is available, we assume that both images are taken with the
same camera without modifications. We further assume that princi-
pal point equals the image center, that there is no sheer, i.e., zp, = 0,
yn — 0and s = 0, the focal length c = 1 and the pixels are
square. We provisionally calibrate the fundamental matrix with these
assumptions and when optimizing by the Levenberg-Marquardt al-
gorithm we vary two additional parameters describing c as well as
the relation of the width and the height of a pixel.
The approach was implemented in C++ using the linear algebra
package LAPACK interfaced by the template numerical toolkit
(TNT; math.nist.gov/tnt). For image processing we employ HAL-
CON (www.mvtec.com). A result for the estimation of the trifocal
tensor with a quadratic search space covering 50% of the image area
is presented in Figure 2.
5 DISPARITY ESTIMATION
To utilize the point transfer described in Section 3, the coordinates of
homologous points need to be known. We generate them by disparity
estimation. Opposed to optical flow estimation we take into account
that points can only be homologous if they are on the corresponding
epipolar line in the other image. This constraint is taken into account
by epipolar resampling.
Our approach for disparity estimation can deal with strong occlu-
sions and large disparity ranges also for details, i.e., very tiny struc-
tures in the image. It is built upon (Zitnick and Kanade, 2000).
The main idea is a cooperation between support and inhibition
(cf. Fig. 3). The support region is a 2D- or usually 3D-box. All
matching scores in this box corroborate to generate a disparity map
which is locally continuous. When employing a 3D-box also sloped
regions are implicitly modeled.
Support Inhibition
disparity
—_— ———
width width
Figure 3: Support and inhibition
Inhibition enforces the uniqueness of a match. Assuming opaque
and non-reflecting surfaces, a ray of view emanating from a cam-
era will hit the scene only at one point. The idea is to gradually
weight down all matches on a ray of view besides the strongest. For
a stereo pair there are two rays (cf. Fig. 3). We store the matching
scores in a 3D array where for every pixel in the left/reference im-
age the matching score is stored as third dimension (cf. also Fig. 4).
Therefore, for the left image the ray of view is just a column in the
2D-slice of width and disparity. Because we work in disparity and
not in depth space, the ray of view of the right image consists of the
45? Jeft-slanted diagonal going through the pixel of interest. Putting
everything together, the support Sy, for a pixel at row r and column
c with disparity d is defined as
TTT EN EZ
/
Generation
staff sheet box
| (mete | | Hor
height
| | Update
pue bv 8 (11 à
fi EF » LB p
S
Et s cd
Edd subtract f add £7
Figure 4: Separation of 3 x 3 x 3 box: Generation and update
Ls(r 4- r',ct cdd),
Sn(r,c,d) = y
(r',c',d')ee
with L,, the score for the preceding iteration and ® the support re-
gion (usually a box). The new score for iteration n + 1 is obtained
as
œ
Sy, (ried)
,
Sn (r^. en d")
(r^ c^ ,d")ev
Lt (7, C, d) = Lo(r, C, d) 3
with V the union of the left and right inhibition region and o an
exponent controling the speed of convergence. o has to be chosen
greater than 1 to make the scores converge to 1. The multiplication
with the original matching scores Lo avoids hallucination in weak
matching areas. Finally, for each pixel of the left image the disparity
is chosen which has the maximum score. Practically, it is important
to correct the inhibition value for the fact that on the left and the right
side of the image a number of pixels depending on search width
and offset are not matched and therefore do not contribute to the
inhibition.
A critical parameter is the disparity range as it defines the search
space in the image. We automatically estimate it from the mean
and the standard deviation of the disparities for a number of sample
lines. This works relatively robustly and considerably reduces the
computation time. For the matching scores we employ (normalized)
cross-correlation optimized by separating the operations for mean
and variance computation.
Filtering with a 3D box-filter based on simple summation is highly
redundant. Even though this is true also for 2D, we restrict the pre-
sentation to 3D. To get rid of this redundancy, we separate the filter
into one-dimensional (1D) staffs and 2D sheets. By adding pixels
on top of each other we generate staffs (cf. Fig. 4). From them we
build sheets and finally from the sheets the box. The important part
is updating. To filter with a translated box, instead of adding sheets
we add a (new) sheet on one side and subtract the (old) sheet on the
other side. The same can be done for the sheets and the staffs. By
this means the complexity becomes independent of the size of the
box.
We have also automated the setting of one important parameter,
namely the number of iterations, by mapping it to the problem
of convergence determination. For the latter also parameters are
needed, but our empirical investigations show that they are relatively
independent from the images at hand. To determine the convergence,
we compute the difference image of the disparity maps from the two
—195—