Figure 1: Transfer of a point x’ in the first image via a plane defined
by the line I” in the second image and the projection center of the
second camera O".
e Compute I which goes through x" and is perpendicular to
IV = Faux. From x = (of, 24, 1)T andl] = (i, 15,15)"
follows" — (14, —1f, —2{15 + 31)"
~ . . ^ ; jk
e The transfered point is z"^ — "IT".
A closer look at this reveals that if one restricts oneself to points on
the epipolar line, then only 13 = —x1l2 + al varies. a" 1 74^
and 217 17* are constant and have to be computed only once per
epipolar line.
4 IMAGE ORIENTATION
4.4 Dealing with Mismatches by RANSAC
When determining homologous points there are generic means to re-
duce mismatches (cf. Section 4.2). Though, there are usually far too
many for an efficient least squares solution, if the knowledge about
the calibration and the orientation of the cameras is weak. As our
problem is of the type that we only have relatively few parameters
and a high redundancy, the RANSAC (random sample consensus)
approach (Fischler and Bolles, 1981) is a good choice.
RANSAC is based on the idea to select more or less randomly min-
imum sets of observations. The correctness of a set is evaluated by
the number of other observations which confirm it. For example, a
circle can be described by the two coordinates of its center and its ra-
dius. It can therefore be determined from a minimum of three points.
We produce a large number of randomly selected sets of three points.
From those we select the very set, which determines the circle, from
which as many as possible so-called accepted points have a distance,
which is smaller than a given threshold. The latter threshold usually
is chosen depending on the standard deviation of the points. Cen-
ter and radius of the circle can then be improved by an adjustment
taking into account all accepted points.
The minimum number of point correspondences necessary to deter-
mine the fundamental matrix with RANSAC is seven. For this we
have to take into account |F| — 0 (cf. Section 2). For the basic solu-
tion for n > 8 homologous points the homogenous linear equation
system reads x''Fx/ — 0 ,ie.,
pq! , n / 15 1 7 n cdl 1
(riz, Tii, Vis ViTi » ViVi » Vio Ti » Vi »
(Fi, Fi2, Fis, F21, Faz, Fas, Fa, F32, F33)
= A;u = 0. Vi = 1. 7L
Jr
With the 9-vector u representing the elements of the fundamental
or essential matrix this can be written in the form Au = 0 with A
= (A;). The best estimation for u is the unit singular vector cor-
responding to the smallest singular value of the n x 9-matrix A,
determined by singular value decomposition (SVD). The SVD of
a matrix A = UDV" (U and V orthogonal matrices and diagonal
matrix D) provides an efficient solution for overdetermined equation
systems, e.g., of the type minimization of || Az|| with the condition
||z|| = 1. The solution for this problem consists in the column of
V corresponding to the smallest real element, i.e., singular value, of
D. The solution is unique up to an unknown factor, as the system is
homogenous.
For seven points the 7 x 9-matrix À has rank 7. The solution for
Au = O is a 2D space with the form oF; + (1 — «)F2. Using the
fact that F is of rank 2 leads to |aF1 + (1 —a)F2)| = 0. This results
into a cubic polynomial for which either one or three real solutions
exist.
For the estimation of the trifocal tensor we employ the Carlsson-
Weinshall duality (Carlsson, 1995, Weinshall et al. 1995). This
gives us a solution for a minimum of six point triplets. The basic
idea of the duality is to interchange the roles of points being viewed
by several cameras and the projection centers. By this means, the
above algorithm for seven points and two images can be employed
for three images and six points.
4.2 Reduction of the Search Space
By means of a hierarchical approach based on image pyramids we
significantly reduce the search space. Thereby not only the efficiency
but also the robustness is improved considerably.
Highly precise conjugate points are obtained from a least-squares
matching of points obtained from the sub-pixel Fôrstner operator
(Forstner and Giilch, 1987). To reduce the complexity of the match-
ing especially on the highest level of the pyramids where no reduc-
tion of the search space, e.g., by means of epipolar lines, is yet avail-
able, several measures are taken. First, before the actual least-square
matching we sort out many points and calculate a first approximation
by thresholding and maximizing, respectively, the correlation score
among image windows. What is more, we restrict ourselves in the
first image to only a few hundred of the globally strongest points and
some more points which are strongest regionally on an even-spaced
grid.
On the highest pyramid level we compute the fundamental matrices.
On the lower levels we narrow down the search space by taking into
account the epipolar lines obtained from fundamental matrices. For
image triplets we estimate on the second level the trifocal tensor. On
the preceding levels, the points found by matching on the epipolar
lines in the second image are used to predict their positions in the
third image (cf. Section 3). The points are then verified by least
square matching.
The actual linear solution is based on RANSAC (cf. Section 4.1). If
no calibration information is available, we optimize the solution non-
linearly by the Levenberg-Marquardt algorithm (Hartley and Zisser-
man, 2000). If we assume P^ — [/|0], then we optimize the twelve
parameters P" — [Ala] for the fundamental matrix and the 24 pa-
rameters P" = [A|a| and P" — (B|b] for the trifocal tensor. On
the other hand, if calibration information is available, we compute
the essential matrix E from F. E can be separated into a rotation ma-
trix R and a translation vector t via SVD. Together they make up the
projection matrix P" — [R|f] if we assume P" — [/|0J. Finally,
—194-
to :
anc
nol
san
pal
Yh
squ
ass
go!
the
Th
pa
(T]
CC
ter
is |
To
ho
es!
th:
ep
by
Oi
sic
tui