Here C is the calibration matrix. Using raw pixel co-
ordinates, as opposed to actual 2D co-ordinates, means that
we are dealing with an uncalibrated camera.
Consider the space point X = [X, Y, Z]^T and its image
in two different camera locations, x = [x, y, 1]^T and
x' = [x', y', 1]^T. Then it is well known that:

x'^T E x = 0
Here E is the essential matrix, which is defined as:
E = [t]× R

where [t]× is the skew-symmetric (cross-product) matrix of t,
the translational motion between the 3D camera positions,
and R is the rotation matrix. The essential
matrix can be computed from a set of correspondences be-
tween two different camera positions using linear methods
(Longuet-Higgins, 1981). This computational process has
been considered to be very ill-conditioned, but in fact a
simple pre-processing step improves the conditioning, and
produces very reasonable results (Hartley, 1997a). The
matrix E encodes the epipolar geometry between the two
camera positions. If the calibration matrix C is not known,
then the uncalibrated version of the essential matrix is the
fundamental matrix:
u'^T F u = 0

Here u = [u, v, 1]^T and u' = [u', v', 1]^T are the raw pixel
co-ordinates of the calibrated points x and x'. The fun-
damental matrix F can be computed directly from a set
of correspondences by a modified version of the algorithm
used to compute the essential matrix E. As is the case
with the essential matrix, the fundamental matrix also en-
codes the epipolar geometry of the two images. Once E is
known, the 3D location of the corresponding points can be
computed. Similarly, once F is known the 3D co-ordinates
of the corresponding points can also be computed, but up
to a projective transformation. However, the actual sup-
porting correspondences in terms of pixel co-ordinates are
identical for both the essential and fundamental matrices.
Having a camera calibration simply enables us to move
from a projective space into a Euclidean space, that is from
F to E.
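To make this concrete, the sketch below (Python/NumPy, with our own naming and synthetic numbers; it is an illustration, not the implementation used in the paper) builds E = [t]× R, projects synthetic 3D points through an assumed calibration matrix C into pixel co-ordinates, and then estimates F from those pixel correspondences using the linear eight-point method with Hartley's normalization as the pre-processing step. The final line checks that the recovered F satisfies u'^T F u ≈ 0.

```python
import numpy as np

def skew(t):
    """Skew-symmetric (cross-product) matrix [t]x of a 3-vector t."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def normalize(pts):
    """Hartley pre-conditioning: translate to the centroid and scale so the
    mean distance from it is sqrt(2). Returns homogeneous points and T."""
    c = pts.mean(axis=0)
    d = np.linalg.norm(pts - c, axis=1).mean()
    s = np.sqrt(2.0) / d
    T = np.array([[s, 0.0, -s * c[0]],
                  [0.0, s, -s * c[1]],
                  [0.0, 0.0, 1.0]])
    ph = np.column_stack([pts, np.ones(len(pts))])
    return (T @ ph.T).T, T

def eight_point_F(u, up):
    """Linear estimate of F from n >= 8 pixel correspondences u <-> up,
    using the normalized eight-point method."""
    un, T = normalize(u)
    upn, Tp = normalize(up)
    # One row of A per correspondence, from u'^T F u = 0 written as A f = 0.
    A = np.column_stack([
        upn[:, 0] * un[:, 0], upn[:, 0] * un[:, 1], upn[:, 0],
        upn[:, 1] * un[:, 0], upn[:, 1] * un[:, 1], upn[:, 1],
        un[:, 0], un[:, 1], np.ones(len(un))])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)               # enforce rank 2
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    F = Tp.T @ F @ T                           # undo the normalization
    return F / F[2, 2]

# Synthetic sanity check with assumed motion and calibration values.
rng = np.random.default_rng(0)
t = np.array([1.0, 0.2, 0.0])
a = np.deg2rad(8.0)
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
C = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(20, 3))    # points in camera 1
Xp = X @ R.T + t                                         # same points in camera 2
u = X @ C.T;  u = u[:, :2] / u[:, 2:]                    # pixel co-ordinates
up = Xp @ C.T; up = up[:, :2] / up[:, 2:]
F = eight_point_F(u, up)
uh = np.column_stack([u, np.ones(len(u))])
uph = np.column_stack([up, np.ones(len(up))])
print(np.abs(np.sum(uph * (uh @ F.T), axis=1)).max())    # u'^T F u, close to zero
```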
5 FUNDAMENTAL MATRICES
We are given a set of n input images, and we want to cal-
culate the fundamental matrix between every one of these
n² image pairs. Consider a single pair of images from
this set of n² pairs. We first find corners in each im-
age, then find possible matches between these corners, and
finally use these putative matches to compute the funda-
mental matrix. The number of matching corners that sup-
port a given fundamental matrix is a good indication of the
quality or correctness of that matrix.
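As a rough outline of this per-pair process, the sketch below (Python) loops over the image pairs and records the support size as the quality measure. The callables detect_corners, match_corners and robust_fundamental stand for the steps described in the following subsections; they are placeholders of our own, not functions from the paper.

```python
def all_pairs_fundamental(images, detect_corners, match_corners, robust_fundamental):
    """Compute a fundamental matrix and its support set for every ordered
    image pair; the three callables are the steps described below."""
    corners = [detect_corners(img) for img in images]
    results = {}
    for i in range(len(images)):
        for j in range(len(images)):
            if i == j:
                continue
            matches = match_corners(corners[i], corners[j])
            F, support = robust_fundamental(matches)
            # The number of supporting matches indicates the quality of F.
            results[(i, j)] = (F, len(support))
    return results
```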
5.1 Corners/Interest Points
The first step is to find a set of corners or interest points
in each image. These are the points where there is a sig-
nificant change in image gradient in both the x and y di-
rections. The most common corner algorithm is described
by Harris (Harris and Stephens, 1988). While these cor-
ners are invariant to rotation in the camera plane they are
sensitive to changes in scale, and also to rotations out of
the camera plane. Recently, experiments have been done
to test a newer generation of corners that are invariant over
a wider set of transformations (Mikolajczyk and Schmid,
2003). The results of this test show that SIFT corners per-
form best (Lowe, 1999). The SIFT operator finds corners at
multiple scales, and describes the region around each corner
by a histogram of gradient orientations. This description
provides robustness against localization errors and small
geometric distortions.
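For illustration only, SIFT corners and descriptors of the kind described here can be obtained with OpenCV's implementation; OpenCV and the image filename below are our own choices, not part of the original work.

```python
import cv2

# Detect SIFT keypoints and their 128-dimensional descriptors in a
# grayscale image; "view1.png" is an assumed filename for illustration.
img = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
# descriptors has shape (num_keypoints, 128), one histogram row per corner.
```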
5.2 Matching Corners
Each SIFT corner is characterized by 128 unsigned eight
bit numbers which define the multi-scale gradient orienta-
tion histogram. To match SIFT corners it is necessary to
compare corner descriptors. This is done by simply com-
puting the L2 distance between two different descriptors.
Assume that in one image there are j corners, and in the
other there are k corners. Then the goal is for each of these
j corners to find the closest of the k corners in the other
image under the L2 norm. This takes time proportional to
jk, but since j and k are on the order of one thousand, the
time taken is not prohibitive.
However, it is still necessary to threshold these L2 dis-
tances in order to decide if a match is acceptable. Instead
of using a fixed threshold for this L2 distance, a dynamic
threshold is computed. This is done by finding the first and
second closest corner match. Then we compute the ratio
of these two L2 distances, and accept the match only if
the second best match is significantly worse than the first.
If this is not the case then the match is considered to be
ambiguous, and is rejected. This approach works well in
practice (Lowe, 1999), and avoids the use of an arbitrary
threshold to decide on whether a pair of corners is a good
match.
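A minimal sketch of this matching scheme in Python/NumPy follows. The descriptors are the 128-dimensional SIFT arrays from the previous step, and the ratio value of 0.8 is an assumed threshold, since the paper does not state one.

```python
import numpy as np

def ratio_test_matches(desc1, desc2, ratio=0.8):
    """For each descriptor in desc1, find its two nearest neighbours in
    desc2 under the L2 norm and keep the match only if it is unambiguous."""
    desc1 = np.asarray(desc1, dtype=float)
    desc2 = np.asarray(desc2, dtype=float)
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)   # L2 distance to every corner
        first, second = np.argsort(dists)[:2]       # closest and second closest
        if dists[first] < ratio * dists[second]:    # dynamic (relative) threshold
            matches.append((i, first))
        # otherwise the match is ambiguous and is rejected
    return matches
```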
5.3 Computing the Fundamental Matrix
The next step is to use these potentially matching corners to
compute the fundamental matrix. This process must be ro-
bust, since it can not be assumed that all of the matches are
correct. Robustness is achieved by using concepts from the
field of robust statistics, in particular, random sampling.
Random sampling is a “generate and test” process in which
a minimal set of correspondences, in this case the smallest
number necessary to define a unique fundamental matrix
(7 points), is randomly chosen (Rousseeuw, 1984, Bolles
and Fischler, 1981, Roth and Levine, 1993, Torr and Mur-
ray, 1993, Xu and Zhang, 1996). A fundamental matrix is
then computed from each such minimal set. The set of all
corners that satisfy this fundamental matrix is called the
support set. The fundamental matrix F_ij, with the largest
support set SF_ij, is returned by the random sampling pro-
cess. The matching corners (support set) for two typical
wide baseline views are shown in Figure 1.
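The random-sampling search itself can be sketched as follows (Python/NumPy). For simplicity it fits each sample with the eight_point_F function from the earlier sketch rather than the seven-point minimal solver the text refers to, and the iteration count and inlier threshold are assumed values, not taken from the paper.

```python
import numpy as np

def epipolar_error(F, u, up):
    """Symmetric point-to-epipolar-line distance for correspondences u <-> up."""
    uh = np.column_stack([u, np.ones(len(u))])
    uph = np.column_stack([up, np.ones(len(up))])
    l2 = uh @ F.T                  # epipolar lines F u in the second image
    l1 = uph @ F                   # epipolar lines F^T u' in the first image
    d2 = np.abs(np.sum(uph * l2, axis=1)) / np.hypot(l2[:, 0], l2[:, 1])
    d1 = np.abs(np.sum(uh * l1, axis=1)) / np.hypot(l1[:, 0], l1[:, 1])
    return d1 + d2

def ransac_fundamental(u, up, iters=1000, thresh=2.0):
    """Generate-and-test: fit F to random samples of correspondences and keep
    the F with the largest support set. Uses eight_point_F (sketched in the
    previous section) as the per-sample fit."""
    rng = np.random.default_rng(0)
    best_F, best_support = None, np.zeros(len(u), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(u), size=8, replace=False)
        F = eight_point_F(u[idx], up[idx])
        support = epipolar_error(F, u, up) < thresh   # pixels from epipolar lines
        if support.sum() > best_support.sum():
            best_F, best_support = F, support
    return best_F, best_support
```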