Zisserman - 5
below a threshold that correspondence is deemed to support the fundamental ma
trix. The process is repeated a set number of times, and the putative fundamental
matrix with the most correspondences is the one finally adopted.
The final fundamental matrix is computed using all the n correspondences sup
porting it. These correspondences generate a set of n linear homogeneous equations
in the nine elements of F, which can be solved using SVD. However, the linear ma
trix solution will not necessarily have rank 2. So this solution is then used as the
starting point for a non-linear optimisation (similar to bundle adjustment), over the
seven parameters of F, which minimises the perpendicular distance of a point from
its corresponding epipolar line, and enforces the rank condition.
Computational algorithm The aim is to obtain a small number of reliable seed
matches, then to compute the fundamental matrix, F, which is used to guide further
matching.
1. Unguided matching: Given a corner at position (x, y ) in the first image, the
search for a match is centred on (x, y ) in the second image, and the strength of
candidate matches is measured by cross-correlation. The threshold for match
acceptance is deliberately conservative to minimise incorrect matches.
2. Compute epipolar geometry: the seed matches are used to robustly com
pute F using RANSAC, with the linear computation followed by a non-linear
fit to those matches supporting F.
3. Guided matching: Matching is then resumed, but using F to constrain search
areas to a band about the epipolar lines.
Typically the number of corners in a 512 x 512 image of an indoor scene is about
300, the number of seed matches is about 100, and the final number of matches is
about 200-250. Using corners computed to sub-pixel accuracy, the typical distance
of a point from its epipolar line is ~0.2 pixels.
2.2 Motion Segmentation/Clustering
In the previous section it was assumed that the camera was moving through a static
scene (or equivalently there is a single object moving relative to the camera). Often
in motion sequences there are multiple moving objects, including the camera.
It is often necessary (for example to initialise tracking) to detect any of the ob
jects within the scene which change their relative dispositions and cluster (group)
image features into sets belonging to each independently moving object. The fun
damental matrix provides a principled way to achieve this.
Suppose a rigid object moves relative to the camera, then the projections of
points on the object are linked over two views by the fundamental matrix. This fun
damental matrix depends on the internal camera parameters and 3D motion, but
not the 3D structure of the object. Image points on another object moving indepen
dently of the first will also be linked over the two views by a fundamental matrix, but