The role of models in automated scene analysis

torlegård, kennert
Zisserman - 5 
below a threshold that correspondence is deemed to support the fundamental ma 
trix. The process is repeated a set number of times, and the putative fundamental 
matrix with the most correspondences is the one finally adopted. 
The final fundamental matrix is computed using all the n correspondences sup 
porting it. These correspondences generate a set of n linear homogeneous equations 
in the nine elements of F, which can be solved using SVD. However, the linear ma 
trix solution will not necessarily have rank 2. So this solution is then used as the 
starting point for a non-linear optimisation (similar to bundle adjustment), over the 
seven parameters of F, which minimises the perpendicular distance of a point from 
its corresponding epipolar line, and enforces the rank condition. 
Computational algorithm The aim is to obtain a small number of reliable seed 
matches, then to compute the fundamental matrix, F, which is used to guide further 
matching. 
1. Unguided matching: Given a corner at position (x, y ) in the first image, the 
search for a match is centred on (x, y ) in the second image, and the strength of 
candidate matches is measured by cross-correlation. The threshold for match 
acceptance is deliberately conservative to minimise incorrect matches. 
2. Compute epipolar geometry: the seed matches are used to robustly com 
pute F using RANSAC, with the linear computation followed by a non-linear 
fit to those matches supporting F. 
3. Guided matching: Matching is then resumed, but using F to constrain search 
areas to a band about the epipolar lines. 
Typically the number of corners in a 512 x 512 image of an indoor scene is about 
300, the number of seed matches is about 100, and the final number of matches is 
about 200-250. Using corners computed to sub-pixel accuracy, the typical distance 
of a point from its epipolar line is ~0.2 pixels. 
2.2 Motion Segmentation/Clustering 
In the previous section it was assumed that the camera was moving through a static 
scene (or equivalently there is a single object moving relative to the camera). Often 
in motion sequences there are multiple moving objects, including the camera. 
It is often necessary (for example to initialise tracking) to detect any of the ob 
jects within the scene which change their relative dispositions and cluster (group) 
image features into sets belonging to each independently moving object. The fun 
damental matrix provides a principled way to achieve this. 
Suppose a rigid object moves relative to the camera, then the projections of 
points on the object are linked over two views by the fundamental matrix. This fun 
damental matrix depends on the internal camera parameters and 3D motion, but 
not the 3D structure of the object. Image points on another object moving indepen 
dently of the first will also be linked over the two views by a fundamental matrix, but
1
2
...
188
189
190
191
192
...
201
202
Full text: The role of models in automated scene analysis

Access restriction

Copyright

Note to user