Zisserman - 2
constraints on the geometry. Inevitably, a number of the ‘novel’ results appearing
in the vision literature are a rediscovery of standard photogrammetry techniques
and theorems under another name, particularly those of the 19th Century German
photogrammetrists.
This paper presents a number of examples where the uncalibrated approach has
been successful. These are grouped into the two areas where the deepest inroads
have been made: structure from motion, and object recognition. Other applications
include: uncalibrated grasping [9, 13], path planning and navigation [3], and fixation
tracking for an active head [21]. The paper is largely tutorial, omitting proofs and
derivations in favour of mathematical results and applications.
1.1 Mathematical Preliminaries
It is assumed that the camera can be modelled as an idealised pin-hole, with all rays
intersecting at the optical centre, and projection from 3-space to the image modelled
as a linear map on homogeneous coordinates. This map is represented by a 3 × 4
projection matrix, P, so that
x = PX (1)
where homogeneous coordinates are used, X = (X, Y, Z, 1)^T, x = (x, y, 1)^T. For
homogeneous quantities, = indicates equality up to a non-zero scale factor (which
varies with points). P has 11 degrees of freedom since only ratios of elements are
significant. It is useful to partition P as P = (M | -Mt) where t is the centre of
projection (since the centre, X = (t^T, 1)^T, projects as PX = Mt - Mt = 0). Provided
the left 3 × 3 submatrix, M, is not singular (i.e. the optical centre is not on the plane at infinity), P can
always be partitioned in this way. If (X, Y, Z) are Euclidean coordinates, then P
can be further decomposed as P = C(R | -Rt) where R and t are the rotation and
translation of the camera in the Euclidean world coordinate system. C is a 3 × 3
upper triangular matrix, called the camera calibration matrix. C provides the
affine transformation between an image point and a ray in the Euclidean frame
associated with the camera. A camera is uncalibrated if the matrix C is unknown.
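
As a minimal numerical sketch of the relations above, the following Python/NumPy fragment builds P = C(R | -Rt), projects a point, and checks three of the stated properties: that P is defined only up to scale, that the centre of projection maps to the zero vector, and that t can be recovered from the partition P = (M | -Mt) when M is non-singular. The numerical values of C, R and t, and the helper names rotation_z, projection_matrix, project and camera_centre, are illustrative assumptions and do not come from the paper.

```python
import numpy as np


def rotation_z(angle_rad):
    """Rotation about the Z axis (an arbitrary choice for this illustration)."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])


def projection_matrix(C, R, t):
    """Assemble P = C (R | -R t), where t is the centre of projection."""
    return C @ np.hstack([R, -(R @ t).reshape(3, 1)])


def project(P, X_world):
    """Apply x = P X and return the inhomogeneous image point (x, y)."""
    x = P @ np.append(X_world, 1.0)
    return x[:2] / x[2]


def camera_centre(P):
    """Recover t from P = (M | -M t); assumes M is non-singular."""
    M, p4 = P[:, :3], P[:, 3]
    return -np.linalg.solve(M, p4)


# Illustrative (assumed) calibration matrix: upper triangular, 3 x 3.
C = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = rotation_z(np.deg2rad(10.0))
t = np.array([0.5, -0.2, -4.0])   # centre of projection in world coordinates
P = projection_matrix(C, R, t)

X = np.array([0.3, 0.1, 2.0])     # a world point in front of the camera
x = project(P, X)

# Rescaling P leaves the image point unchanged (only ratios matter).
x_scaled = project(3.7 * P, X)

# The centre of projection maps to the zero vector.
centre_image = P @ np.append(t, 1.0)

# The centre is recoverable from the partition P = (M | -M t).
t_recovered = camera_centre(P)
```

In the uncalibrated setting C is unknown, so only P (up to scale) is available; the recovery of t above still applies, since it uses M = CR and not C itself.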
2 Structure and Motion
We concentrate here on two views and consider “point” features, rather than line
segments or curves. The two view situation applies equally to stereo (two views
acquired simultaneously) or motion (two views acquired sequentially). In the stereo
case the two views are acquired with different cameras; in the case of motion, however,
the camera is often the same and may have fixed internal parameters (i.e. the
interior orientation is unchanged between views). This constraint can be used to
generate additional geometric information. Constraints of this type are discussed in
section 2.3.