The role of models in automated scene analysis

torlegård, kennert
Zisserman - 2 
constraints on the geometry. Inevitably, a number of the ‘novel’ results appearing 
in the vision literature are a rediscovery of standard photogrammetry techniques 
and theorems under another name, particularly those of the 19th Century German 
photogrammetrists. 
This paper presents a number of examples where the uncalibrated approach has 
been successful. These are grouped into the two areas where the deepest inroads 
have been made: structure from motion, and object recognition. Other applications 
include: uncalibrated grasping [9, 13], path planning and navigation [3], and fixation 
tracking for an active head [21]. The paper is largely tutorial, omitting proofs and 
derivations in favour of mathematical results and applications. 
1.1 Mathematical Preliminaries 
It is assumed that the camera can be modelled as an idealised pin-hole, with all rays 
intersecting at the optical centre, and projection from 3-space to the image modelled 
as a linear map on homogeneous coordinates. This map is represented by a 3 × 4
projection matrix, P, so that
x = PX (1)
where homogeneous coordinates are used, X = (X, Y, Z, 1)^T, x = (x, y, 1)^T. For
homogeneous quantities, = indicates equality up to a non-zero scale factor (which 
varies with points). P has 11 degrees of freedom since only ratios of elements are 
significant. It is useful to partition P as P = (M | -Mt) where t is the centre of
projection (since the centre, X = (t^T, 1)^T, projects as PX = Mt - Mt = 0). Provided
the first 3 × 3 submatrix,
M, is not singular (i.e. the optical centre is not on the plane at infinity), P can
always be partitioned in this way. If (X, Y, Z) are Euclidean coordinates, then P
can be further decomposed as P = C(R | -Rt) where R and t are the rotation and
translation of the camera in the Euclidean world coordinate system. C is a 3 × 3
upper triangular matrix, called the camera calibration matrix. C provides the 
affine transformation between an image point and a ray in the Euclidean frame 
associated with the camera. A camera is uncalibrated if the matrix C is unknown. 
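
The relations above can be checked numerically. The following is a minimal sketch in Python with NumPy, not code from the paper: the function names and all numerical values of C, R and t are hypothetical and chosen only for illustration. It assembles P = C(R | -Rt), confirms that the centre of projection maps to the zero vector, recovers t from the partition P = (M | -Mt), and shows that rescaling P leaves the image point unchanged.

```python
import numpy as np

def projection_matrix(C, R, t):
    """Assemble P = C (R | -R t) from calibration C, rotation R and centre t."""
    return C @ np.hstack([R, -R @ t.reshape(3, 1)])

def project(P, X):
    """Project a Euclidean 3-space point X and return inhomogeneous image coordinates."""
    x = P @ np.append(X, 1.0)   # homogeneous image point, defined up to scale
    return x[:2] / x[2]

def camera_centre(P):
    """Recover t from the partition P = (M | -M t), assuming M is non-singular."""
    M, p4 = P[:, :3], P[:, 3]
    return -np.linalg.solve(M, p4)

# Hypothetical calibration matrix (upper triangular), rotation and centre.
C = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
theta = np.deg2rad(10.0)        # rotation about the Y axis
R = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
              [ 0.0,           1.0, 0.0          ],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.5, -0.2, -4.0])

P = projection_matrix(C, R, t)

centre_image = P @ np.append(t, 1.0)   # the centre projects to the zero vector
recovered_t = camera_centre(P)         # t read back from P alone

X = np.array([0.1, 0.2, 1.0])
x_a = project(P, X)
x_b = project(3.0 * P, X)              # rescaled P gives the same image point
```

Because only the ratios of the elements of P matter, `x_a` and `x_b` agree, which is the numerical counterpart of P having 11 rather than 12 degrees of freedom.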
2 Structure and Motion 
We concentrate here on two views and consider “point” features, rather than line 
segments or curves. The two view situation applies equally to stereo (two views 
acquired simultaneously) or motion (two views acquired sequentially). In the stereo 
case the two views are acquired with different cameras; in the case of motion,
however, the camera is often the same and may have fixed internal parameters (i.e. the
interior orientation is unchanged between views). This constraint can be used to 
generate additional geometric information. Constraints of this type are discussed in 
section 2.3.