used to compute the pose of the model in 3-space, and the projection of the entire
model onto the image is generated. The support for this recognition hypothesis is
assessed by measuring the overlap of the projected features with image features
other than those of the original grouping. This approach is typical of many existing
systems [2, 8, 14, 15].
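A minimal sketch of such an overlap-based verification score, assuming point features and numpy (the function name and the pixel tolerance are illustrative, not from the text; the caller would exclude the features of the original grouping from the image set, as described above):

```python
import numpy as np

def verification_score(projected_pts, image_pts, tol=2.0):
    """Fraction of projected model feature points lying within `tol` pixels
    of some detected image feature point.

    projected_pts : (N, 2) array of model features projected into the image
                    with the hypothesised pose.
    image_pts     : (K, 2) array of detected image feature positions
                    (excluding those of the original grouping).
    """
    p = np.asarray(projected_pts, dtype=float)
    q = np.asarray(image_pts, dtype=float)
    # Pairwise distances between projected and image features.
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=2)
    # A projected feature is "supported" if some image feature lies nearby.
    return float(np.mean(d.min(axis=1) < tol))
```

A hypothesis would then be accepted when this score exceeds some threshold.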
As the size of the model library increases, this approach becomes computationally
too expensive, since if M models are in the library then recognition complexity
is linear in M, each model being tried in turn. It is then more effective to choose
potential models from the library based on the observed image features alone. That
is, image feature measurements are used to index into the model base. In constructing
such index functions, invariance plays a major role, since a model should be
identified irrespective of object pose.
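A minimal sketch of such an index, assuming invariant values are quantised into the bins of a hash table (the class and function names and the bin width are illustrative assumptions, not taken from the text); lookup cost is then independent of the number M of models:

```python
from collections import defaultdict

BIN_WIDTH = 0.05  # assumed tolerance on measured invariant values

def key(value):
    """Quantise an invariant value to a hash-table bin."""
    return round(value / BIN_WIDTH)

class ModelLibrary:
    def __init__(self):
        self.table = defaultdict(list)   # bin key -> candidate model names

    def add(self, model_name, invariant_value):
        """Store a model under the invariant value measured on the model."""
        self.table[key(invariant_value)].append(model_name)

    def candidates(self, measured_value):
        """Return models whose stored invariant matches the measured one.

        Neighbouring bins are also checked to absorb measurement noise;
        the cost does not grow with the size M of the library.
        """
        k = key(measured_value)
        hits = []
        for kk in (k - 1, k, k + 1):
            hits.extend(self.table[kk])
        return hits
```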
3.1 Geometric invariants for recognition
Invariants are properties of geometric configurations which remain unchanged under
an appropriate class of transformations. For example, properties such as intersection,
collinearity, and tangency are unaffected by a projective transformation; however,
invariant values can also be computed and these are of particular importance in
forming index functions. Five coplanar lines, for example, have 2 invariants under
planar projective transformations given by:
\[
I_1 = \frac{|M_{431}|\,|M_{521}|}{|M_{421}|\,|M_{531}|}
\quad\text{and}\quad
I_2 = \frac{|M_{421}|\,|M_{532}|}{|M_{432}|\,|M_{521}|}, \qquad (4)
\]
where $M_{ijk} = (\mathbf{l}_i, \mathbf{l}_j, \mathbf{l}_k)$, $|M_{ijk}|$ is the determinant, and $\mathbf{l} = (l_1, l_2, l_3)$ is the homogeneous
representation of a line: $l_1 x + l_2 y + l_3 = 0$. Numerous other invariants for
planar and 3D structures are given in [20].
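As an illustration of equation (4), the following sketch computes the two five-line invariants with numpy and checks numerically that they are unchanged when the lines are mapped by a random projectivity (the function names are assumptions; under a point transformation $\mathbf{x}' = H\mathbf{x}$, lines transform as $\mathbf{l}' = H^{-\top}\mathbf{l}$):

```python
import numpy as np

def five_line_invariants(lines):
    """Two projective invariants of five coplanar lines, as in equation (4).

    `lines` is a 5x3 array; row i is the homogeneous line l_i = (l1, l2, l3)
    representing l1*x + l2*y + l3 = 0.
    """
    l = np.asarray(lines, dtype=float)

    def M(i, j, k):
        # |M_ijk|: determinant of the matrix with columns l_i, l_j, l_k
        # (indices are 1-based, as in the text).
        return np.linalg.det(np.column_stack((l[i - 1], l[j - 1], l[k - 1])))

    I1 = (M(4, 3, 1) * M(5, 2, 1)) / (M(4, 2, 1) * M(5, 3, 1))
    I2 = (M(4, 2, 1) * M(5, 3, 2)) / (M(4, 3, 2) * M(5, 2, 1))
    return I1, I2

# Numerical check of invariance under a random plane projectivity.
lines = np.random.rand(5, 3)
H = np.random.rand(3, 3)                      # random projectivity
mapped = (np.linalg.inv(H).T @ lines.T).T     # lines map as l' = H^{-T} l
print(five_line_invariants(lines))
print(five_line_invariants(mapped))           # same values, up to round-off
```

The invariance follows because each determinant picks up the same factor $\det(H^{-\top})$, which cancels between numerator and denominator in both ratios.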
3.2 Planar Object Recognition
The use of planar projective invariants for planar object recognition is particularly
appropriate and straightforward because a projective transformation between object
and image planes covers all the major imaging transformations: the plane to plane
projectivity models the composed effects of 3D rigid rotation and translation of the
world plane (exterior orientation), perspective projection to the image plane, and an
affine transformation of the final image which covers the effects of camera internal
parameters.
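As a sketch in standard notation (the symbols $A$, $\mathbf{r}_1$, $\mathbf{r}_2$ and $\mathbf{t}$ below are not defined in the excerpt and are introduced only for illustration), the composition described above can be written, for a point $\mathbf{X} = (X, Y, 1)^\top$ on the world plane, as
\[
\mathbf{x} \;\simeq\; A \,
\begin{pmatrix} \mathbf{r}_1 & \mathbf{r}_2 & \mathbf{t} \end{pmatrix}
\mathbf{X} \;=\; T\,\mathbf{X},
\]
where $\mathbf{r}_1, \mathbf{r}_2$ are the first two columns of the rotation and $\mathbf{t}$ the translation of the world plane (exterior orientation), and $A$ is the affine matrix accounting for the camera internal parameters. The product $T$ is a general non-singular $3\times 3$ matrix, i.e. a plane-to-plane projectivity.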
The key idea is that projective invariants of the object have the same value
when computed in any perspective image of the object. Recognition proceeds by
grouping image features into configurations, and computing the invariants of the
configurations. These invariants are used to index the model library. If an invariant
value corresponds to a value in the library then a recognition hypothesis is generated
for that object. The recognition hypothesis is verified by projecting the model
outline onto the image (the projection is based on correspondences between the