ISPRS Commission III, Vol.34, Part 3A „Photogrammetric Computer Vision“, Graz, 2002
mine its pose. This problem is especially grave for large models.
The required accuracy is usually not obtainable, even in low noise
images, because the discretization of the image leads to edge di-
rection errors that already are too large for the GHT.
In all approaches above, the edge image is binarized. This makes
the object recognition algorithm invariant only against a narrow
range of illumination changes. If the image contrast is lowered,
progressively fewer edge points will be segmented, which has
the same effects as progressively larger occlusion. The similarity
measures proposed in this paper overcome all of the above prob-
lems and result in an object recognition strategy robust against
occlusion, clutter, and nonlinear illumination changes. They can
be extended to be robust to global as well as local contrast rever-
sals.
2 SIMILARITY MEASURES
The model of an object consists of a set of points $p_i = (x_i, y_i)^T$
and associated direction vectors $d_i = (t_i, u_i)^T$, $i = 1, \ldots, n$.
The direction vectors can be generated by a number of different
image processing operations, e.g., edge, line, or corner extraction,
as discussed in Section 3. Typically, the model is generated from
an image of the object, where an arbitrary region of interest (ROI)
specifies that part of the image in which the object is located. It is
advantageous to specify the coordinates $p_i$ relative to the center
of gravity of the ROI of the model or to the center of gravity of
the points of the model.
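As a minimal sketch of the second option, the following Python snippet shifts a set of model points so that they are expressed relative to their own center of gravity. The function name `center_points` and the representation of points as `(x, y)` tuples are assumptions made for illustration; they are not taken from the paper.

```python
def center_points(points):
    """Return the points relative to their center of gravity, plus that center."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    return centered, (cx, cy)


# Three hypothetical model points in image coordinates.
model_points = [(10, 20), (12, 20), (14, 26)]
centered, center = center_points(model_points)
```

With this convention, the translation part of the pose directly gives the position of the model's center of gravity in the image.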
The image in which the model should be found can be transformed
into a representation in which a direction vector $e_{x,y} =
(v_{x,y}, w_{x,y})^T$ is obtained for each image point $(x, y)$. In the
matching process, a transformed model must be compared to the
image at a particular location. In the most general case considered
here, the transformation is an arbitrary affine transformation. It is
useful to separate the translation part of the affine transformation
from the linear part. Therefore, a linearly transformed model is
given by the points $p'_i = A p_i$ and the accordingly transformed
direction vectors $d'_i = A d_i$, where $A$ is the $2 \times 2$ matrix
of the linear part of the affine transformation.
As discussed above, the similarity measure by which the trans-
formed model is compared to the image must be robust to occlu-
sions, clutter, and illumination changes. One such measure is to
sum the (unnormalized) dot product of the direction vectors of
the transformed model and the image over all points of the model
to compute a matching score at a particular point $q = (x, y)^T$ of
the image, i.e., the similarity measure of the transformed model
at the point q, which corresponds to the translation part of the
affine transformation, is computed as follows:
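$$s = \frac{1}{n} \sum_{i=1}^{n} \langle d'_i, e_{q+p'_i} \rangle = \frac{1}{n} \sum_{i=1}^{n} \left( t'_i \, v_{x+x'_i,\,y+y'_i} + u'_i \, w_{x+x'_i,\,y+y'_i} \right). \qquad (1)$$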
If the model is generated by edge or line filtering, and the im-
age is preprocessed in the same manner, this similarity measure
fulfills the requirements of robustness to occlusion and clutter. If
parts of the object are missing in the image, there are no lines
or edges at the corresponding positions of the model in the im-
age, i.e., the direction vectors will have a small length and hence
contribute little to the sum. Likewise, if there are clutter lines or
edges in the image, there will either be no point in the model at
the clutter position or it will have a small length, which means it
will contribute little to the sum.
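The behavior described above can be illustrated with a small Python sketch of the unnormalized measure. The function name `score_unnormalized`, the dictionary that maps image points to direction vectors, and the choice to treat image points without an entry as zero vectors are all assumptions made for this example, not part of the paper.

```python
def score_unnormalized(model_pts, model_dirs, image_dirs, q):
    """Mean dot product of model and image direction vectors at position q."""
    x, y = q
    total = 0.0
    for (xi, yi), (ti, ui) in zip(model_pts, model_dirs):
        # Assumption: image points with no entry carry a zero direction vector.
        v, w = image_dirs.get((x + xi, y + yi), (0.0, 0.0))
        total += ti * v + ui * w
    return total / len(model_pts)


# Toy model: three points with unit direction vectors.
pts = [(-1, 0), (0, 0), (1, 0)]
dirs = [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]

# Image containing the model at (5, 5) with direction vectors of length 2.
full = {(4, 5): (0.0, 2.0), (5, 5): (0.0, 2.0), (6, 5): (0.0, 2.0)}
# Same image with one model point occluded (no edge response there).
occluded = {(4, 5): (0.0, 2.0), (5, 5): (0.0, 2.0)}

s_full = score_unnormalized(pts, dirs, full, (5, 5))
s_occluded = score_unnormalized(pts, dirs, occluded, (5, 5))
```

In this toy case the occluded point simply drops out of the sum, so the score decreases in proportion to the missing part of the model instead of collapsing.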
The similarity measure (1) is not truly invariant against illumi-
nation changes, however, since usually the length of the direc-
tion vectors depends on the brightness of the image, e.g., if edge
detection is used to extract the direction vectors. However, if a
user specifies a threshold on the similarity measure to determine
whether the model is present in the image, a similarity measure
with a well defined range of values is desirable. The following
similarity measure achieves this goal:
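$$s = \frac{1}{n} \sum_{i=1}^{n} \frac{\langle d'_i, e_{q+p'_i} \rangle}{\|d'_i\| \cdot \|e_{q+p'_i}\|} = \frac{1}{n} \sum_{i=1}^{n} \frac{t'_i \, v_{x+x'_i,\,y+y'_i} + u'_i \, w_{x+x'_i,\,y+y'_i}}{\sqrt{t'^2_i + u'^2_i} \cdot \sqrt{v^2_{x+x'_i,\,y+y'_i} + w^2_{x+x'_i,\,y+y'_i}}}. \qquad (2)$$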
Because of the normalization of the direction vectors, this sim-
ilarity measure is additionally invariant to arbitrary illumination
changes since all vectors are scaled to a length of 1. What makes
this measure robust against occlusion and clutter is the fact that
if a feature is missing, either in the model or in the image, noise
will lead to random direction vectors, which, on average, will
contribute nothing to the sum.
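The effect of the normalization can be checked with a short Python sketch. The function name `score_normalized`, the dictionary representation of the image direction vectors, and the decision to let zero-length vectors contribute nothing are assumptions made for illustration.

```python
import math


def score_normalized(model_pts, model_dirs, image_dirs, q):
    """Mean of the normalized dot products of model and image vectors at q."""
    x, y = q
    total = 0.0
    for (xi, yi), (ti, ui) in zip(model_pts, model_dirs):
        v, w = image_dirs.get((x + xi, y + yi), (0.0, 0.0))
        denom = math.hypot(ti, ui) * math.hypot(v, w)
        if denom > 0.0:  # assumption: zero-length vectors contribute nothing
            total += (ti * v + ui * w) / denom
    return total / len(model_pts)


pts = [(-1, 0), (0, 0), (1, 0)]
dirs = [(0.0, 1.0), (1.0, 0.0), (3.0, 4.0)]

# The same scene at high contrast and at one tenth of that contrast.
bright = {(4, 5): (0.0, 10.0), (5, 5): (10.0, 0.0), (6, 5): (30.0, 40.0)}
dim = {p: (0.1 * v, 0.1 * w) for p, (v, w) in bright.items()}

s_bright = score_normalized(pts, dirs, bright, (5, 5))
s_dim = score_normalized(pts, dirs, dim, (5, 5))
```

Both scores agree up to rounding, because only the directions of the vectors enter the sum and their lengths cancel.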
The similarity measure (2) will return a high score if all the di-
rection vectors of the model and the image align, i.e., point in the
same direction. If edges are used to generate the model and im-
age vectors, this means that the model and image must have the
same contrast direction for each edge. Sometimes it is desirable
to be able to detect the object even if its contrast is reversed. This
is achieved by:
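$$s = \left| \frac{1}{n} \sum_{i=1}^{n} \frac{\langle d'_i, e_{q+p'_i} \rangle}{\|d'_i\| \cdot \|e_{q+p'_i}\|} \right|. \qquad (3)$$
In rare circumstances, it might be necessary to ignore even local
contrast changes. In this case, the similarity measure can be
modified as follows:
$$s = \frac{1}{n} \sum_{i=1}^{n} \frac{\left| \langle d'_i, e_{q+p'_i} \rangle \right|}{\|d'_i\| \cdot \|e_{q+p'_i}\|}. \qquad (4)$$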
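The difference between the three normalized measures can be sketched in Python as follows. The function names are hypothetical, and the example assumes that the image direction vectors have already been sampled at the transformed model points, so that they can be passed as a plain list.

```python
import math


def normalized_terms(model_dirs, image_vecs):
    """Per-point normalized dot products, i.e., the summands of measure (2)."""
    terms = []
    for (t, u), (v, w) in zip(model_dirs, image_vecs):
        denom = math.hypot(t, u) * math.hypot(v, w)
        terms.append((t * v + u * w) / denom if denom > 0.0 else 0.0)
    return terms


def score_same_contrast(terms):    # measure (2)
    return sum(terms) / len(terms)


def score_global_reversal(terms):  # measure (3): absolute value of the sum
    return abs(sum(terms)) / len(terms)


def score_local_reversal(terms):   # measure (4): sum of absolute values
    return sum(abs(c) for c in terms) / len(terms)


model_dirs = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
# Every image vector points the opposite way: global contrast reversal.
flipped_all = normalized_terms(
    model_dirs, [(-2.0, 0.0), (0.0, -2.0), (-2.0, 0.0), (0.0, -2.0)])
# Only two of the four vectors are flipped: local contrast reversal.
flipped_some = normalized_terms(
    model_dirs, [(2.0, 0.0), (0.0, -2.0), (-2.0, 0.0), (0.0, 2.0)])
```

In this toy example, measure (2) rejects both cases, measure (3) accepts only the globally reversed instance, and measure (4) accepts both.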
The above three normalized similarity measures are robust to occlusion
in the sense that the object will be found even if it is occluded.
As mentioned above, this results from the fact that the missing
object points in the instance of the model in the image will on av-
erage contribute nothing to the sum. For any particular instance
of the model in the image, this may not be true, e.g., because the
noise in the image is not uncorrelated. This leads to the unde-
sired fact that the instance of the model will be found in different
poses in different images, even if the model does not move in
the images, because in a particular image of the model the ran-
dom direction vectors will contribute slightly different amounts to
the sum, and hence the maximum of the similarity measure will
change randomly. To make the localization of the model more
precise, it is useful to set the contribution of direction vectors
caused by noise in the image to zero. The easiest way to do this
is to set all inverse lengths $1/\|e_{q+p'_i}\|$ of the direction vectors in
the image to 0 if their length $\|e_{q+p'_i}\|$ is smaller than a threshold
that depends on the noise level in the image and the preprocess-
ing operation that is used to extract the direction vectors in the
image. This threshold can be specified easily by the user. By this
modification of the similarity measure, it can be ensured that an
occluded instance of the model will always be found in the same
pose if it does not move in the images.
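A minimal Python sketch of this thresholding is given below. The function name `score_thresholded`, the parameter name `min_length`, and the toy noise vectors are assumptions chosen for illustration; as in the text, the threshold value itself would have to be chosen by the user.

```python
import math


def score_thresholded(model_dirs, image_vecs, min_length):
    """Normalized measure in which short image vectors contribute exactly 0."""
    total = 0.0
    for (t, u), (v, w) in zip(model_dirs, image_vecs):
        e_len = math.hypot(v, w)
        # The inverse length is set to 0 for vectors shorter than the threshold.
        inv_e = 1.0 / e_len if e_len >= min_length and e_len > 0.0 else 0.0
        total += (t * v + u * w) * inv_e / math.hypot(t, u)
    return total / len(model_dirs)


model_dirs = [(1.0, 0.0)] * 4
visible = [(5.0, 0.0), (5.0, 0.0)]  # two model points are visible
# Two noise realizations at the two occluded model points.
noise_a = [(0.1, 0.0), (0.0, 0.1)]
noise_b = [(-0.1, 0.0), (0.0, -0.1)]

# With a negligible threshold the noise changes the score from image to image.
raw_a = score_thresholded(model_dirs, visible + noise_a, 1e-9)
raw_b = score_thresholded(model_dirs, visible + noise_b, 1e-9)
# With a threshold above the noise level the score is the same in both images.
thr_a = score_thresholded(model_dirs, visible + noise_a, 1.0)
thr_b = score_thresholded(model_dirs, visible + noise_b, 1.0)
```

Here the thresholded score equals the visible fraction of the model in both images, whereas the unthresholded score varies with the noise.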
The normalized similarity measures (2)-(4) have the property
that they return a number no larger than 1 as the score of a potential
match. In all cases, a score of 1 indicates a perfect match between
the model and the image. Furthermore, the score roughly