ISPRS Commission III, Vol.34, Part 3A ,Photogrammetric Computer Vision“, Graz, 2002
model M is generated from an image of the object to be recog-
nized. A region of interest (ROT) R specifies the object's location
in the image. The image part defined by R is called reference im-
age I". The image, in which the object should be recognized, will
be referred to as the search image I^. Almost all object recogni-
tion approaches can be split into two successive phases: the of-
fline phase including the generation of the model and the online
phase, in which the constructed model is used to find the object
in the search image.
The transformation class 7T , e.g., translations or euclidean, sim-
ilarity, affine, or arbitrary projective transformations, specifies
the degrees of freedom of the object, i.e., which transformations
the object may undergo in the search image. For all similarity
measures the object recognition step is performed by transform-
ing the model to a user-limited range of discrete transformations
T; € 7T within the transformation class. For each transformed
model M? — T;M the similarity measure is calculated between
M* and the corresponding representation of the search image.
The representation can, for example, be described by the raw gray
values in both images (e.g., when using the normalized cross cor-
relation) or by the corresponding binarized edges (e.g., when us-
ing the Hausdorff distance). The maximum or minimum of the
match metric then indicates the pose of the recognized object.
The first method to be analyzed is the Normalized Cross Corre-
lation (Brown, 1992) because it is a rather widely spread method
in industry and therefore well known in the application area of
object recognition. The Hausdorff Distance (Rucklidge, 1997) is
the second candidate, which is also the core of many recognition
implementations, because of its higher robustness against occlu-
sions and clutter in contrast to the normalized cross correlation.
Additionally, PatMax® and two novel approaches, which are re-
ferred to as Shape-Based Matching (Steger, 2001) and Modified
Hough Transform (Ulrich, 2001, Ulrich et al., 2001a, Ulrich et
al., 2001b) below, are included in our analysis. The least-squares
adjustment of the object’s pose assumes approximate values for
the transformation parameters and therefore, is no self-contained
object recognition method. Thus, it can be used to improve the
accuracy of the returned parameters from any recognition tech-
nique that uses the edge position and edge orientation as features
by a subsequent execution of the least-squares adjustment. In
our current study we use the shape-based matching as basis for
the least-squares adjustment. The development of our new ap-
proaches was motivated by the increasing industrial demands like
real-time computation and high recognition accuracy. Therefore,
the study is mainly concerned with the robustness, the subpixel
accuracy, and the required computation time of the six candidate
algorithms under different external circumstances.
2.1 Normalized Cross Correlation
For the purpose of evaluating the performance of the normalized
cross correlation (see (Brown, 1992), for example) we use — as
one typical representative — the current implementation of the
Matrox Imaging Library (MIL), which is a software development
toolkit of Matrox Electronic Systems Ltd. (Matrox, 2001). In
this implementation image pyramids are used for speed up. The
quality of the match is returned by mapping the normalized cross
correlation to a score value between 0.0 and 1.0. Subpixel accu-
racy is obtained by a subsequent refinement of the position and
orientation parameters by interpolation.
2.2 Hausdorff Distance
The Hausdorff distance measures the extent to which each pixel
of the binarized reference image lies near some pixel of the bina-
rized search image and vice versa. We use the implementation of
(Rucklidge, 1997), which uses the symmetric partial undirected
Hausdorff distance to reduce the sensitivity to outliers applying a
forward and a reverse fraction of points that must fulfill a certain
distance criterion. Only translations of the object can be recog-
nized and no subpixel refinement is included. Although the pa-
rameter space is treated in a hierarchical way there is no use of
image pyramids, which makes the algorithm very slow.
2.3 PatMax®
As described in its documentation (Cognex, 2000) PatMax ® uses
geometric information. The model representation, which can be
visualized by PatMax®, apparently consists of subpixel precise
edge points and respective edge directions. From this we can
conclude that PatMax® uses similar features as the shape-based
matching. To speed up the search, a coarse-to-fine approach is
implemented. To indicate the quality of the match, PatMax®
computes a score value between 0.0 and 1.0.
2.4 Shape-Based Matching
In this section the principle of our novel similarity measure is
briefly explained. A detailed description can be found in (Steger,
2001).
The model consists of a set of points and their corresponding di-
rection vectors. In the matching process, a transformed model
is compared to the image at a particular location by a similarity
measure. We suggest to sum the normalized dot product of the
direction vectors of the transformed model and the search image
over all points of the model to compute a matching score at a par-
ticular point of the image. The normalized similarity measure has
the property that it returns a number smaller than 1 as the score
of a potential match. A score of 1 indicates a perfect match be-
tween the model and the image. Furthermore, the score roughly
corresponds to the portion of the model that is visible in the im-
age. Once the object has been recognized on the lowest level of
the image pyramid, its position, rotation, and scale are extracted
to a resolution better than the discretization of the search space
by fitting a second order polynomial (in the four pose variables
horizontal translation, vertical translation, rotation, and scale) to
the similarity measure values in a 3 x 3 x 3 x 3 neighborhood
around the maximum score.
2.5 Modified Hough Transform
One weakness of the Generalized Hough Transform (GHT) (Bal-
lard, 1981) algorithm is the — in general — huge parameter
space. This requires large amounts of memory to store the ac-
cumulator array as well as high computational costs in the online
phase caused by the initialization of the array, the incrementa-
tion, and the search for maxima after the incrementation step. In
(Ulrich, 2001), (Ulrich et al., 2001a), and (Ulrich et al., 2001b)
we introduce our novel approach that eliminates the major draw-
backs of the GHT using a hierarchical search strategy in combi-
nation with an effective limitation of the search space. The result-
ing pose parameters of position and orientation are refined using
the method describe in Section 2.4. To evaluate the quality of a
match, a score value is computed as the peak height in the accu-
mulator array divided by the number of model points.
2.6 Shape-Based Matching Using Least-Squares
Adjustment
To improve the accuracy of the transformation parameters, we
developed a method that minimizes the distance between tan-
gents of the model shape and potential edge pixels in the search
A - 369