Photogrammetric computer vision: Papers accepted on the basis of peer-review full manuscripts

kalliany, r.; leberl, franz w.
ISPRS Commission III, Vol.34, Part 3A ,Photogrammetric Computer Vision“, Graz, 2002 
  
model M is generated from an image of the object to be recog- 
nized. A region of interest (ROT) R specifies the object's location 
in the image. The image part defined by R is called reference im- 
age I". The image, in which the object should be recognized, will 
be referred to as the search image I^. Almost all object recogni- 
tion approaches can be split into two successive phases: the of- 
fline phase including the generation of the model and the online 
phase, in which the constructed model is used to find the object 
in the search image. 
The transformation class 7T , e.g., translations or euclidean, sim- 
ilarity, affine, or arbitrary projective transformations, specifies 
the degrees of freedom of the object, i.e., which transformations 
the object may undergo in the search image. For all similarity 
measures the object recognition step is performed by transform- 
ing the model to a user-limited range of discrete transformations 
T; € 7T within the transformation class. For each transformed 
model M? — T;M the similarity measure is calculated between 
M* and the corresponding representation of the search image. 
The representation can, for example, be described by the raw gray 
values in both images (e.g., when using the normalized cross cor- 
relation) or by the corresponding binarized edges (e.g., when us- 
ing the Hausdorff distance). The maximum or minimum of the 
match metric then indicates the pose of the recognized object. 
The first method to be analyzed is the Normalized Cross Corre- 
lation (Brown, 1992) because it is a rather widely spread method 
in industry and therefore well known in the application area of 
object recognition. The Hausdorff Distance (Rucklidge, 1997) is 
the second candidate, which is also the core of many recognition 
implementations, because of its higher robustness against occlu- 
sions and clutter in contrast to the normalized cross correlation. 
Additionally, PatMax® and two novel approaches, which are re- 
ferred to as Shape-Based Matching (Steger, 2001) and Modified 
Hough Transform (Ulrich, 2001, Ulrich et al., 2001a, Ulrich et 
al., 2001b) below, are included in our analysis. The least-squares 
adjustment of the object’s pose assumes approximate values for 
the transformation parameters and therefore, is no self-contained 
object recognition method. Thus, it can be used to improve the 
accuracy of the returned parameters from any recognition tech- 
nique that uses the edge position and edge orientation as features 
by a subsequent execution of the least-squares adjustment. In 
our current study we use the shape-based matching as basis for 
the least-squares adjustment. The development of our new ap- 
proaches was motivated by the increasing industrial demands like 
real-time computation and high recognition accuracy. Therefore, 
the study is mainly concerned with the robustness, the subpixel 
accuracy, and the required computation time of the six candidate 
algorithms under different external circumstances. 
2.1 Normalized Cross Correlation 
For the purpose of evaluating the performance of the normalized 
cross correlation (see (Brown, 1992), for example) we use — as 
one typical representative — the current implementation of the 
Matrox Imaging Library (MIL), which is a software development 
toolkit of Matrox Electronic Systems Ltd. (Matrox, 2001). In 
this implementation image pyramids are used for speed up. The 
quality of the match is returned by mapping the normalized cross 
correlation to a score value between 0.0 and 1.0. Subpixel accu- 
racy is obtained by a subsequent refinement of the position and 
orientation parameters by interpolation. 
2.2 Hausdorff Distance 
The Hausdorff distance measures the extent to which each pixel 
of the binarized reference image lies near some pixel of the bina- 
rized search image and vice versa. We use the implementation of 
(Rucklidge, 1997), which uses the symmetric partial undirected 
Hausdorff distance to reduce the sensitivity to outliers applying a 
forward and a reverse fraction of points that must fulfill a certain 
distance criterion. Only translations of the object can be recog- 
nized and no subpixel refinement is included. Although the pa- 
rameter space is treated in a hierarchical way there is no use of 
image pyramids, which makes the algorithm very slow. 
2.3 PatMax® 
As described in its documentation (Cognex, 2000) PatMax ® uses 
geometric information. The model representation, which can be 
visualized by PatMax®, apparently consists of subpixel precise 
edge points and respective edge directions. From this we can 
conclude that PatMax® uses similar features as the shape-based 
matching. To speed up the search, a coarse-to-fine approach is 
implemented. To indicate the quality of the match, PatMax® 
computes a score value between 0.0 and 1.0. 
2.4 Shape-Based Matching 
In this section the principle of our novel similarity measure is 
briefly explained. A detailed description can be found in (Steger, 
2001). 
The model consists of a set of points and their corresponding di- 
rection vectors. In the matching process, a transformed model 
is compared to the image at a particular location by a similarity 
measure. We suggest to sum the normalized dot product of the 
direction vectors of the transformed model and the search image 
over all points of the model to compute a matching score at a par- 
ticular point of the image. The normalized similarity measure has 
the property that it returns a number smaller than 1 as the score 
of a potential match. A score of 1 indicates a perfect match be- 
tween the model and the image. Furthermore, the score roughly 
corresponds to the portion of the model that is visible in the im- 
age. Once the object has been recognized on the lowest level of 
the image pyramid, its position, rotation, and scale are extracted 
to a resolution better than the discretization of the search space 
by fitting a second order polynomial (in the four pose variables 
horizontal translation, vertical translation, rotation, and scale) to 
the similarity measure values in a 3 x 3 x 3 x 3 neighborhood 
around the maximum score. 
2.5 Modified Hough Transform 
One weakness of the Generalized Hough Transform (GHT) (Bal- 
lard, 1981) algorithm is the — in general — huge parameter 
space. This requires large amounts of memory to store the ac- 
cumulator array as well as high computational costs in the online 
phase caused by the initialization of the array, the incrementa- 
tion, and the search for maxima after the incrementation step. In 
(Ulrich, 2001), (Ulrich et al., 2001a), and (Ulrich et al., 2001b) 
we introduce our novel approach that eliminates the major draw- 
backs of the GHT using a hierarchical search strategy in combi- 
nation with an effective limitation of the search space. The result- 
ing pose parameters of position and orientation are refined using 
the method describe in Section 2.4. To evaluate the quality of a 
match, a score value is computed as the peak height in the accu- 
mulator array divided by the number of model points. 
2.6 Shape-Based Matching Using Least-Squares 
Adjustment 
To improve the accuracy of the transformation parameters, we 
developed a method that minimizes the distance between tan- 
gents of the model shape and potential edge pixels in the search 
A - 369
1
2
...
382
383
384
385
386
...
456
457
Full text: Papers accepted on the basis of peer-review full manuscripts (Part A)

Access restriction

Copyright

Note to user