higher accuracy is desirable. This can be achieved through a
least-squares adjustment of the pose parameters. To obtain a
better accuracy than the extrapolation provides, it is necessary to extract
the model points as well as the feature points in the image with
subpixel accuracy. If this were not done, the image and
model points would be separated radially by about 0.25 pixels
on average if each model point is matched to its closest image
point. However, even if the points are extracted with subpixel
accuracy, an algorithm that performs a least-squares adjustment
based on closest point distances would not improve the accuracy
much, since the points would still have an average distance significantly
larger than 0 tangentially because the model and image
points are not necessarily sampled at the same points and dis-
tances. Because of this, the proposed algorithm finds the closest
image point for each model point and then minimizes the sum of
the squared distances of the image points to a line defined by their
corresponding model point and the corresponding tangent to the
model point, i.e., the directions of the model points are taken to be
correct and are assumed to describe the direction of the object's
border. If, for example, an edge detector is used, the direction
vectors of the model are perpendicular to the object boundary,
and hence the equation of a line through a model point tangent to
the object boundary is given by $t_i(x - x_i) + u_i(y - y_i) = 0$. Let
$q_i = (v_i, w_i)^T$ denote the matched image points corresponding
to the model points $p_i$. Then, the following function is minimized
to refine the pose $a$:

$$ d(a) = \sum_{i=1}^{n} \bigl[\, t_i\bigl(v_i(a) - x_i\bigr) + u_i\bigl(w_i(a) - y_i\bigr) \,\bigr]^2 \;\rightarrow\; \min_a \qquad (6) $$
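As noted below, when similarity transformations are considered, the residuals in (6) are linear in the pose parameters, so the minimum can be found with a single linear least-squares solve. The following sketch illustrates this step; the parameterization (a, b, tx, ty) with a = s cos θ and b = s sin θ applied to the image points, as well as all function and variable names, are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def refine_similarity_pose(model_pts, model_dirs, image_pts):
    """Single linear least-squares step for the tangent-distance function (6).

    model_pts  : (n, 2) model points (x_i, y_i)
    model_dirs : (n, 2) unit direction vectors (t_i, u_i), perpendicular
                 to the object boundary
    image_pts  : (n, 2) matched image points (v_i, w_i)

    Returns (a, b, tx, ty) of the similarity transform applied to the image
    points, v_i(a) = a*v_i - b*w_i + tx and w_i(a) = b*v_i + a*w_i + ty,
    with a = s*cos(theta) and b = s*sin(theta).  This parameterization is
    only one possible choice, assumed for this sketch.
    """
    x, y = model_pts[:, 0], model_pts[:, 1]
    t, u = model_dirs[:, 0], model_dirs[:, 1]
    v, w = image_pts[:, 0], image_pts[:, 1]

    # The residual t_i*(v_i(a) - x_i) + u_i*(w_i(a) - y_i) is linear in
    # (a, b, tx, ty); stack one row per correspondence and solve.
    A = np.column_stack([t * v + u * w,   # coefficient of a
                         u * v - t * w,   # coefficient of b
                         t,               # coefficient of tx
                         u])              # coefficient of ty
    rhs = t * x + u * y
    params, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return params
```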
The potential corresponding image points in the search image are
obtained by a non-maximum suppression only and are extrapo-
lated to subpixel accuracy (Steger, 2000). In this way, a segmentation
of the search image is avoided, which is important to preserve
the invariance against arbitrary illumination changes. For each
model point, the corresponding image point in the search image
is chosen as the potential image point with the smallest Euclidean
distance using the pose obtained by the extrapolation to transform
the model to the search image. Because the points in the search
image are not segmented, spurious image points may be brought
into correspondence with model points. Therefore, to make the
adjustment robust, only correspondences with a distance smaller
than a robustly computed standard deviation of the distances are
used for the adjustment. Since (6) results in a linear equation sys-
tem when similarity transformations are considered, one iteration
suffices to find the minimum distance. However, since the point
correspondences may change due to the refined pose, an even higher
accuracy can be gained by iterating the correspondence search
and pose refinement. Typically, after three iterations the accuracy
of the pose no longer improves.
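A compact sketch of this iteration is given below, assuming the linear solve refine_similarity_pose() from the sketch after (6) is available. The use of a k-d tree for the closest-point search, the MAD-based estimate of the robust standard deviation, and the representation of the pose as a 2x2 matrix S and translation T are assumptions made for illustration; the text above only specifies the closest-point correspondences, the robust distance threshold, and the roughly three iterations.

```python
import numpy as np
from scipy.spatial import cKDTree  # nearest-neighbor search; any method would do

def refine_pose(model_pts, model_dirs, candidate_pts, S0, T0, n_iter=3):
    """Iterate correspondence search and pose refinement (illustrative sketch).

    S0, T0        : initial similarity transform (2x2 matrix, translation)
                    mapping the model into the search image (from the
                    extrapolation)
    candidate_pts : (m, 2) subpixel-accurate candidate edge points of the
                    search image
    """
    S, T = S0, T0
    tree = cKDTree(candidate_pts)
    for _ in range(n_iter):
        # Transform the model into the search image with the current pose and
        # find the closest candidate image point for every model point.
        proj = model_pts @ S.T + T
        dist, idx = tree.query(proj)
        # Robust rejection: keep only correspondences whose distance is
        # smaller than a robustly computed standard deviation of the
        # distances (estimated here via 1.4826 * MAD, one possible choice).
        sigma = 1.4826 * np.median(np.abs(dist - np.median(dist)))
        keep = dist < max(sigma, 1e-9)
        # Linear solve for the transform that maps the kept image points
        # into the model frame, then invert it to update the forward pose.
        a, b, tx, ty = refine_similarity_pose(model_pts[keep], model_dirs[keep],
                                              candidate_pts[idx[keep]])
        S_inv = np.array([[a, -b], [b, a]])
        S = np.linalg.inv(S_inv)
        T = -S @ np.array([tx, ty])
    return S, T
```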
5 EXAMPLE
Figure 1 displays an example of recognizing multiple objects at
different scales and rotations. The model image is shown in Fig-
ure 1(a), while Figure 1(b) shows that all three instances of the
model have been recognized correctly despite the fact that two of
them are occluded, that one of them is printed with the contrast
reversed, and that two of the models were printed with slightly
different shapes. The time to recognize the models was 103 ms
on an 800 MHz Pentium III running under Linux.
6 PERFORMANCE EVALUATION
To assess the performance of the proposed object recognition sys-
tem, two different criteria were used: the recognition rate and the
subpixel accuracy of the results.
Figure 1: Example of recognizing multiple objects ((a) model image, (b) found objects). Note that the model is found despite global contrast reversals and despite the fact that two of the models were printed with slightly different shapes.
To test the recognition rate, 500 images of an IC were taken. The
IC was occluded to various degrees with various objects, so that
in addition to occlusion, clutter of various degrees was created in
the image. Figure 2 shows six of the 500 images that were used
to test the recognition rate. The model was generated from the
print on the IC in the top left image of Figure 2. On the lowest
pyramid level it contained 2127 edge points.
An effort was made to keep the IC in exactly the same position
in the image in order to be able to measure the degree of occlu-
sion. Unfortunately, the IC moved very slightly (by less than one
pixel) during the acquisition of the images. The true amount of
occlusion was determined by extracting edges from the images
and intersecting the edge regions with the edges that constitute
the model. Since the objects that occlude the IC generate clutter
edges, this actually underestimates the occlusion.
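A minimal sketch of such a visibility estimate is given below; the edge extraction itself is not shown, and the pixel tolerance obtained by dilating the edge region, like all names in the sketch, is an assumption, since the text only states that the edge regions are intersected with the model edges.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def estimate_visibility(image_edges, model_pts, tol=1):
    """Fraction of model edge points covered by edges extracted from the image.

    image_edges : boolean edge map extracted from the test image
    model_pts   : (n, 2) model edge points (col, row), already placed at the
                  known position of the IC in the image
    tol         : tolerance in pixels for small localization differences
                  (an assumed value)
    """
    # Tolerate small localization differences by dilating the edge region.
    region = binary_dilation(image_edges, iterations=tol)
    rows = np.round(model_pts[:, 1]).astype(int)
    cols = np.round(model_pts[:, 0]).astype(int)
    visibility = region[rows, cols].mean()
    return visibility  # estimated occlusion is 1 - visibility
```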
The model was searched for in the 500 images with smin = 0.3,
i.e., the method should find the object despite 70% occlusion.
Only the translation parameters were determined. The average
recognition time was 22 ms. The model was recognized in 478
images, i.e., the recognition rate was 95.6%. By visual inspec-
tion, it was determined that in 15 of the 22 misdetection cases the
IC was occluded by more than 70%. If these cases are removed,
the recognition rate rises to 98.6% (478 of 485 images). In the remaining seven cases,
the occlusion was close to 70%. Figure 3(a) displays a plot of
the extracted scores against the estimated visibility of the object.
The instances in which the model was not found are denoted by
a score of 0, i.e., they lie on the x axis of the plot. Figure 3(b)
shows the errors of the extracted positions when extrapolating the
pose as described in Section 3. It can be seen that the IC was acci-
dentally shifted twice. The position errors are all very close to the
three cluster centers. Some of the larger errors in the y coordinate
result from refraction effects caused by the transparent ruler that
was used in some images to occlude the IC (see the top right im-
age of Figure 2). Figures 3(c) and (d) display the position errors