ISPRS Commission III, Vol. 34, Part 3A "Photogrammetric Computer Vision", Graz, 2002
OCCLUSION, CLUTTER, AND ILLUMINATION INVARIANT OBJECT RECOGNITION 
Carsten Steger 
MV Tec Software GmbH 
Neherstraße 1, 81675 München, Germany
steger@mvtec.com 
Commission III, Working Group III/5 
KEY WORDS: Computer Vision, Real-Time Object Recognition 
ABSTRACT 
An object recognition system for industrial inspection that recognizes objects under similarity transformations in real time is proposed. It uses novel similarity measures that are inherently robust against occlusion, clutter, and nonlinear illumination changes. They can be extended to be robust to global as well as local contrast reversals. The matching is performed based on the maxima of the similarity measure in the transformation space. For normal applications, subpixel-accurate poses are obtained by extrapolating the maxima of the similarity measure from discrete samples in the transformation space. For applications with very high accuracy requirements, least-squares adjustment is used to further refine the extracted pose.
1 INTRODUCTION 
Object recognition is used in many computer vision applications. It is particularly useful for industrial inspection tasks, where often an image of an object must be aligned with a model of the object. The transformation (pose) obtained by the object recognition process can be used for various tasks, e.g., pick-and-place operations or quality control. In most cases, the model of the object is generated from an image of the object. This 2D approach is taken because it is usually too costly or time-consuming to create a more complicated model, e.g., a 3D CAD model. Therefore, in industrial inspection tasks one is usually interested in matching a 2D model of an object to the image. The object may be transformed by a certain class of transformations, depending on the particular setup, e.g., translations, Euclidean transformations, similarity transformations, or general 2D affine transformations (which are usually taken as an approximation of the true perspective transformations an object may undergo).
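To make these transformation classes concrete, the following sketch (Python with NumPy; the function and parameter names are illustrative, not part of the paper) applies a 2D similarity transformation, i.e., a rotation, a uniform scaling, and a translation, to an array of model points. A Euclidean transformation is the special case scale = 1, and a pure translation the case scale = 1, angle = 0.

    import numpy as np

    def similarity_transform(points, angle, scale, tx, ty):
        # Rotate by 'angle' (radians), scale uniformly by 'scale',
        # then translate by (tx, ty); 'points' is an (N, 2) array.
        c, s = np.cos(angle), np.sin(angle)
        A = scale * np.array([[c, -s], [s, c]])
        return points @ A.T + np.array([tx, ty])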
A large number of object recognition strategies exist. The approach to object recognition proposed in this paper uses pixels as its geometric features, i.e., not higher-level features like lines or elliptic arcs. Therefore, only similar pixel-based strategies will be reviewed.
Several methods have been proposed to recognize objects in images by matching 2D models to images. A survey of matching approaches is given in (Brown, 1992). In most 2D matching approaches the model is systematically compared to the image using all allowable degrees of freedom of the chosen class of transformations. The comparison is based on a suitable similarity measure (also called match metric). The maxima or minima of the similarity measure are used to decide whether an object is present in the image and to determine its pose. To speed up the recognition process, the search is usually done in a coarse-to-fine manner, e.g., by using image pyramids (Tanimoto, 1981).
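As an illustration of such a coarse-to-fine search, the following sketch (assuming Python with OpenCV and NumPy, grayscale uint8 images, and a search restricted to translations; the names and the 4-pixel refinement radius are assumptions of this sketch) searches exhaustively only at the coarsest pyramid level and then tracks the best match down to the original resolution:

    import cv2
    import numpy as np

    def pyramid_match(image, model, levels=3):
        # Build Gaussian pyramids for the search image and the model.
        imgs, mods = [image], [model]
        for _ in range(levels - 1):
            imgs.append(cv2.pyrDown(imgs[-1]))
            mods.append(cv2.pyrDown(mods[-1]))

        # Exhaustive search at the coarsest level only.
        scores = cv2.matchTemplate(imgs[-1], mods[-1], cv2.TM_CCOEFF_NORMED)
        _, _, _, (x, y) = cv2.minMaxLoc(scores)

        # Track the maximum down the pyramid, searching only a small
        # neighborhood around the projected coarse-level position.
        for level in range(levels - 2, -1, -1):
            x, y = 2 * x, 2 * y
            h, w = mods[level].shape[:2]
            y0, x0 = max(y - 4, 0), max(x - 4, 0)
            roi = imgs[level][y0:y0 + h + 8, x0:x0 + w + 8]
            scores = cv2.matchTemplate(roi, mods[level], cv2.TM_CCOEFF_NORMED)
            _, _, _, (dx, dy) = cv2.minMaxLoc(scores)
            x, y = x0 + dx, y0 + dy
        return x, y

The essential saving is that the full transformation space is examined only on the heavily reduced coarsest image; each finer level inspects just a small neighborhood of the projected coarse match.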
The simplest class of object recognition methods is based on the gray values of the model and image itself and uses normalized cross correlation or the sum of squared or absolute differences as a similarity measure (Brown, 1992). Normalized cross correlation is invariant to linear brightness changes but is very sensitive to clutter and occlusion as well as nonlinear contrast changes. The sum of gray value differences is not robust to any of these changes, but can be made robust to linear brightness changes by explicitly incorporating them into the similarity measure, and to a moderate amount of occlusion and clutter by computing the similarity measure in a statistically robust manner (Lai and Fang, 1999).
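The invariance of normalized cross correlation to linear brightness changes follows directly from its definition; a minimal NumPy sketch (window extraction and the search loop over translations are omitted):

    import numpy as np

    def ncc(model, window):
        # Model and image window must have the same shape.
        m = model - model.mean()
        w = window - window.mean()
        denom = np.sqrt((m * m).sum() * (w * w).sum())
        return (m * w).sum() / denom if denom > 0 else 0.0

    # Because the means are subtracted and the result is normalized,
    # ncc(model, a * window + b) == ncc(model, window) for any a > 0,
    # i.e., the measure is invariant to linear brightness changes.
    # However, a single occluding or cluttering object perturbs every
    # term of the sum, which explains the sensitivity noted above.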
A more complex class of object recognition methods does not use the gray values of the model or object itself, but uses the object's edges for matching (Borgefors, 1988, Rucklidge, 1997). In all existing approaches, the edges are segmented, i.e., a binary image is computed for both the model and the search image. Usually, the edge pixels are defined as the pixels in the image where the magnitude of the gradient is maximum in the direction of the gradient. Various similarity measures can then be used to compare the model to the image. The similarity measure in (Borgefors, 1988) computes the average distance of the model edges to the image edges. The disadvantage of this similarity measure is that it is not robust to occlusion because the distance to the nearest edge increases significantly if some of the edges of the model are missing in the image.
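A minimal sketch of such an average-distance (chamfer-style) measure, assuming Python with SciPy, a boolean image edge map, and integer (row, column) model edge coordinates that stay inside the image after translation:

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    def edge_distance_map(image_edges):
        # Distance from every pixel to the nearest image edge pixel;
        # distance_transform_edt measures the distance to the nearest
        # zero element, so the boolean edge map is inverted.
        return distance_transform_edt(~image_edges)

    def avg_edge_distance(dist_map, model_points, tx, ty):
        # Average distance of the translated model edge pixels to the
        # nearest image edge, read off the precomputed distance map.
        rows = model_points[:, 0] + ty
        cols = model_points[:, 1] + tx
        return dist_map[rows, cols].mean()

    # If model edges are occluded in the image, the distances of the
    # corresponding model points to the nearest remaining edge grow
    # large and inflate the average -- the non-robustness noted above.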
The Hausdorff distance similarity measure used in (Rucklidge, 1997) tries to remedy this shortcoming by calculating the maximum of the k-th largest distance of the model edges to the image edges and the l-th largest distance of the image edges to the model edges. If the model contains n points and the image contains m edge points, the similarity measure is robust to 100k/n% occlusion and 100l/m% clutter. Unfortunately, an estimate for m is needed to determine l, which is usually not available.
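In code, this rank-based (partial) Hausdorff measure can be sketched as follows (Python with SciPy; the point sets are assumed small enough for a full distance matrix, and 1 <= k <= n, 1 <= l <= m):

    import numpy as np
    from scipy.spatial.distance import cdist

    def partial_hausdorff(model_pts, image_pts, k, l):
        # All pairwise distances; the minimum over one axis gives the
        # distance of each point to the nearest point of the other set.
        d = cdist(model_pts, image_pts)
        fwd = np.sort(d.min(axis=1))[-k]  # k-th largest model-to-image
        rev = np.sort(d.min(axis=0))[-l]  # l-th largest image-to-model
        return max(fwd, rev)

    # Taking the k-th largest forward distance discards the k - 1 worst
    # model points, which is what makes the measure tolerant of roughly
    # 100k/n% occlusion; l plays the same role for clutter.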
All of these similarity measures have the disadvantage that they do not take into account the direction of the edges. In (Olson and Huttenlocher, 1997) it is shown that disregarding the edge direction information leads to false positive instances of the model in the image. The similarity measure proposed in (Olson and Huttenlocher, 1997) tries to improve this by modifying the Hausdorff distance to also measure the angle difference between the model and image edges. Unfortunately, the implementation is based on multiple distance transformations, which makes the algorithm too computationally expensive for industrial inspection.
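The idea of penalizing direction mismatches can be sketched as follows; note that the way the positional and angular terms are combined here, and the weight trading radians against pixels, are assumptions of this illustration, not the exact formulation of (Olson and Huttenlocher, 1997):

    import numpy as np

    def directed_min_distance(p, theta, image_pts, image_angles, w=10.0):
        # Positional distance of the model point p to every image edge.
        d_pos = np.linalg.norm(image_pts - p, axis=1)
        # Angle difference wrapped to [0, pi]; the weight w is an
        # assumption of this sketch.
        d_ang = np.abs(((image_angles - theta + np.pi) % (2 * np.pi)) - np.pi)
        # An image edge only counts as close if BOTH terms are small,
        # so a nearby edge with the wrong direction no longer matches.
        return np.min(np.maximum(d_pos, w * d_ang))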
Finally, another class of edge-based object recognition algorithms is based on the generalized Hough transform (GHT) (Ballard, 1981). Approaches of this kind have the advantage that they are robust to occlusion as well as clutter. Unfortunately, the GHT requires extremely accurate estimates for the edge directions or a complex and expensive processing scheme, e.g., smoothing the accumulator space, to determine whether an object is present and to determine its pose.
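A bare-bones GHT accumulator for pure translations, sketched in Python with NumPy (the R-table construction is omitted, edge directions are assumed to lie in [0, 2*pi), and all names are illustrative), makes this sensitivity to the edge directions visible:

    import numpy as np

    def ght_accumulate(edge_points, edge_dirs, r_table, shape, n_bins=32):
        # r_table: for each quantized gradient direction, the list of
        # offsets from a model edge pixel to the model reference point.
        acc = np.zeros(shape, dtype=np.int32)
        for (r, c), theta in zip(edge_points, edge_dirs):
            b = int(theta / (2 * np.pi) * n_bins) % n_bins
            for dr, dc in r_table.get(b, ()):
                rr, cc = r + dr, c + dc
                if 0 <= rr < shape[0] and 0 <= cc < shape[1]:
                    acc[rr, cc] += 1
        return acc  # peaks mark candidate object positions

    # A small error in the estimated gradient direction selects the
    # wrong R-table bin, so the votes scatter instead of accumulating
    # at the reference point -- which is why the GHT needs very accurate
    # edge directions or accumulator smoothing, as noted above.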