Photogrammetric computer vision: Papers accepted on the basis of peer-review full manuscripts

kalliany, r.; leberl, franz w.
ISPRS Commission III, Vol.34, Part 3A ,,Photogrammetric Computer Vision", Graz, 2002 
  
corresponds to the portion of the model that is visible in the im- 
age. For example, if the object is 50% occluded, the score (on 
average) cannot exceed 0.5. This is a highly desirable property 
because it gives the user the means to select an intuitive threshold 
for when an object should be considered as recognized. 
A desirable feature of the above similarity measures (2)-(4) is 
that they do not need to be evaluated completely when object 
recognition is based on a threshold smin for the similarity mea- 
sure that a potential match must achieve. Let s; denote the partial 
sum of the dot products up to the j-th element of the model. For 
the match metric that uses the sum of the normalized dot prod- 
ucts, this is: 
| deve) | 
(5) 
YE dill + legs ll 
Obviously, all the remaining terms of the sum are all < 1. There- 
fore, the partial score can never achieve the required score Smin 
if s; < Smin — 1 + j/n, and hence the evaluation of the sum 
can be discontinued after the j-th element whenever this condi- 
tion is fulfilled. This criterion speeds up the recognition process 
considerably. 
Nevertheless, further speed-ups are highly desirable. Another cri- 
terion is to require that all partial sums have a score better than 
Smin, Le, 8j > Smin. When this criterion is used, the search will 
be very fast, but it can no longer be ensured that the object recog- 
nition finds the correct instances of the model because if missing 
parts of the model are checked first, the partial score will be be- 
low the required score. To speed up the recognition process with a 
very low probability of not finding the object although it is visible 
in the image, the following heuristic can be used: the first part of 
the model points is examined with a relatively safe stopping cri- 
terion, while the remaining part of the model points are examined 
with the hard threshold spin. The user can specify what fraction 
of the model points is examined with the hard threshold with a 
parameter g. If g = 1, all points are examined with the hard 
threshold, while for g = 0, all points are examined with the safe 
stopping criterion. With this, the evaluation of the partial sums is 
stopped whenever s; « min(smin — 1 -- fj /n, $minj /n), where 
f = (1 — gSmin)/(1 — Smin). Typically, the parameter g can 
be set to values as high as 0.9 without missing an instance of the 
model in the image. 
3 OBJECT RECOGNITION 
The above similarity measures are applied in an object recogni- 
tion system for industrial inspection that recognizes objects under 
similarity transformations, i.e., translation, rotation, and uniform 
scaling, in real time. Although only similarity transformations are 
implemented at the moment, extensions to general affine trans- 
formations are not difficult to implement. The system consists of 
two modules: an offline generation of the model and an online 
recognition. 
The model is generated from an image of the object to be recog- 
nized. An arbitrary region of interest specifies the object's loca- 
tion in the image. Usually, the ROI is specified by the user. Alter- 
natively, it can be generated by suitable segmentation techniques. 
To speed up the recognition process, the model is generated in 
multiple resolution levels, which are constructed by building an 
image pyramid from the original image. The number of pyramid 
levels lax is chosen by the user. 
Each resolution level consists of all possible rotations and scal- 
ings of the model, where thresholds $5, and $max for the angle 
and omin and omax for the scale are selected by the user. The 
step length for the discretization of the possible angles and scales 
can either be done automatically by a method similar to the one 
described in (Borgefors, 1988) or be set by the user. In higher 
pyramid levels, the step length for the angle is computed by dou- 
bling the step length of the next lower pyramid level. 
The rotated and scaled models are generated by rotating and scal- 
ing the original image of the current pyramid level and perform- 
ing the feature extraction in the rotated image. This is done be- 
cause the feature extractors may be anisotropic, i.e., the extracted 
direction vectors may depend on the orientation of the feature in 
the image in a biased manner. If it is known that the feature ex- 
tractor is isotropic, the rotated models may be generated by per- 
forming the feature extraction only once per pyramid level and 
transforming the resulting points and direction vectors. 
The feature extraction can be done by a number of different im- 
age processing algorithms that return a direction vector for each 
image point. One such class of algorithms are edge detectors, e.g, 
the Sobel or Canny (Canny, 1986) operators. Another useful class 
of algorithms are line detectors (Steger, 1998). Finally, corner de- 
tectors that return a direction vector, e.g., (Fórstner, 1994), could 
also be used. Because of runtime considerations the Sobel filter is 
used in the current implementation of the object recognition sys- 
tem. Since in industrial inspection the lighting can be controlled, 
noise does not pose a significant problem in these applications. 
To recognize the model, an image pyramid is constructed for the 
image in which the model should be found. For each level of 
the pyramid, the same filtering operation that was used to gen- 
erate the model, e.g., Sobel filtering, is applied to the image. 
This returns a direction vector for each image point. Note that 
the image is not segmented, i.e., thresholding or other operations 
are not performed. This results in true robustness to illumination 
changes. 
To identify potential matches, an exhaustive search is performed 
for the top level of the pyramid, i.e., all precomputed models of 
the top level of the model resolution hierarchy are used to com- 
pute the similarity measure via (2), (3), or (4) for all possible 
poses of the model. A potential match must have a score larger 
than a user-specified threshold smin and the corresponding score 
must be a local maximum with respect to neighboring scores. As 
described in Section 2, the threshold smin is used to speed up the 
search by terminating the evaluation of the similarity measure as 
early as possible. With the termination criteria, this seemingly 
brute-force strategy actually becomes extremely efficient. On av- 
erage, about 9 pixels of the model are tested for every pose on the 
top level of the pyramid. 
After the potential matches have been identified, they are tracked 
through the resolution hierarchy until they are found at the lowest 
level of the image pyramid. Various search strategies like depth- 
first, best-first, etc., have been examined. It turned out that a 
breadth-first strategy is preferable for various reasons, most no- 
tably because a heuristic for a best-first strategy is hard to define, 
and because depth-first search results in slower execution if all 
matches should be found. 
Once the object has been recognized on the lowest level of the 
image pyramid, its position and rotation are extracted to a reso- 
lution better than the discretization of the search space, i.e., the 
translation is extracted with subpixel precision and the angle and 
scale with a resolution better than their respective step lengths. 
This is done by fitting a second order polynomial (in the four 
pose variables) to the similarity measure values in a 3 x 3 x 3 x 3 
neighborhood around the maximum score. The coefficients of 
the polynomial are obtained by convolution with 4D facet model 
masks. The corresponding 2D masks are given in (Steger, 1998). 
They generalize to arbitrary dimensions in a straightforward man- 
ner. 
4 LEAST-SQUARES POSE REFINEMENT 
While the pose obtained by the extrapolation algorithm is accu- 
rate enough for most applications, in some applications an even 
A - 347
1
2
...
360
361
362
363
364
...
456
457
Full text: Papers accepted on the basis of peer-review full manuscripts (Part A)

Access restriction

Copyright

Note to user