Nevatia - 7
4 MATCHING
Matching of model and image descriptions is needed for many tasks, including object
recognition, scene registration, and model validation and updating. Matching of image descriptions
among multiple images is needed for inferring 3-D structure and for motion detection.
One of the key issues in matching is the level of representation at which matching should be
performed. The levels used vary from the use of raw intensity values, to the use of features such
as lines and corners, to use of 3-D surfaces and volumes. In general, matching at lower levels
requires simpler algorithms, such as correlation or simple distance metrics, but is more
ambiguous. Matching at higher levels, using complex structures, requires more complex
algorithms, such as graph matching, but is likely to give more distinctive results. Another issue is
the difficulty of computing the higher-level representations and the errors introduced in the
process. Thus, the correct level of matching may depend not only on the desired task but also on
the ability of the description processes. Another issue to consider is the scale at which the
matching should take place: in scenes with fixed objects, it may be advantageous to match large
areas whereas in scenes with many moveable objects, matching needs to be more local. We
illustrate these choices with a few examples:
a) Stereo Matching:
To extract 3-D structure from two or more images, we need to compute correspondences
between points or features in the multiple images; several approaches are described in [3]. Many
of the early systems used methods of intensity correlation [4]. These methods work reasonably
well in the presence of random texture and smoothly varying terrain, but are less effective in cultural
environments with abrupt depth changes and large homogeneous areas (such as in scenes with
many buildings). We have experimented with matching at the level of line segments [2, 8] and
junctions, as well as higher-level hypotheses such as surfaces. As stated above, matching at higher
levels is less ambiguous, but it is not always possible to compute the higher-level
hypotheses correctly by monocular analysis. Note that for this task, it is not possible to compute a
single local transformation as the features are transformed differently depending on their 3-D
locations.
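The behaviour of intensity correlation described above can be illustrated with a minimal sketch. It assumes rectified images, so that a point at column x in the left scanline appears at column x - d in the right one; the function names `ncc` and `best_disparity`, the window half-width, and the disparity search range are illustrative assumptions and are not taken from the systems cited in [3] or [4].

```python
import math

def ncc(a, b):
    """Normalized cross-correlation of two equal-length intensity windows."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        # A homogeneous window has zero variance, so the correlation is
        # undefined; report 0.0 (no evidence for or against a match).
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom

def best_disparity(left_row, right_row, x, half_width, max_disparity):
    """Return (disparity, score) maximizing NCC for the window centred at x.

    Assumes rectified scanlines of equal length (hypothetical setup).
    """
    lo, hi = x - half_width, x + half_width + 1
    if lo < 0 or hi > len(left_row):
        raise ValueError("window falls outside the left scanline")
    window = left_row[lo:hi]
    best = (0, -2.0)  # any real NCC score is >= -1
    for d in range(max_disparity + 1):
        if lo - d < 0:
            break  # candidate window would leave the right scanline
        score = ncc(window, right_row[lo - d:hi - d])
        if score > best[1]:
            best = (d, score)
    return best

# Toy scanlines: the right row is the left row shifted by two pixels.
left_row = [10, 10, 10, 50, 90, 40, 10, 10, 10, 10]
right_row = [10, 50, 90, 40, 10, 10, 10, 10, 10, 10]
disparity, score = best_disparity(left_row, right_row, x=4, half_width=1, max_disparity=3)
```

On the textured part of the toy scanline the correct shift is recovered, whereas a window that lies entirely in a constant-intensity region gives no usable score, which is one way to see why correlation struggles with large homogeneous areas.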
Fig. 7 depicts the final selected rectangles obtained from grouping the matched lines shown
in Fig. 3. The grouping is based on the formation of parallel matches from line matches across the
images. Line matches (and parallel matches) that occur in two or more images are retained for
further consideration. Evidence of closure for the parallel matches is a good indication that a
rectangular structure exists. Note that grouping and matching are intertwined processes; that is,
grouping takes place both before and after matching. The final hypotheses are then verified using
wall and shadow evidence.
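As a rough illustration of the grouping step, the sketch below forms parallel pairs from line segments and then looks for closure evidence, i.e. other segments that join the ends of a parallel pair. It is only a sketch under stated assumptions: it works on the segments of a single view that are taken to have been matched already, and the angle tolerance, the endpoint gap threshold, and the function names are hypothetical choices, not the procedure used to produce Fig. 7.

```python
import math
from itertools import combinations

def orientation(seg):
    """Undirected orientation of a segment ((x1, y1), (x2, y2)) in [0, pi)."""
    (x1, y1), (x2, y2) = seg
    return math.atan2(y2 - y1, x2 - x1) % math.pi

def angle_diff(a, b):
    """Smallest difference between two undirected orientations."""
    d = abs(a - b) % math.pi
    return min(d, math.pi - d)

def parallel_pairs(segments, tol_deg=5.0):
    """Index pairs of segments whose orientations agree within tol_deg."""
    tol = math.radians(tol_deg)
    thetas = [orientation(s) for s in segments]
    return [(i, j) for i, j in combinations(range(len(segments)), 2)
            if angle_diff(thetas[i], thetas[j]) <= tol]

def closure_support(segments, pair, gap=2.0):
    """Indices of segments that connect an endpoint of one member of the
    parallel pair to an endpoint of the other (within `gap` pixels)."""
    i, j = pair
    a, b = segments[i], segments[j]

    def near(p, seg):
        return any(math.dist(p, q) <= gap for q in seg)

    support = []
    for k, (p, q) in enumerate(segments):
        if k in (i, j):
            continue
        if (near(p, a) and near(q, b)) or (near(p, b) and near(q, a)):
            support.append(k)
    return support

# Four sides of a 10 x 5 rectangle plus one unrelated segment.
segments = [
    ((0, 0), (10, 0)),
    ((0, 5), (10, 5)),
    ((0, 0), (0, 5)),
    ((10, 0), (10, 5)),
    ((20, 20), (25, 31)),
]
pairs = parallel_pairs(segments)
closed = {pair: closure_support(segments, pair) for pair in pairs}
```

A parallel pair supported by two closing segments is a plausible rectangle candidate; in this sketch such candidates would still need the verification step described above.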
Matching over more than two images can be performed either pairwise or
simultaneously. Pairwise matching over all images may in some cases be equivalent to
simultaneous matching but, in general, it is not. The obvious advantage of pairwise matching is its