Role of Matching and Grouping Operations in Automated Scene Analysis
R. Nevatia
Institute for Robotics and Intelligent Systems
University of Southern California
Los Angeles, CA 90089-0273
email: nevatia@usc.edu
1 INTRODUCTION
Operations of matching and grouping are central to many tasks in automated scene analysis.
We will consider two tasks: one is that of building models automatically from one or more images
(also known as the task of mapping or modelling); the other is that of recognizing the identity and
pose of previously modelled objects in a scene (as in the tasks of object recognition, map updating, or
scene registration). Performing these tasks successfully depends heavily on grouping and matching
operations, as discussed below.
2 SOME TASK EXAMPLES
2.1 Modelling
First, let us consider the task of building 3-D models from one or more images. This task is
difficult for several reasons: the objects need to be delineated from the background and from
other objects, and the 3-D structure needs to be inferred. Consider, for example, the image of
some building structures shown in Fig. 1. While the buildings are readily visible to a human,
automating this process is difficult and requires many steps. In the first stage, we
need to detect some primitive features. Fig. 1(b) shows the results of applying a line finder to the
image in Fig. 1(a). Fig. 1(b) illustrates the complexity of the task: the buildings are not
bounded by closed contours, and many contours that are not directly related to the
building structures are present. We need to separate the two kinds of contours, i.e., group them into
meaningful structures. Thus, perceptual grouping becomes an essential task.
Note that this problem is not specific to building detection or outdoor scenes. It is also
present in many indoor scenes. Fig. 2(a) shows an example image, and Fig. 2(b) shows the lines
detected in it. Again, while the objects in the image are quite apparent to a human, Fig. 2(b)
illustrates the difficulty of grouping the contours that correspond to the individual objects.
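
To make the grouping step concrete, the following is a minimal sketch of one simple grouping rule; it is not the method used to produce the figures. It assumes that a line finder has already returned segments as pairs of endpoints, and it links two segments when their orientations are similar and some pair of their endpoints lies close together. The function names and the thresholds max_angle and max_gap are hypothetical choices made for illustration only.

```python
import math
from itertools import combinations

def angle(seg):
    """Orientation of a segment, folded into [0, pi)."""
    (x1, y1), (x2, y2) = seg
    return math.atan2(y2 - y1, x2 - x1) % math.pi

def angle_diff(a, b):
    """Smallest difference between two orientations, in [0, pi/2]."""
    d = abs(a - b) % math.pi
    return min(d, math.pi - d)

def endpoint_gap(s, t):
    """Smallest distance between an endpoint of s and an endpoint of t."""
    return min(math.dist(p, q) for p in s for q in t)

def group_segments(segments, max_angle=math.radians(10), max_gap=5.0):
    """Group segments that are nearly parallel and have nearby endpoints.

    Thresholds are illustrative assumptions, not values from the paper.
    Returns a sorted list of groups, each a list of segment indices.
    """
    parent = list(range(len(segments)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(segments)), 2):
        close = endpoint_gap(segments[i], segments[j]) <= max_gap
        aligned = angle_diff(angle(segments[i]), angle(segments[j])) <= max_angle
        if close and aligned:
            parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(segments)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Two nearly collinear fragments of one edge, plus two unrelated segments.
segments = [
    ((0, 0), (10, 0)),
    ((12, 0), (20, 0.5)),
    ((0, 50), (0, 60)),
    ((30, 30), (40, 40)),
]
print(group_segments(segments))
```

A rule this simple only bridges small gaps along an edge. Separating building contours from unrelated ones, as in Fig. 1(b), would require further cues that this sketch does not attempt to model.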
For many tasks, we also need to reconstruct the 3-D structure of the detected objects. This
too is rather effortless for humans but difficult to automate. It is most difficult when only a
single image is given, since in this case the problem is highly underconstrained. A number of