Information Fusion in
Cartographic Feature Extraction
from Aerial Imagery
David M. McKeown
Frederic P. Perlant
Jefferey Shufelt
Digital Mapping Laboratory
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
Abstract
The extraction of buildings from aerial imagery is a
complex problem for automated computer vision. It requires
locating regions in a scene that possess properties
distinguishing them as man-made objects, as opposed to
naturally occurring terrain features. The building extraction
process requires techniques that exploit knowledge about the
structure of man-made objects. Techniques do exist that take
advantage of this knowledge; various methods use edge-line
analysis, shadow analysis, and stereo imagery analysis to
produce building hypotheses. It is reasonable, however, to
assume that no single detection method will correctly
delineate or verify buildings in every scene. As an example,
imagine a feature extraction system that relied on analysis of
cast shadows to predict building locations in cases where the
sun was directly above the scene.
It seems clear that a cooperative-methods paradigm is
useful in approaching the building extraction problem. Using
this paradigm, each extraction technique provides
information which can then be added or assimilated into an
overall interpretation of the scene. Thus, our research focus
is to explore the development of a computer vision system
that integrates the results of various scene analysis
techniques into an accurate and robust interpretation of the
underlying three-dimensional scene.
This paper briefly describes research results in two areas.
First, we describe the problem of building hypothesis fusion
using only monocular cues in aerial imagery. Several
building extraction techniques are briefly surveyed, including
four building extraction, verification, and clustering systems
that form the basis for the work described here. A method
for fusing the symbolic data generated by these systems is
described, and applied to monocular image and stereo image
data sets. Evaluation methods for the fusion results are
described, and the fusion results are analyzed using these
methods.
The second research area examines how estimates of three-
dimensional scene structure, as encoded in a scene disparity
map, can be improved by the analysis of the original
monocular imagery. In some sense this procedure is
counterintuitive. Since we have already used the imagery to perform
stereo matching, what information could be available in
either of the single images that would improve on the stereo
analysis? We describe the utilization of surface illumination
information provided by the segmentation of the monocular
image into fine surface patches of nearly homogeneous
intensity to remove mismatches generated during stereo
matching. Such patches are used to guide a statistical
analysis of the disparity map based on the assumption that
such patches correspond closely with physical surfaces in the
scene. This technique is quite independent of whether the
initial disparity map was generated by automated area-based
or feature-based stereo matching.
1 Introduction
The extraction of significant man-made structures such as
buildings and roads from aerial imagery is a complex
problem that must be addressed in order to produce a fully
automated cartographic feature extraction system. We focus
on the building extraction process since buildings are present
in almost all sites of cartographic interest and their robust
detection and delineation requires techniques that exploit
knowledge about man-made structures. There exist a
multitude of techniques that take advantage of such
knowledge; various methods use edge-line analysis, shadow
analysis, stereo disparity analysis, and structural analysis to
generate building hypotheses [1, 2, 3, 4, 5, 6, 7, 8].
It is reasonable, however, to assume that no single building
extraction technique will perfectly delineate man-made
structures in every scene. Consider the use of an edge-
analysis method on an image where the ground intensity is
similar to the intensity of the roofs of the buildings in the
scene. As another example, consider the use of a shadow
analysis method on an image in which the sun was directly
above the scene.
Clearly, a cooperative-methods paradigm is useful in
approaching the building extraction problem. In this
paradigm, it is assumed that no single method can provide a
completely accurate or complete set of building hypotheses
for a scene; each method can, however, provide a subset of
the information necessary to produce an improved
interpretation of building structure in the scene. For instance,
a shadow-based method can provide useful information in
situations where ground and roof intensity are similar; an
edge-line analysis method can provide disambiguating
information in cases where shadows are weak or
nonexistent, or in situations where structures are
sufficiently short that disparity analysis would not provide
useful information. The implicit assumption of this paradigm
is that the information produced by each detection technique
can be integrated into a more meaningful collection of
building hypotheses.
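As a minimal illustration of this integration step, the sketch below fuses
axis-aligned building hypotheses from several detectors, keeping only regions
corroborated by more than one method. The detector names, the bounding-box
representation, and the overlap-voting rule are assumptions for illustration
only; they are not the symbolic fusion method developed in this paper.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def fuse_hypotheses(detections, min_support=2, iou_thresh=0.5):
    """Group overlapping boxes from different detectors; keep groups
    corroborated by at least `min_support` distinct detectors.
    `detections` is a list of (detector_name, box) pairs."""
    groups = []  # each group: [merged_box, set_of_supporting_detectors]
    for detector, box in detections:
        for g in groups:
            if iou(g[0], box) >= iou_thresh:
                m = g[0]
                # Merge by taking the bounding box of the union.
                g[0] = (min(m[0], box[0]), min(m[1], box[1]),
                        max(m[2], box[2]), max(m[3], box[3]))
                g[1].add(detector)
                break
        else:
            groups.append([box, {detector}])
    return [g[0] for g in groups if len(g[1]) >= min_support]
```

Under this toy rule, a region hypothesized by both a shadow-based and an
edge-based detector survives fusion, while a region proposed by a single
method alone is discarded.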
Stereo matching provides a direct measurement of building
location and height. In complex urban scenes, stereo
matching based upon features, i.e., edges, lines, and
contours, appears to provide more accurate and robust
matching than area-based techniques. This is primarily due
to the ability of feature-based approaches to detect large
depth discontinuities found in urban scenes. However,
feature-based techniques generally provide only a sparse set
of match points from which a three-dimensional surface is
usually interpolated. In Section 3 we describe a method to
integrate monocular surface intensity information with the
stereo disparity map to refine the height estimates and reduce
the effect of stereo matching errors.
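The refinement idea can be sketched as follows: treat each
intensity-homogeneous patch as a single physical surface, and flag disparity
values that deviate sharply from that patch's statistics as likely
mismatches. The median/MAD outlier rule, the `refine_disparity` name, and the
label-image encoding of the segmentation are illustrative assumptions, not
the statistical analysis actually developed in Section 3.

```python
import numpy as np

def refine_disparity(disparity, labels, k=2.5):
    """For each patch of nearly homogeneous intensity (pixels sharing a
    label), assume the disparities sample one surface: values farther than
    k * MAD from the patch median are treated as stereo mismatches and
    replaced by the median."""
    out = disparity.astype(float).copy()
    for lab in np.unique(labels):
        mask = labels == lab
        vals = out[mask]                      # disparities in this patch
        med = np.median(vals)
        mad = np.median(np.abs(vals - med))   # robust spread estimate
        bad = np.abs(vals - med) > k * max(mad, 1e-9)
        vals[bad] = med                       # suppress the mismatches
        out[mask] = vals
    return out
```

Because the patch statistics are computed directly on the disparity map,
this kind of post-processing is indifferent to whether the map came from
area-based or feature-based matching, mirroring the independence noted above.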