Proceedings of the Symposium on Global and Environmental Monitoring

Information Fusion in 
Cartographic Feature Extraction 
from Aerial Imagery 
David M. McKeown 
Frederic P. Perlant 
Jefferey Shufelt 
Digital Mapping Laboratory 
School of Computer Science 
Carnegie Mellon University 
Pittsburgh, PA 15213 
Abstract 
The extraction of buildings from aerial imagery is a 
complex problem for automated computer vision. It requires 
locating regions in a scene that possess properties 
distinguishing them as man-made objects as opposed to 
naturally occurring terrain features. The building extraction 
process requires techniques that exploit knowledge about the 
structure of man-made objects. Techniques do exist that take 
advantage of this knowledge; various methods use edge-line 
analysis, shadow analysis, and stereo imagery analysis to 
produce building hypotheses. It is reasonable, however, to 
assume that no single detection method will correctly 
delineate or verify buildings in every scene. As an example, 
imagine a feature extraction system that relied on analysis of 
cast shadows to predict building locations in cases where the 
sun was directly above the scene. 
It seems clear that a cooperative-methods paradigm is 
useful in approaching the building extraction problem. Using 
this paradigm, each extraction technique provides 
information which can then be added or assimilated into an 
overall interpretation of the scene. Thus, our research focus 
is to explore the development of a computer vision system 
that integrates the results of various scene analysis 
techniques into an accurate and robust interpretation of the 
underlying three-dimensional scene. 
This paper briefly describes research results in two areas. 
First, we describe the problem of building hypothesis fusion 
using only monocular cues in aerial imagery. Several 
building extraction techniques are briefly surveyed, including 
four building extraction, verification, and clustering systems 
that form the basis for the work described here. A method 
for fusing the symbolic data generated by these systems is 
described, and applied to monocular image and stereo image 
data sets. Evaluation methods for the fusion results are 
described, and the fusion results are analyzed using these 
methods. 
The second research area examines how estimates of three- 
dimensional scene structure, as encoded in a scene disparity 
map, can be improved by the analysis of the original 
monocular imagery. In some sense this procedure is 
counterintuitive. Since we have already used the imagery to perform 
stereo matching, what information could be available in 
either of the single images that would improve on the stereo 
analysis? We describe the utilization of surface illumination 
information provided by the segmentation of the monocular 
image into fine surface patches of nearly homogeneous 
intensity to remove mismatches generated during stereo 
matching. Such patches are used to guide a statistical 
analysis of the disparity map based on the assumption that 
such patches correspond closely with physical surfaces in the 
scene. This technique is quite independent of whether the 
initial disparity map was generated by automated area-based 
or feature-based stereo matching. 
1 Introduction 
The extraction of significant man-made structures such as 
buildings and roads from aerial imagery is a complex 
problem that must be addressed in order to produce a fully 
automated cartographic feature extraction system. We focus 
on the building extraction process since buildings are present 
in almost all sites of cartographic interest and their robust 
detection and delineation requires techniques that exploit 
knowledge about man-made structures. There exist a 
multitude of techniques that take advantage of such 
knowledge; various methods use edge-line analysis, shadow 
analysis, stereo disparity analysis, and structural analysis to 
generate building hypotheses [1, 2, 3, 4, 5, 6, 7, 8]. 
It is reasonable, however, to assume that no single building 
extraction technique will perfectly delineate man-made 
structures in every scene. Consider the use of an edge- 
analysis method on an image where the ground intensity is 
similar to the intensity of the roofs of the buildings in the 
scene. As another example, consider the use of a shadow 
analysis method on an image in which the sun was directly 
above the scene. 
Clearly, a cooperative-methods paradigm is useful in 
approaching the building extraction problem. In this 
paradigm, it is assumed that no single method can provide a 
fully accurate and complete set of building hypotheses 
for a scene; each method can, however, provide a subset of 
the information necessary to produce an improved 
interpretation of building structure in the scene. For instance, 
a shadow-based method can provide useful information in 
situations where ground and roof intensity are similar; an 
edge-line analysis method can provide disambiguating 
information in cases where shadows are weak or 
nonexistent, or in situations where structures are 
sufficiently short that disparity analysis does not provide 
useful information. The implicit assumption of this paradigm 
is that the information produced by each detection technique 
can be integrated into a more meaningful collection of 
building hypotheses. 
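
The integration step assumed by this paradigm can be illustrated with a minimal sketch. It is not the fusion method described later in this paper. It assumes, purely for illustration, that each technique's hypotheses have already been rasterized into a binary mask registered to the same image grid, and that a simple per-pixel vote is an acceptable way to combine them. The function name fuse_hypotheses and the parameter min_votes are hypothetical.

```python
import numpy as np

def fuse_hypotheses(masks, min_votes=2):
    """Combine building masks from several detectors by per-pixel voting.

    masks: sequence of 2D boolean arrays of identical shape, one per
           detection technique (assumed to be co-registered).
    Returns (votes, fused): the vote count per pixel and the mask of
    pixels supported by at least min_votes techniques.
    """
    stack = np.stack([np.asarray(m, dtype=bool) for m in masks])
    votes = stack.sum(axis=0)
    return votes, votes >= min_votes

# Toy 4x4 scene: three hypothetical detectors that only partly agree.
shadow = np.zeros((4, 4), dtype=bool)
shadow[1:3, 1:3] = True
edges = np.zeros((4, 4), dtype=bool)
edges[1:3, 1:4] = True    # extends one column too far to the right
stereo = np.zeros((4, 4), dtype=bool)
stereo[0:3, 1:3] = True   # extends one row too far upward

votes, fused = fuse_hypotheses([shadow, edges, stereo], min_votes=2)
```

In this toy case the fused mask keeps only the 2x2 block on which all three masks agree, and the over-extended pixels reported by a single detector are dropped. The vote threshold is a design choice: a low value favors completeness, a high value favors reliability.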
Stereo matching provides a direct measurement of building 
location and height. In complex urban scenes, stereo 
matching based upon feature matching, i.e., edges, lines, and 
contours, appears to provide more accurate and robust 
matching than area-based techniques. This is primarily due 
to the ability of feature-based approaches to detect large 
depth discontinuities found in urban scenes. However, 
feature-based techniques generally provide only a sparse set 
of match points from which a three-dimensional surface is 
usually interpolated. In Section 3 we describe a method to 
integrate monocular surface intensity information with the 
stereo disparity map to refine the height estimates and reduce 
the effect of stereo matching errors. 
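
To make the idea concrete, the following is a minimal sketch of region-guided disparity refinement. It is not the procedure of Section 3. It assumes that a segmentation of the monocular image is available as an integer label map aligned with the disparity map, that missing matches are stored as NaN, and that a median with a median-absolute-deviation test is a reasonable stand-in for the statistical analysis. The names refine_disparity, k, and floor are hypothetical.

```python
import numpy as np

def refine_disparity(disparity, labels, k=3.0, floor=0.5):
    """Replace outlying or missing disparities with their region's median.

    disparity: 2D float array, NaN where no match was found (assumption).
    labels:    2D integer array of the same shape, one label per
               intensity-homogeneous patch (assumed segmentation output).
    k, floor:  a value is an outlier if it deviates from the region
               median by more than max(k * MAD, floor).
    """
    disparity = np.asarray(disparity, dtype=float)
    labels = np.asarray(labels)
    refined = disparity.copy()
    for region in np.unique(labels):
        in_region = labels == region
        values = disparity[in_region]
        valid = ~np.isnan(values)
        if not valid.any():
            continue  # no evidence in this patch; leave it untouched
        median = np.median(values[valid])
        mad = np.median(np.abs(values[valid] - median))
        tolerance = max(k * mad, floor)
        # Treat missing values as zero deviation here; they are
        # flagged separately through ~valid.
        deviation = np.abs(np.where(valid, values, median) - median)
        outlier = ~valid | (deviation > tolerance)
        refined[in_region] = np.where(outlier, median, values)
    return refined

# Toy example: two patches, one mismatch (9.0) and one missing value.
labels = np.array([[0, 0, 1, 1]] * 4)
disparity = np.array([[2.0, 2.0, 5.0, 5.0]] * 4)
disparity[1, 0] = 9.0      # stereo mismatch inside patch 0
disparity[2, 3] = np.nan   # unmatched pixel inside patch 1

refined = refine_disparity(disparity, labels)
```

The sketch treats each patch as a surface of constant disparity, which is a stronger assumption than the one stated in the text; a sloped roof, for example, would call for a planar fit per patch rather than a single median.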