Recent work has addressed the problem of fusing the
information provided by analysis systems such as BABE,
SHADE, SHAVE, and GROUPER into more complete and
accurate sets of building hypotheses. A monocular fusion
system has been developed that merges the geometric
building boundaries into composite boundaries. Figure 4
shows the fusion result for the Washington, D.C. scene.
Performance analysis indicates that the percentages of building pixels identified correctly in this scene for SHADE, SHAVE, and GROUPER are 37.5%, 47.2%, and 48.7%,
respectively. The results of monocular fusion, however,
improve the overall building pixel classification rate to
77.7%. We have performed fusion analysis over a test
database of 8 images and have observed similar performance
improvements [10].
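For concreteness, the pixel-level rate reported above can be computed from binary masks as in the following minimal sketch; the function and array names are illustrative, not part of the systems described here.

```python
import numpy as np

def building_pixel_rate(detected: np.ndarray, ground_truth: np.ndarray) -> float:
    """Percentage of true building pixels covered by a detection mask.

    Both arguments are boolean arrays of identical shape, where True
    marks a building pixel. This corresponds to the percentage-of-
    building-pixels-identified measure quoted above.
    """
    building = ground_truth.sum()
    if building == 0:
        return 0.0
    hits = np.logical_and(detected, ground_truth).sum()
    return 100.0 * hits / building
```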
2.1 Further Issues in Monocular Fusion
The problems of building structure hypothesis integration
are amenable to the techniques of information fusion. Recent
work in this area has shown that simple image processing
techniques can integrate complementary sources of building
hypothesis data to provide improved detection of man-made
objects in aerial imagery [11]. These building fusion methods
provide a simple and effective means for increasing the
building detection rate for a scene. There remain, however,
other issues to be addressed in the fusion of monocular scene
analysis systems.
The fusion techniques produce qualitatively accurate
building delineations, in the sense that few buildings are
overlooked in the extraction process; quantitative
delineations, however, are typically poor due to the
accumulation of delineation errors in the data produced by
the monocular analysis subsystems. While applications such
as flight simulation are not necessarily adversely affected by
this problem, cartographic applications may require very
accurate delineations of man-made structures. We would
like to address this issue by examining the interactions
between data during the information fusion process. For
example, during the fusion process, an estimate of "building
density" can be produced; that is, an estimate of the
likelihood of building structure for each pixel in an image.
These likelihood estimates could be used to refine the fused
building delineations. In addition, the fusion process
provides a composite set of building boundaries. Various
subsets of these boundaries could be combined and evaluated
with respect to image gradient, shadow casting, and disparity
measures to produce improved boundaries. Another
approach would use these boundaries as intermediate results
to be analyzed by each of the component monocular analysis
systems. These systems could refine their initial building
estimates in accordance with the fused boundaries, and then
these refined estimates could themselves be integrated to
produce improved fusions. This idea suggests an iterative
process in which building fusions are used to refine their
components until some quality measure is exceeded.
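As a concrete illustration of the building-density idea and the iterative refinement it suggests, here is a minimal sketch assuming binary per-system masks and a simple majority-vote density; the voting threshold and the fixed-point stopping test are assumptions, not the published method.

```python
import numpy as np

def fuse_iteratively(masks, vote=0.5, max_rounds=5):
    """Iteratively fuse binary building masks from several monocular systems.

    masks: list of boolean arrays (e.g. one each from SHADE, SHAVE, GROUPER).
    Each round forms a per-pixel "building density" -- the fraction of
    systems voting for building at that pixel -- thresholds it into a fused
    mask, and prunes each component mask against the fused result. Both the
    voting rule and the stopping test are illustrative choices.
    """
    current = [np.asarray(m, dtype=bool) for m in masks]
    fused = density = None
    for _ in range(max_rounds):
        density = np.mean([m.astype(float) for m in current], axis=0)
        fused = density >= vote
        refined = [m & fused for m in current]
        if all(np.array_equal(r, c) for r, c in zip(refined, current)):
            break  # components no longer change: the fusion has stabilized
        current = refined
    return fused, density
```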
If multiple views of a scene are available, either from
different vantage points, or from sensors with different
spectral characteristics, then more information is available
for an information fusion process. Under the cooperative-methods paradigm, monocular analysis techniques could be
applied to multiple images to produce initial building
detection data, which could then be integrated by information
fusion techniques to produce improved building detection
rates for the scene. Although multiple sets of data may not always be available, their presence calls for the development and/or extension of fusion techniques to take advantage of the additional information such data provide.
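Assuming the per-view building-likelihood maps have already been resampled into a common reference frame (registration itself is outside this sketch), one illustrative way to combine them is a reliability-weighted average; the weighting rule below is an assumption, not a method from the work cited here.

```python
import numpy as np

def fuse_views(view_densities, weights=None):
    """Combine building-likelihood maps from several registered views.

    view_densities: list of float arrays in [0, 1], one per view, all
    resampled to a common ground frame. weights: optional per-view
    reliabilities (e.g. favoring a nadir view over an oblique one).
    """
    stack = np.stack([np.asarray(d, dtype=float) for d in view_densities])
    if weights is None:
        weights = np.ones(len(view_densities))
    w = np.asarray(weights, dtype=float)
    combined = np.tensordot(w / w.sum(), stack, axes=1)
    return combined  # higher values indicate stronger multi-view support
```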
3 Improvement and Interpretation of 3D Scene
Analysis
The main goal of many vision systems, and in particular
automated cartographic feature extraction systems, is to
recover the three-dimensional structure of a scene. Stereo
analysis is a common technique used for this problem, and
much work has been done in this area. Nevertheless, the
difficulty of the task (in terms of registration and stereo
matching) limits the success of any single technique. To
obtain complete and accurate height estimates for a scene, it
seems clear that we will need to utilize more knowledge than
that used by any stereo matching process. Work is under way on the interpolation of sparse disparity information to generate reliable and dense height information, and on the post-processing of stereo matching results to ensure consistency of the derived height estimates.
Many three-dimensional estimation improvement
approaches use external knowledge about the scene, in the
form of models that constrain certain aspects of scene
structure. An interesting approach would be the use of information from multiple sources to achieve consistent disparity results by applying a simple information fusion model. Such an approach allows for the integration of
incomplete and inconsistent information to refine height
estimates.
Initial work under this paradigm used a region-based
interpretation model for the information fusion and
refinement process [7]. The assumption of this model is that uniform image radiometry is produced by planar surfaces of specific orientations and materials. Under this assumption,
the segmentation of the monocular images into fine surface
patches of nearly homogeneous intensity will ideally result in
a segmentation delineating planar surfaces in the scene.
Having obtained segmentations of the images in which
regions of nearly homogeneous intensity are delineated, it
becomes possible to refine initial height estimates for the
scene. Since each region is assumed to correspond to a
planar surface in the scene, we can assign disparity values to
each region to produce an initial refinement of the height
estimates. This is done by histogramming the disparity
values of each region and selecting the most representative
value for the region. Figure 5 shows a smoothed image of an
industrial area in Washington, D.C. Figure 6 shows a
segmentation obtained by a recursive histogram splitting
technique [12], and Figure 7 shows a refined S2 disparity map [13].
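The histogram-based assignment described above reduces to a few lines. The sketch below assumes a per-pixel disparity array and an integer region-label image, and for simplicity treats each planar region as fronto-parallel so that a single disparity value represents it; the names and bin count are illustrative.

```python
import numpy as np

def refine_disparity(disparity, labels, bins=64):
    """Assign one representative disparity to each segmentation region.

    disparity: float array of per-pixel disparity estimates (may be noisy).
    labels: integer array of the same shape, one label per region of nearly
    homogeneous intensity. Each region's disparities are histogrammed and
    the mode of the histogram is taken as its representative value.
    """
    refined = np.zeros_like(disparity, dtype=float)
    lo, hi = float(disparity.min()), float(disparity.max())
    for region in np.unique(labels):
        mask = labels == region
        hist, edges = np.histogram(disparity[mask], bins=bins, range=(lo, hi))
        peak = np.argmax(hist)
        refined[mask] = 0.5 * (edges[peak] + edges[peak + 1])  # bin center
    return refined
```

Selecting the histogram mode suppresses isolated matching errors within a region while preserving depth discontinuities at region boundaries.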
Experiments have shown that such region-based
representations of a scene provide useful models for the
refinement of height estimates. Using the fusion approach, it
becomes possible to segment the scene into building regions
based solely on disparity information. (Figure 8 shows the building regions extracted by such an approach.) Other
sources of information could be utilized at the refinement
stage to further enhance the disparity data. Left/right consistency constraints could augment the fusion model (a standard cross-check is sketched at the end of this section), as could more sophisticated models of image radiometry and surface structure. The region-based refinement approach
could also be used to refine scene segmentations (such as
those produced by feature extraction systems). In summary,
the information fusion and refinement paradigm provides a
framework for the enhancement of height estimates and
allows for the incorporation of possibly inaccurate or
inconsistent information in a robust manner.
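As an example, the left/right cross-check mentioned above could take the following form; the tolerance value and the integer-pixel lookup are simplifying assumptions.

```python
import numpy as np

def cross_check(disp_left, disp_right, tol=1.0):
    """Flag disparities that fail the left/right consistency test.

    disp_left[y, x] maps a left-image pixel to column x - d in the right
    image; a consistent match should map back to (approximately) x. Pixels
    whose round trip disagrees by more than tol are marked invalid (NaN).
    """
    h, w = disp_left.shape
    xs = np.arange(w)
    checked = disp_left.astype(float).copy()
    for y in range(h):
        xr = np.clip(np.round(xs - disp_left[y]).astype(int), 0, w - 1)
        bad = np.abs(disp_left[y] - disp_right[y, xr]) > tol
        checked[y, bad] = np.nan
    return checked
```

Pixels flagged as inconsistent could then be reassigned by the region-based refinement step described above.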