Recent work has addressed the problem of fusing the
information provided by analysis systems such as BABE,
SHADE, SHAVE, and GROUPER into more complete and
accurate sets of building hypotheses. A monocular fusion
system has been developed that merges the geometric
building boundaries into composite boundaries. Figure 4
shows the fusion result for the Washington, D.C. scene.
Performance analysis indicates that the percentages of building pixels identified correctly in this scene for SHADE, SHAVE, and GROUPER are 37.5%, 47.2%, and 48.7%,
respectively. The results of monocular fusion, however,
improve the overall building pixel classification rate to
77.7%. We have performed fusion analysis over a test
database of 8 images and have observed similar performance
improvements [10].
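For concreteness, the pixel-level rate reported above can be computed from binary masks as in the following minimal sketch; the function and array names are illustrative, not part of the systems described here.

```python
import numpy as np

def building_pixel_rate(detected: np.ndarray, ground_truth: np.ndarray) -> float:
    """Percentage of true building pixels covered by a detection mask.

    Both arguments are boolean arrays of identical shape, where True
    marks a building pixel. This corresponds to the percentage-of-
    building-pixels-identified measure quoted above.
    """
    building = ground_truth.sum()
    if building == 0:
        return 0.0
    hits = np.logical_and(detected, ground_truth).sum()
    return 100.0 * hits / building
```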
2.1 Further Issues in Monocular Fusion
The problems of building structure hypothesis integration
are amenable to the techniques of information fusion. Recent
work in this area has shown that simple image processing
techniques can integrate complementary sources of building
hypothesis data to provide improved detection of man-made
objects in aerial imagery [11]. These building fusion methods
provide a simple and effective means for increasing the
building detection rate for a scene. There remain, however,
other issues to be addressed in the fusion of monocular scene
analysis systems.
The fusion techniques produce qualitatively accurate
building delineations, in the sense that few buildings are
overlooked in the extraction process; quantitative
delineations, however, are typically poor due to the
accumulation of delineation errors in the data produced by
the monocular analysis subsystems. While applications such
as flight simulation are not necessarily adversely affected by
this problem, cartographic applications may require very
accurate delineations of man-made structures. We would
like to address this issue by examining the interactions
between data during the information fusion process. For
example, during the fusion process, an estimate of "building
density" can be produced; that is, an estimate of the
likelihood of building structure for each pixel in an image.
These likelihood estimates could be used to refine the fused
building delineations. In addition, the fusion process
provides a composite set of building boundaries. Various
subsets of these boundaries could be combined and evaluated
with respect to image gradient, shadow casting, and disparity
measures to produce improved boundaries. Another
approach would use these boundaries as intermediate results
to be analyzed by each of the component monocular analysis
systems. These systems could refine their initial building
estimates in accordance with the fused boundaries, and then
these refined estimates could themselves be integrated to
produce improved fusions. This idea suggests an iterative
process in which building fusions are used to refine their
components until some quality measure is exceeded.
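As a concrete illustration of the building-density idea and the iterative refinement it suggests, here is a minimal sketch assuming binary per-system masks and a simple majority-vote density; the voting threshold and the fixed-point stopping test are assumptions, not the published method.

```python
import numpy as np

def fuse_iteratively(masks, vote=0.5, max_rounds=5):
    """Iteratively fuse binary building masks from several monocular systems.

    masks: list of boolean arrays (e.g. one each from SHADE, SHAVE, GROUPER).
    Each round forms a per-pixel "building density" -- the fraction of
    systems voting for building at that pixel -- thresholds it into a fused
    mask, and prunes each component mask against the fused result. Both the
    voting rule and the stopping test are illustrative choices.
    """
    current = [np.asarray(m, dtype=bool) for m in masks]
    fused = density = None
    for _ in range(max_rounds):
        density = np.mean([m.astype(float) for m in current], axis=0)
        fused = density >= vote
        refined = [m & fused for m in current]
        if all(np.array_equal(r, c) for r, c in zip(refined, current)):
            break  # components no longer change: the fusion has stabilized
        current = refined
    return fused, density
```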
If multiple views of a scene are available, either from
different vantage points, or from sensors with different
spectral characteristics, then more information is available
for an information fusion process. Under the cooperative-methods paradigm, monocular analysis techniques could be
applied to multiple images to produce initial building
detection data, which could then be integrated by information
fusion techniques to produce improved building detection
rates for the scene. Although multiple sets of data may not always be available, their presence calls for the development and/or extension of fusion techniques to take advantage of the additional information such data provide.
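Assuming the per-view building-likelihood maps have already been resampled into a common reference frame (registration itself is outside this sketch), one illustrative way to combine them is a reliability-weighted average; the weighting rule below is an assumption, not a method from the work cited here.

```python
import numpy as np

def fuse_views(view_densities, weights=None):
    """Combine building-likelihood maps from several registered views.

    view_densities: list of float arrays in [0, 1], one per view, all
    resampled to a common ground frame. weights: optional per-view
    reliabilities (e.g. favoring a nadir view over an oblique one).
    """
    stack = np.stack([np.asarray(d, dtype=float) for d in view_densities])
    if weights is None:
        weights = np.ones(len(view_densities))
    w = np.asarray(weights, dtype=float)
    combined = np.tensordot(w / w.sum(), stack, axes=1)
    return combined  # higher values indicate stronger multi-view support
```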
3 Improvement and Interpretation of 3D Scene
Analysis
The main goal of many vision systems, and in particular
automated cartographic feature extraction systems, is to
recover the three-dimensional structure of a scene. Stereo
analysis is a common technique used for this problem, and
much work has been done in this area. Nevertheless, the
difficulty of the task (in terms of registration and stereo
matching) limits the success of any single technique. To
obtain complete and accurate height estimates for a scene, it
seems clear that we will need to utilize more knowledge than
that used by any stereo matching process. Work is under way on the interpolation of sparse disparity information to generate reliable and dense height information, and on the post-processing of stereo matching results to ensure consistency of the derived height estimates.
Many three-dimensional estimation improvement
approaches use external knowledge about the scene, in the
form of models that constrain certain aspects of scene
structure. An interesting approach would be the use of information from multiple sources to achieve consistent disparity results by applying a simple information fusion model. Such an approach allows for the integration of
incomplete and inconsistent information to refine height
estimates.
Initial work under this paradigm used a region-based
interpretation model for the information fusion and
refinement process [7]. The assumption of this model is that uniform image radiometry is produced by planar surfaces of specific orientations and materials. Under this assumption,
the segmentation of the monocular images into fine surface
patches of nearly homogeneous intensity will ideally result in
a segmentation delineating planar surfaces in the scene.
Having obtained segmentations of the images in which
regions of nearly homogeneous intensity are delineated, it
becomes possible to refine initial height estimates for the
scene. Since each region is assumed to correspond to a
planar surface in the scene, we can assign disparity values to
each region to produce an initial refinement of the height
estimates. This is done by histogramming the disparity
values of each region and selecting the most representative
value for the region. Figure 5 shows a smoothed image of an
industrial area in Washington, D.C. Figure 6 shows a
segmentation obtained by a recursive histogram splitting
technique [12], and Figure 7 shows a refined S2 disparity map [13].
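The histogram-based assignment described above reduces to a few lines. The sketch below assumes a per-pixel disparity array and an integer region-label image, and for simplicity treats each planar region as fronto-parallel so that a single disparity value represents it; the names and bin count are illustrative.

```python
import numpy as np

def refine_disparity(disparity, labels, bins=64):
    """Assign one representative disparity to each segmentation region.

    disparity: float array of per-pixel disparity estimates (may be noisy).
    labels: integer array of the same shape, one label per region of nearly
    homogeneous intensity. Each region's disparities are histogrammed and
    the mode of the histogram is taken as its representative value.
    """
    refined = np.zeros_like(disparity, dtype=float)
    lo, hi = float(disparity.min()), float(disparity.max())
    for region in np.unique(labels):
        mask = labels == region
        hist, edges = np.histogram(disparity[mask], bins=bins, range=(lo, hi))
        peak = np.argmax(hist)
        refined[mask] = 0.5 * (edges[peak] + edges[peak + 1])  # bin center
    return refined
```

Selecting the histogram mode suppresses isolated matching errors within a region while preserving depth discontinuities at region boundaries.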
Experiments have shown that such region-based
representations of a scene provide useful models for the
refinement of height estimates. Using the fusion approach, it
becomes possible to segment the scene into building regions
based solely on disparity information. (Figure 8 shows the building regions extracted by such an approach.) Other
sources of information could be utilized at the refinement
stage to further enhance the disparity data. Left/right consistency constraints could augment the fusion model (a standard cross-check is sketched at the end of this section), as could more sophisticated models of image radiometry and surface structure. The region-based refinement approach
could also be used to refine scene segmentations (such as
those produced by feature extraction systems). In summary,
the information fusion and refinement paradigm provides a
framework for the enhancement of height estimates and
allows for the incorporation of possibly inaccurate or
inconsistent information in a robust manner.
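As an example, the left/right cross-check mentioned above could take the following form; the tolerance value and the integer-pixel lookup are simplifying assumptions.

```python
import numpy as np

def cross_check(disp_left, disp_right, tol=1.0):
    """Flag disparities that fail the left/right consistency test.

    disp_left[y, x] maps a left-image pixel to column x - d in the right
    image; a consistent match should map back to (approximately) x. Pixels
    whose round trip disagrees by more than tol are marked invalid (NaN).
    """
    h, w = disp_left.shape
    xs = np.arange(w)
    checked = disp_left.astype(float).copy()
    for y in range(h):
        xr = np.clip(np.round(xs - disp_left[y]).astype(int), 0, w - 1)
        bad = np.abs(disp_left[y] - disp_right[y, xr]) > tol
        checked[y, bad] = np.nan
    return checked
```

Pixels flagged as inconsistent could then be reassigned by the region-based refinement step described above.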