In: Paparoditis N., Pierrot-Deseilligny M., Mallet C., Tournaire O. (Eds), IAPRS, Vol. XXXVIII, Part 3A - Saint-Mandé, France, September 1-3, 2010
URBAN BUILDING DETECTION FROM OPTICAL AND INSAR FEATURES
EXPLOITING CONTEXT
J. D. Wegner a, *, A. O. Ok b, A. Thiele c, F. Rottensteiner a, U. Soergel a
a Institute of Photogrammetry and Geoinformation (IPI), Leibniz Universität Hannover, Hannover, Germany -
(wegner, soergel)@ipi.uni-hannover.de
b Dept. of Geodetic and Geographic Information Technologies, Middle East Technical University, Ankara, Turkey -
oozgun@metu.edu.tr
c Fraunhofer Institute of Optronics, System Technologies and Image Exploitation (IOSB), Ettlingen, Germany -
antje.thiele@iosb.fraunhofer.de
Commission III, WG III/4
KEY WORDS: Conditional Random Fields, Remote Sensing, Fusion, InSAR Data, Optical Stereo Data, Urban Area
ABSTRACT:
We investigate the potential of combined features of aerial images and high-resolution interferometric SAR (InSAR) data for
building detection in urban areas. It is shown that completeness and correctness may be increased if we integrate both InSAR
double-bounce lines and 3D lines of stereo data in addition to building hints of a single optical orthophoto. In order to exploit
context information, which is crucial for object detection in urban areas, we use a Conditional Random Field approach. It proves to
be a valuable method for context-based building detection with multi-sensor features.
1. INTRODUCTION
Building detection in urban areas based on a single aerial
photo alone is often difficult (Mueller and Zaum, 2005).
Features of additional data sources may be introduced to
improve detection completeness and correctness. In addition to
features derived from an orthophoto we use building hints of
high-resolution InSAR data and an optical stereo image pair.
Several works have already dealt with the integration of features
derived from high-resolution optical and SAR (or InSAR) data
with the goal of building detection. Xiao et al. (1998) detect and
reconstruct building blocks combining high-resolution optical
and InSAR data. They classify both data sets separately within a
multi-layer neural network followed by morphological
operations. Finally, rectangles are fitted to building hypotheses and
heights are derived. Hepner et al. (1998) jointly use hyper-
spectral imagery and InSAR data acquired by airborne sensors
to detect and three-dimensionally reconstruct large buildings in
urban areas. Tupin and Roux (2003) propose an approach to
extract footprints of large flat-roofed industrial buildings based
on line features. In (Tupin and Roux, 2005) the same authors
represent homogeneous regions of an aerial photo with a region
adjacency graph. This graph is then used within a Markov
Random Field framework to regularize building heights
determined by means of radargrammetry. A discontinuity
constraint based on the image gradient along segment
boundaries is introduced into the prior term in order to preserve
sudden height jumps. Poulain et al. (2009) combine high-
resolution optical and SAR data with vector data in order to
detect changes. Since no learning step is conducted, all
classification is performed based on prior knowledge. They
generate features from previously extracted primitives and set
up a score for each building site using Dempster-Shafer
evidential theory. Sportouche et al. (2009) detect and
three-dimensionally reconstruct large industrial buildings
semi-automatically. They combine features of high-resolution optical
satellite imagery (Quickbird) with high-resolution SAR data
(TerraSAR-X). Building hypotheses from the optical data are
validated or rejected based on a classification of the SAR image
making use of roof textures, bright lines, and shadows. Building
heights are derived simultaneously exploiting the different
optical and SAR sensor geometries. We recently proposed a
segment-based approach for building detection (Wegner et al.,
2009). Segments of an orthophoto are classified in combination
with InSAR double-bounce lines.
In this paper, we use a Conditional Random Field (CRF)
framework, which is a probabilistic contextual classification
framework originally introduced by Lafferty et al. (2001) for
labelling 1D sequential data and later on extended to images by
Kumar and Hebert (2003). CRFs have already been successfully
applied to various computer vision tasks (e.g., Rabinovich et al.,
2007; Korc and Förstner, 2008). Nonetheless, CRFs have only
rarely been applied to remote sensing data (Zhong and Wang,
2007). Furthermore, to the authors' knowledge, only one
publication exploits CRFs for the analysis of SAR data (He et
al., 2008).
Our focus is on the suitability of CRFs for combining
multi-sensor remote sensing data using context, with the aim of
detecting single buildings. Although much more sophisticated features
could potentially be derived from stereo and InSAR data we use
rather simple ones in order to transparently assess the entire
framework. More sophisticated features may then be introduced
in future work.
We first give an overview of the entire processing chain.
Then, features we utilize are explained, the basic theory of
CRFs is described, and finally building detection results with
different feature sets as input are compared.
* Corresponding author