In: Stilla U, Rottensteiner F, Paparoditis N (Eds) CMRT09. IAPRS, Vol. XXXVIII, Part 3/W4 — Paris, France, 3-4 September, 2009
COMPETING 3D PRIORS FOR OBJECT EXTRACTION IN REMOTE SENSING DATA
Konstantinos Karantzalos and Nikos Paragios
Ecole Centrale de Paris
Grande Voie des Vignes, 92295
Chatenay-Malabry, France
{konstantinos.karantzalos, nikos.paragios}@ecp.fr
http://users.ntua.gr/karank/Demos.html
Commission III
KEY WORDS: Computer Vision, Pattern Recognition, Variational Methods, Model-Based, Evaluation, Voxel-Based
ABSTRACT:
A recognition-driven variational framework was developed for automatic three dimensional object extraction from remote sensing data.
The essence of the approach is to allow multiple 3D priors to compete towards recovering terrain objects’ position and 3D geometry.
We do not rely solely on the results of an unconstrained evolving surface; instead, we force our output segments to inherit their 3D
shape from our prior models. Thus, instead of evolving an arbitrary surface, we evolve the selected geometric shapes. The developed
algorithm was tested on the task of 3D building extraction, and the pixel- and voxel-based quantitative evaluation that was performed demonstrates
the potential of the proposed approach.
1 INTRODUCTION
Although current remote sensing sensors can provide an updated
and detailed source of information for terrain analysis, the
lack of automated operational procedures for their processing
impedes their full exploitation. With standard techniques
based mainly on spectral properties, only the lower resolution
earth observation data can be effectively classified. Recent
automated approaches are not yet functional and mature enough to
support massive processing of multiple scenes of high and
very high resolution data.
On the other hand, modeling urban and peri-urban environments
with engineering precision enables people and organizations
involved in the planning, design, construction and operations
life-cycle to make collective decisions in the areas of urban
planning, economic development, emergency planning, and security.
In particular, the emergence of applications like games,
navigation, e-commerce, spatial planning and monitoring of urban
development has made the creation and manipulation of 3D city
models quite valuable, especially at large scale.
From this perspective, optimizing the automatic extraction
of terrain features/objects from new generation satellite data
is of major importance. For more than a decade now, research
efforts have been based on the use of a single image, stereopairs,
multiple images, digital elevation models (DEMs) or a combination of
them. One can find in the literature several model-free or
model-based algorithms for 2D and 3D object extraction and
reconstruction [(Hu et al., 2003), (Baltsavias, 2004), (Suveg and
Vosselman, 2004), (Paparoditis et al., 2006), (Drauschke et al., 2006),
(Rottensteiner et al., 2007), (Sohn and Dowman, 2007), (Verma et al.,
2006), (Lafarge et al., 2007), (Karantzalos and Paragios, 2009) and
the references therein]. Despite this intensive research, we are
still far from the goal of the initially envisioned fully automatic
and accurate reconstruction systems (Brenner, 2005), (Zhu and
Kanade (Eds.), July, 2008), (Mayer, 2008). Processing remote
sensing data still poses several challenges.
Figure 1: 3D Building Extraction through Competing 3D Priors. Panels: (a) Satellite Image, (b) Ground Truth, (c) DEM, (d) Extracted 3D Buildings, (e) Reconstructed Scene.
In this paper, we extend our recent 2D prior-based formulations
(Karantzalos and Paragios, 2009), aiming to tackle the problem
of automatically and accurately extracting 3D terrain objects
from optical and height data. Multiple competing 3D priors are
considered, transforming reconstruction into a labeling and an
estimation problem. In such a context, we fuse images and DEMs
towards recovering a 3D prior model. We experiment with
buildings but, similarly, any other terrain object can be modeled.
Our formulation allows the data with the higher spatial resolution to
properly constrain the footprint detection in order to achieve the
optimal spatial accuracy (Figure 1). Therefore, we propose
a variational functional that encodes a fruitful synergy between
observations and multiple 3D grammar-based models. Our models
refer to a grammar, which consists of typologies of 3D shape
priors (Figure 2). In such a context, firstly one has to select the