CMRT09: Object Extraction for 3D City Models, Road Databases and Traffic Monitoring - Concepts, Algorithms, and Evaluation
(a) Prior building models Φ_{i,j}: i determines the shape of the
footprint and j the roof type.
(b) The family Φ_{1,j}, which has a rectangular footprint (i = 1).
(c) The building's main height h_m and roof height h_r(x, y).
Figure 2: Hierarchical grammar-based 3D prior models. The
case of building modeling: the building's footprint is determined
implicitly from Φ_{i,j}; h_m and h_r(x, y) are recovered for every
point (Φ_{i,j}), and thus all the different roof types j are
modeled.
most appropriate model and then determine the optimal set of
parameters for recovering the scene's geometry (Figure 1). The
proposed objective function consists of two segmentation terms
that guide the selection of the most appropriate typology and a
third DEM-driven term that is conditioned on the typology. Such
a prior-based recognition process can segment both rural and
urban regions (similarly to (Matei et al., 2008)) and can also
overcome detection errors caused by misleading low-level
information (such as shadows or occlusions), a common scenario
in remote sensing data.
Our goal was to develop a single generic framework (with no
step-by-step procedures) that can efficiently extract multiple
3D buildings, whether or not their number or shape is known a
priori. In addition, since multiple aerial images are unavailable
for most sites, our goal was to provide a solution even with the
minimum available data, such as a single panchromatic image and
an elevation map (produced either with classical photogrammetric
multi-view stereo techniques or from LIDAR or INSAR sensors).
This contrasts with approaches designed to process multiple
aerial images or multispectral information and cadastral maps
(as in (Suveg and Vosselman, 2004), (Rottensteiner et al., 2007),
(Sohn and Dowman, 2007)), data that greatly ease the scene's
classification. Performing multi-view stereo, using simple
geometric representations such as 3D lines and planes, or merging
data from ground sensors was not our interest here. Moreover,
contrary to (Zebedin et al., 2008), the variational framework
proposed here does not require as input dense height data, dense
image matching processes and a priori given 3D line segments or
a rough segmentation.
2 MODELING TERRAIN OBJECTS WITH 3D PRIORS
Numerous 3D model-based approaches have been proposed in the
literature. Statistical approaches (Paragios et al., 2005) aim to
describe variations between the different prior models by
measuring the distribution of the parameter space. These models
can represent buildings with rather repetitive structure and
limited complexity. To overcome this limitation, methods using
generic, parametric, polyhedral and structural models have been
considered (Jaynes et al., 2003), (Kim and Nevatia, 2004),
(Suveg and Vosselman, 2004), (Dick et al., 2004), (Wilczkowiak
et al., 2005), (Forlani et al., 2006), (Lafarge et al., 2007).
The main strength of these models is their expressive power in
terms of complex architectures. On the other hand, inference
between the models and the observations is rather challenging
due to the high dimension of the search space. Consequently,
only a small number of such models can be considered. More
recently, procedural modeling of architecture was introduced,
along with vision-based reconstruction in (Muller et al., 2007)
using mostly facade views. Such a method recovers 3D using an
L-system grammar (Muller et al., 2006), which is a powerful and
elegant tool for content creation. Despite the promise of such
an approach, one can claim that the inferential step that
involves the derivation of the model parameters is still a
challenging problem, especially when the grammar is related to
the building detection procedure.
Hierarchical representations are a natural choice for addressing
complexity while at the same time recovering representations of
acceptable resolution. Focusing on buildings, our models involve
two components, the type of footprint and the type of roof
(Figure 2). Firstly, we structure our space of prior models by
assigning the same pointer i to all models that belong to the
family with the same footprint. Thus, all buildings that can be
modeled with a rectangular footprint have the same index value
i. Then, for every family (i.e. every i) the different types of
building tops (roofs) are modeled by the pointer j (Figure 2b).
Under this hierarchy Φ_{i,j}, the priors database can model
building types from simple to very complex and can easily be
enriched with more complex structures. Such a formulation is
desirably generic but forms a huge search space. Therefore,
appropriate attention must be paid when structuring the search
step.
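The two-level indexing described above (footprint family i, then
roof type j inside that family) can be illustrated with a small
lookup structure. This is only a minimal sketch: the class names
Prior and PriorDatabase, the footprint and roof labels, and the
numeric values of i and j in the usage example are all
hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prior:
    """One prior model, addressed by the pair (i, j)."""
    i: int          # footprint family index
    j: int          # roof type index inside family i
    footprint: str  # illustrative label, e.g. "rectangular"
    roof: str       # illustrative label, e.g. "gable"


class PriorDatabase:
    """Hypothetical two-level store: family i -> roof type j -> prior."""

    def __init__(self):
        self._families = {}

    def add(self, prior):
        # Models with the same footprint share the same family index i.
        self._families.setdefault(prior.i, {})[prior.j] = prior

    def family(self, i):
        # All roof variants of one footprint family, ordered by j.
        fam = self._families.get(i, {})
        return [fam[j] for j in sorted(fam)]

    def get(self, i, j):
        return self._families[i][j]

    def __len__(self):
        return sum(len(fam) for fam in self._families.values())


# Usage with made-up entries (the index values are assumptions).
db = PriorDatabase()
db.add(Prior(1, 1, "rectangular", "flat"))
db.add(Prior(1, 2, "rectangular", "shed"))
db.add(Prior(2, 1, "L-shaped", "flat"))
print([p.roof for p in db.family(1)])
```

Enriching the database then amounts to adding a new (i, j) entry,
and the search step can be restricted to one family at a time.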
Given the set of footprint priors, we assume that the observed
building is a homographic transformation of the footprint. Given
the variation of the expressiveness of the grammar and the
degrees of freedom of the transformation, we can now focus on the
3D aspect of the model. In such a context, only the building's
main height h_m and the building's roof height h_r(x, y) at
every point need to be recovered. The proposed typology for such
a task is shown in Figure 2. It refers to the rectangular case,
but all the other families can be defined accordingly. More
complex footprints, usually with more than one roof type, are
decomposed into simpler parts, which can therefore be recovered
similarly. Given an image I(x, y) on a bounded domain Ω ⊂ R^2
and an elevation map H(x, y), which can be seen either as an
image or as a triangulated point cloud, let us denote by h_m the
building's main height and by P_m the horizontal building plane
at that height. We proceed by modeling all building roofs (flat,
shed, gable, etc.) as a combination of four inclined planes. We
denote by P_1, P_2, P_3 and P_4 these four roof planes and by
ω_1, ω_2, ω_3 and ω_4, respectively, the four angles between the
horizontal plane P_m and each inclined plane (Figure 2). Every
point of the roof rests strictly on one of these inclined planes,
and its distance from the horizontal plane is the minimum
compared with the distances formed by the other three planes.
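The minimum-distance rule above can be sketched for the
rectangular family as follows. The sketch rests on several
assumptions that are not stated in the text: the footprint is
axis-aligned with its corner at the origin, each inclined plane
is hinged on one footprint edge at the main height, the planes
are ordered by the edges x = 0, x = width, y = 0 and y = length,
and a plane whose angle is zero is treated as absent. The
function and parameter names are illustrative only.

```python
import math


def roof_height(x, y, width, length, angles_deg):
    """Roof height above the main height at (x, y).

    angles_deg holds the four plane inclinations in degrees, for the
    planes hinged on the edges x=0, x=width, y=0, y=length (assumed
    ordering).
    """
    if not (0 <= x <= width and 0 <= y <= length):
        raise ValueError("point lies outside the footprint")
    # Horizontal distance from the point to the edge of each plane.
    edge_dist = (x, width - x, y, length - y)
    heights = [
        d * math.tan(math.radians(w))
        for d, w in zip(edge_dist, angles_deg)
        if w > 0  # assumption: a zero angle means the plane is absent
    ]
    # The point rests on the lowest active plane; with no active
    # plane the roof adds nothing above the main height.
    return min(heights) if heights else 0.0


def building_height(x, y, h_m, width, length, angles_deg):
    """Total height: constant main height plus the roof height."""
    return h_m + roof_height(x, y, width, length, angles_deg)


# Two opposite planes at 45 degrees on a 10 x 20 footprint.
print(building_height(5, 3, 8.0, 10, 20, (45, 45, 0, 0)))
```

Under these assumptions a building is described by five numbers,
the main height and the four angles, and different angle
combinations yield different roof shapes from the same function.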
With such a grammar-based description, the five unknown
parameters to be recovered are the main height h_m (which has a
constant value for every building) and the four angles ω. In
this way, all but two types of building tops/roofs can be
modeled. For example, if all angles are different we have a
totally dissymmetric roof (Figure 2b, Φ_{1,5}), if two opposite
angles are zero we have a