In: Stilla U, Rottensteiner F, Paparoditis N (Eds) CMRT09. IAPRS, Vol. XXXVIII, Part 3/W4 — Paris, France, 3-4 September, 2009
[Figure 3 panels: (a) Detected Buildings, (b) Ground Truth, (c) Horizontal True Positives (HTP), (d) Horizontal False Positives (HFP), (e) Horizontal False Negatives (HFN)]
Figure 3: Horizontal Qualitative Evaluation: The recognition-driven process efficiently detects, in an unsupervised manner, scene
buildings and recovers their 3D geometry.
gable-type one ($\Phi_{1,4}$) and if all are zero we have a flat one ($\Phi_{1,1}$).
The platform and the gambrel roof types cannot be modeled directly but
can be easily derived in cases where the fit energy metric settles
in local minima. The platform one ($\Phi_{1,2}$), for instance,
is the case where all angles have been recovered with small values,
and a search around their intersection point will estimate the
dimensions of the rectangular box above the main roof plane
$P_m$. With the aforementioned formulation, instead of searching
for the best among $i \times j$ (e.g. $5 \times 6 = 30$) models, the hierarchical
grammar and the appropriately defined energy terms (detailed in the
following section) effectively cut down the solution space.
from the grouping criteria. The simplest possible choice would
be the Mumford-Shah approach, which aims at separating the
means of the two classes. Equation (1) can be straightforwardly
extended to deal with other optical or radar data, for example
in cases where multi- or hyper-spectral remote sensing data
are available.
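As a rough illustration of how Eq. (1) could be evaluated on a pixel grid, the following sketch uses the Mumford-Shah-style choice mentioned above: each data term is the squared deviation from the current class mean. The arctan-smoothed Heaviside, the forward-difference gradient, the function names (`heaviside`, `class_means`, `region_term`, `length_term`, `e_seg`) and the toy 4x4 scene are all assumptions made for this example, not the authors' implementation. Plain Python lists are used only to keep the sketch dependency-free.

```python
import math

def heaviside(z, eps=0.1):
    """Smoothed Heaviside H_eps (arctan form; an assumed regularization)."""
    return 0.5 * (1.0 + (2.0 / math.pi) * math.atan(z / eps))

def class_means(phi, data, eps=0.1):
    """Soft means of `data` inside (object) and outside (background) phi > 0."""
    w_in = [[heaviside(p, eps) for p in row] for row in phi]
    s_in = sum(w * d for wr, dr in zip(w_in, data) for w, d in zip(wr, dr))
    n_in = sum(w for wr in w_in for w in wr)
    s_all = sum(d for dr in data for d in dr)
    n_all = sum(len(dr) for dr in data)
    return s_in / n_in, (s_all - s_in) / (n_all - n_in)

def region_term(phi, data, eps=0.1):
    """Sum of H*r_obj + (1-H)*r_bg, with r = squared deviation from class mean."""
    m_obj, m_bg = class_means(phi, data, eps)
    total = 0.0
    for prow, drow in zip(phi, data):
        for p, d in zip(prow, drow):
            h = heaviside(p, eps)
            total += h * (d - m_obj) ** 2 + (1.0 - h) * (d - m_bg) ** 2
    return total

def length_term(phi):
    """Sum of |grad phi| using forward differences (zero past the border)."""
    rows, cols = len(phi), len(phi[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            dx = phi[r][c + 1] - phi[r][c] if c + 1 < cols else 0.0
            dy = phi[r + 1][c] - phi[r][c] if r + 1 < rows else 0.0
            total += math.hypot(dx, dy)
    return total

def e_seg(phi, image, dem, rho=1.0, eps=0.1):
    """Discrete analogue of Eq. (1): smoothness + image term + rho * DEM term."""
    return (length_term(phi)
            + region_term(phi, image, eps)
            + rho * region_term(phi, dem, eps))

def block(r0, c0, inside, outside, size=4):
    """Toy 4x4 grid with a 2x2 block whose top-left corner is (r0, c0)."""
    return [[inside if r0 <= r < r0 + 2 and c0 <= c < c0 + 2 else outside
             for c in range(size)] for r in range(size)]

image = block(1, 1, 0.9, 0.2)      # bright building on a darker background
dem = block(1, 1, 10.0, 0.0)       # elevated building on flat ground
phi_good = block(1, 1, 1.0, -1.0)  # surface aligned with the building
phi_bad = block(0, 0, 1.0, -1.0)   # surface placed over the wrong corner

print(e_seg(phi_good, image, dem), e_seg(phi_bad, image, dem))
```

On this toy scene the surface aligned with the bright, elevated block yields the lower energy, which is the behavior the minimization relies on.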
Furthermore, instead of relying only on the results of an unconstrained
evolving surface, we force our output segments to
inherit their 2D shape from our prior models. Thus, instead of
evolving an arbitrary surface we evolve selected geometric shapes,
and the 2D prior-based segmentation energy term takes the form
given in Eq. (2).
3 MULTIPLE 3D PRIORS IN COMPETITION EXTRACTING MULTIPLE OBJECTS
Let us consider an image ($\mathcal{I}$) and the corresponding digital elevation
map ($\mathcal{H}$). In such a context, one has to separate the objects
to be extracted from the background (natural scene) and,
then, determine their geometry. The first segmentation task is addressed
through the deformation of an initial surface $\phi: \Omega \to \mathbb{R}^+$
that aims at separating the natural components of the scene from
the man-made parts. Assuming that one can establish correspondences
between the pixels of the image and those of the DEM,
the segmentation can be solved in both spaces through the use
of regional statistics. In the visible image we would expect that
buildings differ from the natural components of the scene.
In the DEM, one would expect that man-made structures will exhibit
elevation differences from their surroundings. Following
the formulation of (Karantzalos and Paragios, 2009), these two
assumptions can be used to define the following segmentation
function:
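$$
E_{seg}(\phi) = \int_{\Omega} |\nabla \phi(x)| \, dx
+ \int_{\Omega} \Big( H_\epsilon(\phi) \, r_{obj}(\mathcal{I}(x)) + [1 - H_\epsilon(\phi)] \, r_{bg}(\mathcal{I}(x)) \Big) \, dx
+ \rho \int_{\Omega} \Big( H_\epsilon(\phi) \, r_{obj}(\mathcal{H}(x)) + [1 - H_\epsilon(\phi)] \, r_{bg}(\mathcal{H}(x)) \Big) \, dx
\qquad (1)
$$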
where $H_\epsilon$ is the Heaviside function, and $r_{obj}$ and $r_{bg}$ are object and background
positive monotonically decreasing data-driven functions driven
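$$
E_{2D}(\phi, T_i, L) = \sum_{i=1}^{m-1} \int_{\Omega} \Big( H_\epsilon(\phi(x)) - H_\epsilon(\phi_i(T_i(x))) \Big)^2 \chi_i(L(x)) \, dx
+ \int_{\Omega} \lambda^2 \chi_m(L(x)) \, dx
+ \rho \sum_{i=1}^{k} \int_{\Omega} |\nabla L_i(x)| \, dx
\qquad (2)
$$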
with the two parameters $\lambda, \rho > 0$ and the $k$-dimensional labeling
formulation capable of dynamically labeling up to $m = 2^k$
regions.
In this way, during optimization the number of selected regions
$m = 2^k$ depends on the number of possible building segments
according to $\phi$, and thus the $k$-dimensional labeling function $L$
incrementally obtains multiple instances. It should also be mentioned
that the initial pose of the priors is not known. Such a
formulation, $E_{seg} + E_{2D}$, allows the data with the higher spatial
resolution to properly constrain the footprint detection in order to
achieve the optimal spatial accuracy. Furthermore, it solves segmentation
simultaneously in both spaces (image and DEM) and
addresses fusion in a natural manner.
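The labeling idea can be illustrated with a small sketch. It assumes, as in multiphase level-set labeling, that the sign pattern of the k components of L at a pixel selects exactly one of the 2^k regions. The text above does not spell out how the indicators are built, so this encoding and the names `region_indicators` and `regions_in_use` are hypothetical.

```python
from itertools import product

def region_indicators(label_values):
    """Return 2**k indicator values (exactly one is 1) for one pixel.

    Assumption: region i is selected by the sign pattern of the k label
    components, enumerated in the order given by itertools.product.
    """
    k = len(label_values)
    signs = tuple(v > 0 for v in label_values)
    return [1 if pattern == signs else 0
            for pattern in product((False, True), repeat=k)]

def regions_in_use(label_field):
    """Indices of the regions actually selected somewhere in the field."""
    return sorted({region_indicators(pixel).index(1) for pixel in label_field})

# Three pixels of a k = 2 labeling function; up to 2**2 = 4 regions exist,
# but only the regions selected by some pixel are in use.
label_field = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0)]
print(regions_in_use(label_field))
```

Under this encoding, a region comes into use only once some pixel's label components take the matching sign pattern, so the number of regions in use can grow as the labeling function evolves.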
3.1 Grammar-based Object Reconstruction
In order to determine the 3D geometry of the buildings, one has
to estimate the height of the structure with respect to the ground
and the orientation angles of the roof components, i.e. five unknown
parameters: the building’s main height $h_m$, which has
a constant value for every building, and the four angles $\omega$ of the