In: Paparoditis N., Pierrot-Deseilligny M.. Mallet C.. Tournaire O. (Eds), 1APRS. Vol. XXXVIII. Part ЗА - Saint-Mandé, France. September 1-3. 2010
the energy can be defined as
E(c\G) = У' D(gj|ct) + u> V(si,Sj\d,Cj), (2)
Si€S ( Si , Sj )eE
where D(si\cj) expresses the unary potential of a super-pixel
node. In case of the classification refinement, c represents a bi
nary labeling of the adjacency graph that assigns each graph node
Si a label c, G {building, non-building}. The unary potential
D(si\ci) = — log(H(si)) denotes the class likelihoods H(s L ) of
a super-pixel s t obtained by aggregating pixel-wise confidences.
The costs for geometric primitive refinement are described in the
next section. The factor u controls the influence of the regulariza
tion and is estimated by using cross validation. In order to con
sider the region sizes in the minimization process, we compute
the pairwise edge term Sj\(k, c,) between the super-pixels
Si and S j with
V(s t , c 7 ) =
b(Sj,8j)
l + g{sii Sj
-S(d Ф Cj).
(3)
The function b (,Si, Sj) computes the number of common bound
ary pixels of two given segments, g{si, Sj) is the L 2 norm of the
mean color distance vector and 5(-) is a simple zero-one indi
cator function. In this work we minimize the energy defined in
Equation 2 by using «-expansion moves (Boykov et al., 2001).
Please note that the number of exemplars has not to be given in
advance and the similarity matrix can be computed sparsely. We
therefore construct the similarity matrix as follows: For each 3D
primitive which consists of plane and a 3D point in space, we
estimate the reconstruction error for adjacent super-pixels tak
ing into account the current prototype hypothesis and the set of
neighboring height data points. Considering only adjacent image
regions additionally reduces computational costs for construct
ing the similarity matrix. The clustering procedure yields a set
of representative primitive prototypes which are used to approx
imate a rooftop shape w'ith respect to the available height infor
mation. Next, we reuse the formulation of the energy defined in
Eq. 2 to obtain a consistent prototype labeling for building re
gions. In case of geometric primitive refinement, c represents
a labeling of the adjacency graph that assigns each super-pixel
Si a label a G T, where T is the set of geometric prototypes
obtained by clustering. Similar as proposed in (Zebedin et al.,
2008), the unary potential D(si\ci) denotes the costs, in terms of
summed point-to-plane distance measurements, of s t being as
signed the label c* or prototype, respectively. We compute the
pairwise edge term considering appearance and super-pixel sizes
in order to obtain a smooth geometric prototype labeling within
homogeneous building areas. A refined labeling of prototypes is
shown in Figure 3.
5.3 Rooftop Modeling
5 BUILDING MODELING
A generation of super-pixels provides footprints for any object in
an observed color image. Taking into account the refined build
ing classification and additional height information, 3D geomet
ric primitives describing the smallest unit of a building rooftop
can be extracted as the next step. Estimated rooftop hypotheses
for each super-pixel in a building (we simply extract connected
components on the adjacency graph) are collected and clustered
in order to find representative rooftop prototypes. Finally, a CRF
optimization assigns consistently the prototypes to each super
pixel in a building considering resulting reconstruction error and
neighborhood segments.
5.1 Prototype Extraction
Assuming a set of super-pixels (a super-pixel can be seen as a list
of coordinates), classified as parts of an individual building, we
initially fit planes to the available corresponding point clouds pro
vided by the fused DSM. In this work we use planes as geometric
primitives however the prototype extraction can be extended to
any kind of primitives. We apply RANSAC over a fixed number
of iterations to find those plane, minimizing the distance to the
point cloud, for each building super-pixel. This procedure yields
a rooftop hypothesis for each super-pixel defined by a normal
vector and single point on the estimated plane (see second row of
Figure 3).
5.2 Prototype Clustering and Refinement
As a next step, we introduce a clustering of hypotheses for two
reasons: Since the subsequent optimization step can be seen as
a prototype labeling problem, similar 3D primitives should pro
vide same labels in order to result a smooth reconstruction of
a rooftop. Second, clustering significantly reduces the number
of probable labels which benefits the efficiency of the optimiza
tion procedure. We apply affinity propagation (Frey and Dueck,
2007) to find representative exemplars of 3D primitives. Affin
ity propagation takes as input a distance matrix of pairwise sim
ilarity measurements and efficiently identify a set of exemplars.
So far the footprint of each building consists of a set of super
pixels in the image space. In order to obtain a geometric foot
print modeling of each super-pixel, we first identify common
boundary pixels between adjacent building super-pixels. For each
super-pixel, this procedure results a specific set of boundary frag
ments, which can be individually approximated by straight line
segments. A pairwise matching of collected line segments yields
a closed yet simplified 2D polygon. Taking account of DTM and
the refined geometric primitive assignment, the footprint poly
gons defined by a number of vertexes are extruded to form small
units of a rooftop: distinctive 3D rooftop points are determined
by intersecting the plane (given by the geometric primitive) with
a line, directed to (0,0, l) r , going through the corresponding
vertex on ground. For the puipose of visualization, we use a 2D
Delaunay triangulation technique to generate the models of the
buildings. An individual 3D building model of our approach can
be seen as a collection of composed building super-pixels hav
ing identical building and rooftop prototype indexes, respectively.
A hierarchical grouping of super-pixels could be used to further
simplify the resulting building model.
6 EXPERIMENTS
This section evaluates our proposed framework on a large amount
of real world data. We first describe the aerial imagery, then the
building classification is evaluated on hand-labeled ground truth
data. Moreover, we present results of our building generalization
and perform quantitative and visual inspection of the constructed
models.
Data. We present results for two aerial imageries showing dif
ferent characteristics. The dataset Graz (155 images) shows a
colorful appearance with challenging buildings and San Fran
cisco (77 images) has suburban occurrence in a hilly terrain. The
imageries are taken with the Microsoft Ultracam in overlapping
strips (80% along-track overlap and 60% across-track overlap),
where each image has a resolution of 11500 x 7500 pixels with
a ground sampling distance of approximately 10 cm. We use the
color images, the height data computed by dense matching (Klaus