Full text: Papers accepted on the basis of peer-reviewed full manuscripts (Part A)

In: Paparoditis N., Pierrot-Deseilligny M., Mallet C., Tournaire O. (Eds), IAPRS, Vol. XXXVIII, Part 3A - Saint-Mandé, France, September 1-3, 2010
Figure 2: Fusion result: The first row shows six redundant orthographic views of a scene taken from Graz. Fused image results are given in the second row for color, height and building classification. Undefined areas are considerably compensated by using the high redundancy.
the set of Sigma Points¹ as follows:
$$p_0 = \mu, \qquad p_i = \mu + \alpha(\sqrt{\Sigma})_i, \qquad p_{i+d} = \mu - \alpha(\sqrt{\Sigma})_i, \tag{1}$$
where $i = 1, \ldots, d$ and $(\sqrt{\Sigma})_i$ denotes the $i$-th column of the required matrix square root. Due to the symmetry of the covariance matrix, we apply the Cholesky factorization to efficiently compute the matrix square root of $\Sigma$. The term $\alpha$ defines a weighting for the elements in the covariance matrix and is set to $\alpha = \sqrt{2d}$ as suggested in (Kluckner et al., 2009). The resulting region descriptor $P = \{p_0, \ldots, p_{2d}\}$ then consists of $2d + 1$ concatenated Sigma Points $p_i \in \mathbb{R}^d$ and has a dimension of $P \in \mathbb{R}^{d(2d+1)}$. For details we refer to (Kluckner et al., 2009). The next section describes the fusion of redundant information into a common 3D coordinate system.
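The construction in Eq. (1) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' released code: the function name `sigma_points` is hypothetical, and the default weighting follows one reading of the value given in the text, so `alpha` is left as a parameter.

```python
import numpy as np

def sigma_points(mean, cov, alpha=None):
    """Return the 2d + 1 Sigma Points of Eq. (1), concatenated into one vector."""
    mean = np.asarray(mean, dtype=float)
    d = mean.shape[0]
    if alpha is None:
        # Assumption: the weighting is read as sqrt(2d); pass alpha explicitly otherwise.
        alpha = np.sqrt(2.0 * d)
    # The Cholesky factor L satisfies L @ L.T == cov; its columns are used
    # as the columns of the matrix square root.
    L = np.linalg.cholesky(np.asarray(cov, dtype=float))
    points = [mean]
    points += [mean + alpha * L[:, i] for i in range(d)]
    points += [mean - alpha * L[:, i] for i in range(d)]
    return np.concatenate(points)  # length d * (2d + 1)
```

For a d-dimensional feature, the returned vector has d(2d + 1) entries, which matches 2d + 1 concatenated points of dimension d.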
4 FUSION OF MULTIPLE IMAGES 
Because of the high overlap in the aerial imagery, each point on the ground is mapped multiple times from different viewpoints. Since we are interested in large-scale modeling, we generate an orthographic image from many overlapping perspective images by a pixel-wise transformation into a common 3D coordinate system. Taking into account camera data and depth information, provided by a dense matching procedure, corresponding pixels in the perspective images yield multiple observations for color, height and building classification in the orthographic view. Several rectified observations of a scene taken from the Graz imagery are shown in Figure 2.
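As a rough illustration of such a pixel-wise transformation, the sketch below lifts the pixels of one perspective view to world coordinates and drops their heights into an orthographic grid. The camera convention (x_cam = R x_world + t, depth measured along the camera z-axis), the function names, and the rule of keeping the highest point per grid cell are all assumptions made for this example; the text does not specify them.

```python
import numpy as np

def backproject_to_world(depth, K, R, t):
    """Lift every pixel of a depth map to world coordinates.

    Assumes x_cam = R @ x_world + t and depth along the camera z-axis.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    rays = pix @ np.linalg.inv(K).T        # camera-frame rays with z = 1
    cam = rays * depth.reshape(-1, 1)      # scale each ray by its depth
    return (cam - t) @ R                   # row-vector form of R.T @ (cam - t)

def rasterize_height(world, origin, cell, shape):
    """Keep the highest world point that falls into each orthographic cell."""
    cols = np.floor((world[:, 0] - origin[0]) / cell).astype(int)
    rows = np.floor((world[:, 1] - origin[1]) / cell).astype(int)
    ok = (rows >= 0) & (rows < shape[0]) & (cols >= 0) & (cols < shape[1])
    height = np.full(shape, -np.inf)
    # Unbuffered maximum handles several points landing in the same cell.
    np.maximum.at(height, (rows[ok], cols[ok]), world[ok, 2])
    height[np.isinf(height)] = np.nan      # cells without observation stay undefined
    return height
```

Running this once per perspective image yields one orthographic observation per view; cells left as NaN correspond to the undefined regions that the later fusion step fills from other views.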
The fusion of redundant information into a common view has the benefit that, e.g., reconstruction errors caused by non-stationary objects like moving cars can be compensated. In addition, a projection of many different views produces an orthographic image without undefined image regions caused by perspective occlusions. First, color and height information are fused by computing median values for each pixel from multiple observations. To fuse color information robustly per pixel, we use random projections of the color vector onto 1D lines to detect the median of vector-valued data (Tukey, 1974). Though simple mean computation has lower computational complexity, a median will not introduce new color values, as averaging may. In addition, an accurate fused color image is essential for the super-pixel segmentation performed at the next step. In order to estimate a final building likelihood for each pixel in the orthographic view, confidences from different views are accumulated and normalized. Figure 2 depicts the final pixel-wise fusion result for color, height and building classification. In the next step we briefly discuss super-pixels and introduce an optimization stage to refine the classification and the prototype labeling on a super-pixel neighborhood.

¹ Code available at http://www.icg.tugraz.at/Members/kluckner
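The three fusion rules described above can be sketched as follows. This is a hedged illustration under stated assumptions: undefined observations are encoded as NaN, the vector median is approximated by picking the observation closest to the 1D medians over a set of random projection lines (the exact selection rule is not given in the text), and the confidences are normalized by the number of valid views. All function names are hypothetical.

```python
import numpy as np

def fuse_height(stack):
    """Per-pixel median over views; stack has shape (n_views, H, W), NaN = undefined."""
    return np.nanmedian(stack, axis=0)

def vector_median(obs, n_proj=32, rng=None):
    """Pick the color observation that is most central over random 1D projections.

    obs has shape (n_views, 3); the result is always one of its rows,
    so no new color value is introduced.
    """
    rng = np.random.default_rng(rng)
    dirs = rng.normal(size=(n_proj, obs.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    proj = obs @ dirs.T                      # (n_views, n_proj) scalar projections
    med = np.median(proj, axis=0)            # 1D median on each projection line
    score = np.abs(proj - med).sum(axis=1)   # total distance to the 1D medians
    return obs[np.argmin(score)]

def fuse_confidence(stack):
    """Accumulate per-view building confidences and normalize by the valid count."""
    valid = ~np.isnan(stack)
    total = np.where(valid, stack, 0.0).sum(axis=0)
    count = valid.sum(axis=0)
    out = np.full(total.shape, np.nan)
    return np.divide(total, count, out=out, where=count > 0)
```

With one view showing a passing car, for instance, the outlying color is never selected as long as most views agree, whereas a mean would blend it into the result.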
4.1 Super-Pixel Segmentation 
A variety of recently proposed methods obtaining state-of-the-art performance on benchmark datasets integrate unsupervised image segmentation methods into classification or object detection. Several approaches utilize multiple segmentations (Malisiewicz and Efros, 2007, Pantofaru et al., 2008); however, the generation of many partitions induces enormous computational complexity and is impractical for aerial image segmentation. Recently, Fulkerson et al. (Fulkerson et al., 2009) proposed to use super-pixels, rapidly generated by Quickshift (Vedaldi and Soatto, 2008). These super-pixels accurately preserve object boundaries of natural and man-made objects. Applying Quickshift super-pixel segmentation to our approach offers several benefits: First, computed super-pixels can be seen as the smallest units in the image space. All subsequent processing steps can be performed on a reduced adjacency graph instead of incorporating the full pixel image grid. Furthermore, we consider super-pixels as homogeneous regions providing important spatial support: Due to their edge-preserving capability, each super-pixel describes a part of only one class, namely building or non-building. Aggregating data, such as classification and height information, over the pixels defining a super-pixel compensates for outliers and erroneous pixels. For instance, an accumulation of building likelihoods results in an improved building classification for each segment. A color averaging within small regions synthesizes the final modeling results and significantly reduces the amount of data. More importantly, we exploit super-pixels, which define parts of the building footprints, for the 3D modeling procedure. Taking into account a derived polygon approximation of the boundary pixels and the corresponding height information, classified building footprints can be extruded to form any type of geometric 3D primitive. Therefore, introducing super-pixels for footprint description allows us to model any kind of ground plan and, subsequently, the rooftop.
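The aggregation over super-pixels and the reduced adjacency graph can be sketched as below. The example assumes a precomputed integer label map with segment ids 0..n-1 (for instance from any Quickshift implementation); the function names and the use of a 4-connected neighborhood to define adjacency are assumptions for illustration.

```python
import numpy as np

def aggregate_mean(labels, values):
    """Mean of `values` over the pixels of each super-pixel (labels are ints 0..n-1)."""
    flat = labels.ravel()
    n = flat.max() + 1
    sums = np.bincount(flat, weights=values.ravel(), minlength=n)
    counts = np.bincount(flat, minlength=n)
    return sums / np.maximum(counts, 1)

def adjacency_edges(labels):
    """Undirected edges between super-pixels that touch horizontally or vertically."""
    pairs = np.concatenate([
        np.stack([labels[:, :-1].ravel(), labels[:, 1:].ravel()], axis=1),
        np.stack([labels[:-1, :].ravel(), labels[1:, :].ravel()], axis=1),
    ])
    pairs = pairs[pairs[:, 0] != pairs[:, 1]]   # drop pairs inside one segment
    pairs = np.sort(pairs, axis=1)              # (a, b) and (b, a) become identical
    return np.unique(pairs, axis=0)
```

Applied to the fused building likelihood, `aggregate_mean` gives one value per segment, and `adjacency_edges` yields the graph on which later steps can operate instead of the full pixel grid.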
4.2 Refined Labeling using Super-Pixels 
Although aggregating the fused building classification or extracting geometric prototypes using super-pixels captures some local information, the regions in the image space are handled independently. In order to incorporate spatial dependencies between nodes defined on the image grid, Markov random field formulations (Boykov et al., 2001), for example, are widely used to enforce a consistent final class labeling. In contrast to minimizing the energy on a full image grid (Pantofaru et al., 2008, Kluckner et al., 2009), we apply a conditional Markov random field (CRF) stage defined on the super-pixel neighborhoods, similar to that proposed in (Fulkerson et al., 2009). In our approach we apply the refinement on super-pixels twice: First, we apply the CRF to provide a smooth labeling of the building class, taking into account the spatial dependency on an adjacency graph. Second, in a separate processing step, the CRF is used for consistent labeling of the geometric prototypes to enforce a piecewise planar rooftop.
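To make the idea of label smoothing on a super-pixel adjacency graph concrete, the sketch below evaluates a generic Potts-style energy (per-node costs plus a constant penalty for every edge whose endpoints disagree) and lowers it with iterated conditional modes (ICM). Both choices are stand-ins chosen for brevity: the energy used in this work is defined in the text that follows, and ICM only finds a local minimum, unlike the graph-cut style optimization of (Boykov et al., 2001). The names `potts_energy`, `icm` and the weight `lam` are hypothetical.

```python
import numpy as np

def potts_energy(labels, unary, edges, lam):
    """Sum of unary costs plus lam for every edge whose endpoints disagree."""
    data = unary[np.arange(len(labels)), labels].sum()
    smooth = lam * np.count_nonzero(labels[edges[:, 0]] != labels[edges[:, 1]])
    return data + smooth

def icm(unary, edges, lam, n_iter=10):
    """Greedy per-node relabeling; unary is a float array of shape (n_nodes, n_labels)."""
    n, k = unary.shape
    neighbors = [[] for _ in range(n)]
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    labels = unary.argmin(axis=1)            # start from the independent decision
    for _ in range(n_iter):
        changed = False
        for s in range(n):
            cost = unary[s].copy()
            for t in neighbors[s]:
                # Every candidate label that differs from neighbor t pays lam.
                cost += lam * (np.arange(k) != labels[t])
            best = int(cost.argmin())
            if best != labels[s]:
                labels[s] = best
                changed = True
        if not changed:
            break
    return labels
```

In this toy setting, a segment with a weak preference for one class is pulled toward the label of its confident neighbors, which is the smoothing behavior the refinement stage aims for.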
Let $G(S, E)$ be an adjacency graph with a super-pixel node $s_i \in S$, and let a pair $(s_i, s_j) \in E$ be an edge between the segments $s_i$ and $s_j$; then an energy can be defined with respect to the class labels $c$. In this work a label can be a building/non-building class or a possible assignment to a specific geometric primitive. Generally,
	        