                                     overall   building   non-building
Graz, pixel level                      88.5      90.5        87.3
Graz, with super-pixels                90.6      92.1        90.0
Graz, with CRF                         93.7      92.1        93.4
San Francisco, pixel level             85.7      86.3        85.3
San Francisco, with super-pixels       89.2      89.0        91.8
San Francisco, with CRF                92.1      91.8        93.4
Table 1: Building classification accuracy (in %) in terms of correctly classified pixels on hand-labeled orthographic test data. It can be clearly seen that the use of super-pixels as spatial support improves accuracy. The CRF stage further improves the classification rates using a consistent final labeling of the super-pixels.
et al., 2006) and the derived DTM (Champion and Boldo, 2006). A combination of DTM and DSM yields absolute elevation measurements per pixel above ground, which are applied for the building classification and modeling.
Building Classification. For all datasets, we train individual RF classifiers with 8 trees and a maximum depth of 14. The Sigma Points feature vectors are collected within small image patches (11 x 11 pixels). In this work the Sigma Points describe the statistics of feature cues such as color, texture and elevation measurements within small image patches. Texture information is obtained directly by computing first-order derivatives on the L channel of CIELab color images. Combining the three color channels, the two gradients and the elevation measurement gives six cues per pixel; the Sigma Points construction expands these into a feature vector with 78 attributes (6 x (2 * 6 + 1) = 78), which can be directly trained and evaluated using the RF classifiers.
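
As a concrete illustration, the stated configuration maps directly onto an off-the-shelf random forest. The sketch below uses scikit-learn's RandomForestClassifier as a stand-in (the paper does not name its RF implementation); the feature and label files are hypothetical placeholders for the precomputed Sigma Points descriptors and brush-stroke labels.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # X: Sigma Points feature vectors, one 78-dimensional row per pixel,
    # collected within 11 x 11 patches (feature extraction not shown).
    # y: brush-stroke labels, 1 = building, 0 = non-building.
    X = np.load("sigma_points_features.npy")  # hypothetical file, shape (N, 78)
    y = np.load("brush_stroke_labels.npy")    # hypothetical file, shape (N,)

    # Configuration as stated in the text: 8 trees, maximum depth 14.
    rf = RandomForestClassifier(n_estimators=8, max_depth=14, n_jobs=-1)
    rf.fit(X, y)

    # Per-pixel building probabilities; at test time, X is replaced by the
    # feature matrix of a new image.
    building_prob = rf.predict_proba(X)[:, 1]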
In our approach we exploit hand-labeled ground-truth maps for training the classifiers. Please note that labeling the training data involves some human interaction, but since our approach works at the pixel level there is no need to accurately label complete building areas. Hence the labeling of the training data is straightforward and can be done efficiently by applying brush strokes representing either the building or the non-building class. For evaluation we additionally label randomly selected orthographic images (we use 9 tiles per dataset). The obtained classification rates are summarized in Table 1. We report both the overall per-pixel classification rate (i.e. the fraction of all pixels correctly classified) and the average of the class-specific per-pixel percentages, which gives a more meaningful measurement due to the varying number of labeled pixels per class. On both datasets we obtain overall classification rates of more than 90%. Classifying a single aerial image at full resolution takes approximately 3 minutes on a dual-core machine.
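
Both reported rates are easy to state precisely; the following minimal sketch (array names illustrative) computes the overall per-pixel rate and the class-averaged rate from a predicted and a ground-truth label map.

    import numpy as np

    def classification_rates(pred, gt):
        """pred, gt: integer label maps, e.g. 1 = building, 0 = non-building."""
        overall = np.mean(pred == gt)
        # Average of class-specific rates; insensitive to the varying
        # number of labeled pixels per class.
        per_class = [np.mean(pred[gt == c] == c) for c in np.unique(gt)]
        return overall, np.mean(per_class)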
The fusion step for color, height and classification of 6 different viewpoints covering an area of 150 x 150 meters, including the super-pixel generation, takes less than 5 minutes. Quickshift is applied to a vector consisting of the pixel location and the CIELab color. The parameters for Quickshift are set to σ = 2 and τ = 8. It turned out that these parameters capture nearly all object boundaries in the observed test images. In addition, they generate sufficiently small regions to preserve curved boundary shapes. The overall results, adding a CRF stage for classification refinement, are given for ω = 3.0.
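
For reference, scikit-image ships a Quickshift implementation whose kernel_size and max_dist parameters play roughly the roles of σ and τ above; the sketch below is an assumption about how a comparable segmentation could be set up, not the paper's own code, and the tile file name is hypothetical.

    from skimage import io
    from skimage.segmentation import quickshift

    rgb = io.imread("ortho_tile.png")[:, :, :3]  # hypothetical tile; drop alpha
    # quickshift converts to CIELab internally and clusters on pixel
    # location plus color, matching the feature vector described above.
    segments = quickshift(rgb, kernel_size=2, max_dist=8, convert2lab=True)
    print(segments.max() + 1, "super-pixels")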
Figure 3 shows a result for a fused image tile of Graz. While the raw pixel-wise fusion of the class probabilities shows higher granularity and blurred object boundaries due to inaccurate 3D information (compare to Figure 2), the integration of super-pixels and the CRF improves the final building classification significantly.
Building Modeling. We use the proposed method to model complex rooftops of buildings in 3D. Figure 3 shows a modeling result for a part of Graz.

Figure 3: Results for a small part of Graz. The first row depicts the input sources, i.e. color, elevation measurements and the refined classification, aggregated within super-pixels. The second row shows the computed super-pixels overlaid with the building mask, and the result of the refinement step which groups super-pixels by taking into account the geometric primitives. At the bottom the corresponding constructed 3D building model is given.

In order to obtain a quantitative evaluation, the root mean squared error (RMSE) over all building pixels is computed between the fused DSM values and the heights obtained by 3D modeling. For Graz we obtain an RMSE of 1.9 meters, taking into account all 170.0 x 10^6 building pixels. For San Francisco the RMSE is 1.7 meters, evaluated on 210.0 x 10^6 pixels. In the case of prototype refinement, the parameter ω controls the trade-off between the degree of detail and geometric simplification. For both datasets the smoothing factor ω = 5.0 has given reliable results.
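
The RMSE evaluation amounts to a masked comparison of two height rasters; a minimal sketch, with hypothetical array files standing in for the fused DSM, the modeled heights and the building classification:

    import numpy as np

    dsm = np.load("fused_dsm.npy")                # fused DSM heights in meters
    model_heights = np.load("model_heights.npy")  # heights from 3D modeling
    building_mask = np.load("building_mask.npy")  # boolean building mask

    diff = dsm[building_mask] - model_heights[building_mask]
    rmse = np.sqrt(np.mean(diff ** 2))
    print(f"RMSE over {diff.size} building pixels: {rmse:.2f} m")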
In Figure 4 computed 3D models are shown for San Francisco and Graz. For efficiency and large-scale capability we compute such models in tiles of 1600 x 1600 pixels. Given the fused color images including the super-pixel segmentation, the height and the classification images, the 3D model of Graz can be computed within an hour by processing the tiles sequentially.
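
The tiling itself is a simple sliding window over the fused rasters; the helper below (names hypothetical, tile size from the text) sketches the per-tile loop into which the modeling step would be plugged.

    import numpy as np

    TILE = 1600  # tile edge length in pixels, as stated above

    def iter_tiles(raster):
        """Yield square windows covering the full raster, tile by tile."""
        h, w = raster.shape[:2]
        for y0 in range(0, h, TILE):
            for x0 in range(0, w, TILE):
                yield raster[y0:y0 + TILE, x0:x0 + TILE]

    for tile in iter_tiles(np.zeros((4800, 6400))):  # hypothetical raster size
        pass  # the per-tile 3D modeling step would run here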
7 CONCLUSION 
We have proposed an efficient, purely image-driven approach for constructing synthetic 3D models of buildings by exploiting redundant color and height information. First, an efficient classification at the pixel level has been introduced to separate buildings from the background. A pixel-wise fusion step integrates different modalities from multiple viewpoints into a common orthographic view. In particular, involving a super-pixel segmentation enables a generic modeling of any building rooftop shape and reduces the problem of outliers and computational complexity. We