In: Paparoditis N., Pierrot-Deseilligny M., Mallet C., Tournaire O. (Eds), IAPRS, Vol. XXXVIII, Part 3A - Saint-Mandé, France, September 1-3, 2010
2. PROCESSING CHAIN
In this section we provide an overview of the proposed
processing chain (Fig. 1). It can roughly be subdivided into five
steps: 1) line extraction, 2) projection of all lines to a reference
coordinate system, 3) extraction of features, 4) training of the
CRF parameters using ground truth, and 5) classification into
building and non-building sites. The output is a label image in
which each site is marked as building or non-building.
First, 3D lines are computed from the optical stereo images
(section 3.2) and double-bounce lines are segmented in the
InSAR data (section 3.3). Both line sets are then projected from
the sensors' coordinate systems to the reference coordinate
system of the orthophoto. Thereafter, a feature vector is
computed for each site. In our case, an image site corresponds
to a square image patch as traditionally used for both computer
vision (e.g., Kumar and Hebert, 2003) and remote sensing
applications of CRFs (e.g., Zhong and Wang, 2007). In
addition, we adapt the idea of Kumar and Hebert (2006) and
compute these features at three different scales. Then, the
parameters of the CRF are trained on a subset of the data using
ground truth. Subsequently, inference is conducted and the test
data are classified into building sites and non-building sites (see
CRF details in section 4).
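The five steps can be sketched as a minimal pipeline skeleton. All function names and stub bodies below are illustrative placeholders, not the authors' implementation; the real steps are described in sections 3.2, 3.3, and 4.

```python
# Runnable skeleton of the five-step chain; every body is a stand-in stub.

def extract_3d_lines(stereo_pair):            # step 1a: optical stereo lines
    return [((0.0, 0.0, 5.0), (10.0, 0.0, 5.0))]    # stub: one roof edge

def extract_double_bounce_lines(insar_data):  # step 1b: InSAR lines
    return [((0.0, 0.0, 0.0), (10.0, 0.0, 0.0))]    # stub: one wall-ground line

def project_to_reference(line):               # step 2: to orthophoto frame
    (x1, y1, _), (x2, y2, _) = line
    return ((x1, y1), (x2, y2))                     # stub: drop the height

def compute_features(site, lines, scales=(10, 15, 20)):  # step 3
    return [len(lines) / s for s in scales]         # stub: one value per scale

def train_crf(features, ground_truth):        # step 4: parameter training
    # stub stand-in for CRF training: threshold on the first feature
    return lambda f: "building" if f[0] > 0.1 else "non-building"

def building_detection_chain(stereo_pair, insar_data, sites, ground_truth):
    lines = [project_to_reference(l)
             for l in extract_3d_lines(stereo_pair)
             + extract_double_bounce_lines(insar_data)]
    feats = {s: compute_features(s, lines) for s in sites}
    crf = train_crf(feats, ground_truth)
    return {s: crf(f) for s, f in feats.items()}  # step 5: label per site

labels = building_detection_chain(None, None, sites=[(0, 0), (0, 1)],
                                  ground_truth=None)
```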
3. FEATURES
High-resolution multi-spectral orthophotos are widely
available, and thus we take an orthophoto as the basic source of
features for building detection. In order to assess the impact of
height data on the building detection results of the CRF
framework we also investigate optical stereo imagery. In very
high-resolution aerial imagery characteristic objects of urban
areas, particularly buildings, become visible in great detail (Fig.
2(a)). High-resolution SAR data provides complementary
information. Double-bounce lines occurring at the position
where the building wall meets the ground are characteristic
features (Thiele et al., 2010). Fig. 3(a) compares the sensor
geometries and the projected lines in ground geometry.
Disregarding all projection artefacts, the double-bounce line of
a flat-roofed building (with vertical walls) is located at the same
position as the stereo line representing the roof edge (neglecting
overhang). Note that in the orthophoto we use, the roof segment
of the building overlaps both the double-bounce line and the
stereo line, since we are not dealing with a true orthophoto
(cf. Fig. 3(b,c)).

Figure 1. Flowchart of the processing chain for building detection

The focus of this research is neither on particularly
sophisticated features nor on elaborate feature selection
techniques, but on assessing the overall suitability of CRFs for
building detection with multi-sensor data. Therefore, rather
simple features are selected, and feature selection is
carried out empirically.
3.1 Orthophoto features
We test various combinations of features (colour, intensity, and
gradient) of the orthophoto within the CRF framework and
choose those that provide the best results. As
colour features we take the mean and standard deviation of the
red and green channels, normalized by the length of the RGB vector.
Mean and standard deviation of the hue channel are found to be
discriminative, too. Furthermore, variance and skewness of the
gradient orientation histogram of a patch proved to be good
features. The images are subdivided into square image patches
and features are calculated within each patch. Of course, the
choice of patch size is a trade-off. A small patch size is
desirable in order to detect buildings in detail. However, too
small patches lead to unstable features, resulting in less reliable
estimates of the probability density distributions. We apply a
multi-scale approach to mitigate those shortcomings (Kumar
and Hebert, 2006). Each feature is calculated for different patch
sizes and all scales are integrated into the same feature vector.
We follow this approach and test various numbers of scales and
scale combinations. Three different scales (10x10, 15x15, and
20x20 pixels) are found to provide good results. Features of
large patches integrate over bigger areas, thus excluding, for
example, forests or agricultural areas, whereas the small patches
provide details.
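As an illustration, the per-patch features described above (normalized red/green statistics, hue statistics, and variance and skewness of the gradient orientation histogram), concatenated over the three scales, might be computed as follows. This is a sketch with NumPy, not the authors' code; bin count and edge handling are assumptions.

```python
import numpy as np

def hue(rgb):
    """Per-pixel hue in [0, 1) from an (H, W, 3) float RGB array."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    mx, mn = rgb.max(axis=-1), rgb.min(axis=-1)
    d = mx - mn + 1e-9
    h = np.where(mx == r, ((g - b) / d) % 6,
        np.where(mx == g, (b - r) / d + 2, (r - g) / d + 4))
    return h / 6.0

def grad_orientation_stats(gray, bins=18):
    """Variance and skewness of the gradient orientation histogram."""
    gy, gx = np.gradient(gray)
    hist, _ = np.histogram(np.arctan2(gy, gx), bins=bins,
                           range=(-np.pi, np.pi))
    p = hist / max(hist.sum(), 1)          # normalized histogram
    idx = np.arange(bins)
    mean = (p * idx).sum()
    var = (p * (idx - mean) ** 2).sum()
    skew = (p * (idx - mean) ** 3).sum() / (var ** 1.5 + 1e-9)
    return var, skew

def patch_features(patch):
    """Eight features for one square patch."""
    rgb = patch.astype(float)
    norm = np.linalg.norm(rgb, axis=-1) + 1e-9     # length of the RGB vector
    r, g = rgb[..., 0] / norm, rgb[..., 1] / norm  # normalized red / green
    h = hue(rgb)
    var, skew = grad_orientation_stats(rgb.mean(axis=-1))
    return np.array([r.mean(), r.std(), g.mean(), g.std(),
                     h.mean(), h.std(), var, skew])

def multiscale_features(image, center, scales=(10, 15, 20)):
    """Concatenate patch features over all scales around one site.

    Odd patch sizes are rounded down to even here for simplicity.
    """
    cy, cx = center
    return np.concatenate([patch_features(
        image[cy - s // 2:cy + s // 2, cx - s // 2:cx + s // 2])
        for s in scales])
```

With three scales and eight features per patch, each site yields a 24-dimensional feature vector.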
3.2 Stereo lines
We extract 3D lines from a pair of aerial images using the
pair-wise line matching approach proposed by Ok et al. (2010). At
this point we only briefly summarize the algorithm and refer the
reader to the reference for further details. The entire algorithm
consists of four main steps: pre-processing, straight line
extraction, stereo matching of line pairs, and post-processing.
Pre-processing contains smoothing with a multi-level non-linear
colour diffusion filter and colour boosting in order to
exaggerate colour differences in each image. Next, straight lines
are extracted in each of the stereo images. A colour Canny edge
detector is applied to the pre-processed images. Thereafter,
straight edge segments are extracted from the edge images using
principal component analysis followed by random sampling
consensus. Subsequently, a new pair-wise stereo line matching
technique is applied to establish the line-to-line correspondences
between the stereo images. The pair matches are assigned based
on a weighted matching similarity score, which is computed
from a total of eight measures.
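The straight-edge-segment step (principal component analysis followed by random sampling consensus) can be illustrated in isolation. The sketch below operates on 2D edge-pixel coordinates; it is an assumed, simplified variant, not Ok et al.'s implementation.

```python
import numpy as np

def fit_line_pca(points):
    """Fit a straight line to edge pixels via principal component analysis.

    Returns (centroid, direction): the line passes through the centroid
    along the dominant eigenvector of the points' covariance matrix.
    """
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    cov = np.cov((pts - centroid).T)
    eigvals, eigvecs = np.linalg.eigh(cov)
    direction = eigvecs[:, np.argmax(eigvals)]   # dominant direction
    return centroid, direction

def ransac_line(points, n_iter=200, tol=1.0, seed=0):
    """RANSAC-style consensus: refit by PCA on the largest inlier set."""
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, dtype=float)
    best_inliers = np.zeros(len(pts), dtype=bool)
    for _ in range(n_iter):
        i, j = rng.choice(len(pts), size=2, replace=False)
        d = pts[j] - pts[i]
        d = d / (np.linalg.norm(d) + 1e-12)      # unit direction
        # perpendicular distance of every point to the candidate line
        dist = np.abs((pts[:, 0] - pts[i, 0]) * d[1]
                      - (pts[:, 1] - pts[i, 1]) * d[0])
        inliers = dist < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return fit_line_pca(pts[best_inliers])       # refit on the consensus set
```

Fitting on the consensus set rather than on the best sampled pair makes the final segment robust to isolated edge pixels from texture or vegetation.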