In: Paparoditis N., Pierrot-Deseilligny M., Mallet C., Tournaire O. (Eds), IAPRS, Vol. XXXVIII, Part 3A - Saint-Mandé, France, September 1-3, 2010
The combination process was implemented in several stages, as
follows.
4.1 Filtering of lidar point clouds
First, the original lidar point clouds were filtered to separate
on-terrain points from points falling on natural and man-made
objects. A filtering technique based on a linear first-order
equation describing a tilted plane surface was used
(Salah et al., 2009). Data from both the first and the last pulse
echoes were used in order to obtain denser terrain data and
hence a more accurate filtering process. The filtered lidar
points were then converted into an image DTM, and the DSM
was generated from the original lidar point clouds. The nDSM
was then generated by subtracting the DTM from the DSM.
Finally, a height threshold of 3 m was applied to the nDSM to
eliminate other objects, such as cars, so that they are not
included in the final classified image.
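The nDSM step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the DSM and DTM are already co-registered rasters of equal shape held as NumPy arrays, and the function name `normalized_dsm`, the choice to zero out low pixels, and the use of `>=` at the 3 m threshold are all assumptions made for the example.

```python
import numpy as np

def normalized_dsm(dsm, dtm, min_height=3.0):
    """Return the nDSM and a thresholded copy (hypothetical helper).

    dsm, dtm   : 2-D arrays of heights in metres on the same grid (assumed).
    min_height : height threshold in metres; 3 m is the value given in the text.
    """
    dsm = np.asarray(dsm, dtype=float)
    dtm = np.asarray(dtm, dtype=float)
    ndsm = dsm - dtm                      # object height above terrain
    keep = ndsm >= min_height             # assumption: threshold is inclusive
    # Assumption: pixels below the threshold (e.g. cars) are set to zero.
    return ndsm, np.where(keep, ndsm, 0.0)

dsm = np.array([[10.0, 12.0], [15.0, 10.5]])
dtm = np.array([[10.0, 10.0], [10.0, 10.0]])
ndsm, ndsm_high = normalized_dsm(dsm, dtm)
```

In this toy grid only the 5 m pixel survives the threshold; the 2 m and 0.5 m pixels are suppressed.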
4.2 Generation of Attributes
Our experiments were carried out characterizing each pixel by a
32-element feature vector which comprises: 25 generated
attributes, 3 image bands (R, G and B), intensity image, DTM,
DSM and nDSM. The 25 attributes include those derived from
the Grey-Level Co-occurrence Matrix (GLCM), Normalized
Difference Vegetation Indices (NDVI), slope and the
polymorphic texture strength based on the Förstner operator
(Förstner and Gülch, 1987). The NDVI values for the UNSW,
Bathurst and Fairfield test areas were derived from the red
image and the lidar reflectance values, since the radiation
emitted by the lidars is in the IR wavelengths. The resolutions
of the lidar reflectance data for these study areas are lower than
those of the images, and this may affect the ability to detect
vegetation. Since the images derived for the Memmingen
dataset include an IR channel, the NDVI was derived from the
image data only. The attributes were calculated per pixel as
input data for the three classifiers. Table 3 shows the attributes
and the images for which they have been derived. These
attributes were selected so as to be uncorrelated, in order to
avoid the problem of correlation between feature attributes. All
the presented attributes were used for every test area. A detailed
description of the filtering and attribute generation processes
can be found in Salah et al. (2009).
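As an illustration of the NDVI attribute, the sketch below uses the standard definition NDVI = (IR - R) / (IR + R). It assumes the IR input (an image IR channel, or lidar reflectance resampled to the image grid) and the red band are arrays of equal shape on comparable scales; the function name `ndvi` and the convention of returning 0 where both bands are zero are assumptions for the example, not details taken from the paper.

```python
import numpy as np

def ndvi(ir, red):
    """Per-pixel NDVI from an IR band and a red band (hypothetical helper)."""
    ir = np.asarray(ir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = ir + red
    out = np.zeros_like(total)            # assumption: 0 where IR + R == 0
    np.divide(ir - red, total, out=out, where=total != 0)
    return out

values = ndvi([0.8, 0.2, 0.0], [0.2, 0.2, 0.0])
```

Pixels with strong IR response relative to red (typical of vegetation) give values close to 1, while equal responses give 0.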
Table 3. The full set of possible attributes from aerial
images and lidar data. ✓ and × indicate whether or not
the attribute has been generated for the image. PTS
refers to polymorphic texture strength; HMGT refers
to GLCM/homogeneity; Mean refers to GLCM/Mean;
entropy refers to GLCM/entropy.
4.3 Land Cover Classification
In this work, we have used the Self-Organizing Map (SOM),
Classification Trees (CTs), and Support Vector Machines
(SVMs) classifiers to estimate the class memberships required
for the combination process.
Support Vector Machines (SVMs)
SVMs are based on the principles of statistical learning theory
(Vapnik, 1979). SVMs delineate two classes by fitting an
optimal separating hyperplane (OSH) to those training samples
that describe the edges of the class distribution. As a
consequence, they generalize well and often outperform other
algorithms in terms of classification accuracy. Furthermore,
the misclassification errors are minimized by maximizing the
margin between the data points and the decision boundary.
Since the One-Against-One (1A1) technique usually results in a
larger number of binary SVMs and hence in more intensive
computations, the One-Against-All (1AA) technique was used
to extend the inherently binary SVMs to the multi-class
problems in aerial and lidar data. The Gaussian radial basis
function (RBF) kernel was used, since it has proved to be
effective with reasonable processing times in remote sensing
applications.
Two parameters must be specified when using RBF kernels:
• C, the penalty parameter that controls the trade-off
between maximizing the margin between the training
data vectors and the decision boundary and penalizing
training errors
• γ, the width of the kernel function.
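The role of the kernel parameter can be illustrated with a short sketch. It assumes the common parameterization K(x, z) = exp(-γ ||x - z||²), in which a larger γ gives a narrower kernel; the paper does not state which parameterization was used, and the function name `rbf_kernel` is chosen only for this example.

```python
import numpy as np

def rbf_kernel(X, Z, gamma):
    """Gaussian RBF kernel matrix between the rows of X and the rows of Z.

    Assumed form: K[i, j] = exp(-gamma * ||X[i] - Z[j]||^2).
    """
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)
    # Squared Euclidean distances via ||x||^2 + ||z||^2 - 2 x.z
    sq_dist = (X ** 2).sum(axis=1)[:, None] + (Z ** 2).sum(axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))  # clip tiny negative round-off

X = np.array([[0.0, 0.0], [1.0, 0.0]])
K = rbf_kernel(X, X, gamma=0.5)
```

Each sample has kernel value 1 with itself, and the value decays towards 0 as the distance between two feature vectors grows.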
In order to estimate these values and to avoid making
exhaustive parameter searches by approximations or heuristics,
we used a grid search on C and γ with 10-fold cross-
validation. The original output of an SVM represents the
distances of each pixel to the optimal separating hyperplane,
referred to as rule images. All positive (+1) and negative (-1)
votes for a specific class were summed, and the final class
membership of a given pixel was derived by a simple majority
vote.
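One possible reading of the voting step is sketched below. It assumes one rule image per class from the One-Against-All SVMs, stored as an array of signed distances with shape (classes, rows, columns), and converts each distance to a +1 or -1 vote by its sign. The text does not say how ties are resolved (for example, when no binary SVM or more than one votes +1), so breaking ties by the largest signed distance is an assumption of this example, as is the function name `one_against_all_labels`.

```python
import numpy as np

def one_against_all_labels(rule_images):
    """Derive per-pixel votes and labels from 1AA rule images (hypothetical helper).

    rule_images : array (n_classes, rows, cols) of signed distances to each
                  class's separating hyperplane (positive = "belongs to class").
    """
    rule_images = np.asarray(rule_images, dtype=float)
    votes = np.where(rule_images > 0, 1, -1)          # +1 / -1 vote per class
    best_vote = votes.max(axis=0, keepdims=True)
    # Assumption: among classes sharing the best vote, pick the largest distance.
    tie_break = np.where(votes == best_vote, rule_images, -np.inf)
    return votes, tie_break.argmax(axis=0)

# Three classes, one row of three pixels.
rule_images = np.array([
    [[0.5, 0.2, -0.3]],
    [[-1.0, 0.9, -0.1]],
    [[-2.0, -1.0, -2.0]],
])
votes, labels = one_against_all_labels(rule_images)
```

In this toy input the first pixel has a single positive vote, the second has two competing positive votes, and the third has none, which shows why some tie-breaking rule is needed.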
Self-Organizing Map Classifier (SOM)
The SOM undertakes both unsupervised and supervised
classification of imagery using Kohonen's SOM neural network
(Kohonen, 2001). The SOM requires no assumption regarding
the statistical distribution of the input pattern classes and has
two important properties: the ability to learn from input data,
and the ability to generalize and predict unseen patterns based
on the data source rather than on any particular a priori model.
In this work (Salah et al., 2009), the SOM has 32 input neurons,
which are: 25 generated attributes, 3 image bands (R, G and B),
intensity image, DTM, DSM and nDSM. The output layer of the
SOM was organized as a 15 x 15 array of neurons (225
neurons). This number was selected because, as
recommended by Hugo et al. (2006), small networks result in
some unrepresented classes in the final labelled network, while
large networks lead to an improvement in the overall
classification accuracy. Initial synaptic weights between the
output and input neurons were randomly assigned (0-1). In the
output of the SOM, each pixel is associated with a degree of
membership for a certain class.
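The network layout described above (32 inputs, a 15 x 15 output layer, random initial weights in the range 0-1) can be sketched with the standard Kohonen update rule. Only the layout and the weight initialization come from the text; the number of epochs, the linearly decaying learning rate, the Gaussian neighbourhood and its radius, and the name `train_som` are assumptions made for illustration, and the labelling of neurons and the derivation of class memberships are not shown.

```python
import numpy as np

def train_som(samples, grid=(15, 15), epochs=5, lr0=0.5, sigma0=7.5, seed=0):
    """Unsupervised Kohonen SOM training (illustrative sketch).

    samples : array (n_samples, n_features), assumed scaled to [0, 1].
    Returns the weight array of shape (rows, cols, n_features).
    """
    samples = np.asarray(samples, dtype=float)
    rng = np.random.default_rng(seed)
    rows, cols = grid
    weights = rng.random((rows, cols, samples.shape[1]))   # random weights in [0, 1)
    rr, cc = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    n_steps = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        for x in samples[rng.permutation(len(samples))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                         # assumed linear decay
            sigma = max(sigma0 * (1.0 - frac), 0.5)         # assumed shrinking radius
            # Best-matching unit: neuron whose weights are closest to x.
            dist2 = ((weights - x) ** 2).sum(axis=2)
            br, bc = np.unravel_index(dist2.argmin(), dist2.shape)
            # Gaussian neighbourhood around the best-matching unit.
            h = np.exp(-((rr - br) ** 2 + (cc - bc) ** 2) / (2.0 * sigma ** 2))
            weights += lr * h[..., None] * (x - weights)    # Kohonen update
            step += 1
    return weights

demo = np.random.default_rng(1).random((50, 32))   # synthetic 32-element feature vectors
som_weights = train_som(demo)
```

The synthetic input stands in for the 32-element pixel feature vectors; with real data each attribute would need to be scaled to a comparable range before training.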
Classification Trees (CTs)
The theory of Classification trees (CTs) (also called decision
trees) was developed by Breiman et al. (1984). A CT is a non-
parametric univariate technique built through a process known
as binary recursive partitioning. This is an iterative procedure in
which a heterogeneous set of training data consisting of