International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B4, 2012
XXII ISPRS Congress, 25 August — 01 September 2012, Melbourne, Australia
For the classification, spectral, textural and structural features are used.
3.1.1. Feature extraction Information about vegetation is
contained in the bands of multispectral images and in features
derived from them (Ruiz et al., 2004; Hall et al., 2008; Itzerott
and Kaden, 2007). Similar to the cited works, we use the
median value of a local neighbourhood of each channel for the
classification. Furthermore, the variance is used as an additional
feature in the classification process. The dimension N_spec of the feature vector for the spectral features x_spec per band is two.
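The two spectral features per band can be sketched as follows; the window size n is a hypothetical choice, as the paper does not state the neighbourhood size used:

```python
import numpy as np

def spectral_features(band, i, j, n=5):
    """Median and variance of an n x n neighbourhood around pixel (i, j).

    `band` is a 2-D array holding one image channel; the window
    size n = 5 is an assumption for illustration.
    """
    h = n // 2
    win = band[max(i - h, 0):i + h + 1, max(j - h, 0):j + h + 1]
    return np.array([np.median(win), np.var(win)])  # N_spec = 2 per band

band = np.arange(25, dtype=float).reshape(5, 5)
x_spec = spectral_features(band, 2, 2)
```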
Textural features derived from the grey level co-occurrence matrix (GLCM) can give important hints to separate different agricultural classes (Haralick et al., 1973; Rengers and Prinz, 2009). We use eight Haralick features: energy, entropy, correlation, difference moment, inertia (contrast), cluster shade, cluster prominence, and Haralick correlation (Haralick et al., 1973) in our classification approach. Using three directions, the dimension N_tex of the feature vector for the textural features x_tex is 24 per band.
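A minimal sketch of the GLCM and three of the eight Haralick features (the displacement, grey-level quantisation, and feature subset here are illustrative assumptions, not the paper's exact parameters):

```python
import numpy as np

def glcm(img, dx, dy, levels):
    """Grey level co-occurrence matrix for one displacement (dx, dy)."""
    P = np.zeros((levels, levels))
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            i2, j2 = i + dy, j + dx
            if 0 <= i2 < h and 0 <= j2 < w:
                P[img[i, j], img[i2, j2]] += 1
    return P / P.sum()  # normalise to joint probabilities

def haralick_subset(P):
    """Three of the eight Haralick features named in the text."""
    i, j = np.indices(P.shape)
    energy = np.sum(P ** 2)
    entropy = -np.sum(P[P > 0] * np.log(P[P > 0]))
    contrast = np.sum((i - j) ** 2 * P)  # inertia
    return energy, entropy, contrast

img = np.array([[0, 1], [1, 2]])        # toy 3-level image
P = glcm(img, dx=1, dy=0, levels=3)     # horizontal direction
energy, entropy, contrast = haralick_subset(P)
```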
In addition, structural features can give an important hint for the classification of the agricultural classes cropland and grassland (Helmholz, 2010), although the usefulness of these features mainly depends on the resolution of the available images. While parallel straight lines caused by agricultural machines are visible in cropland GIS objects in images with a higher resolution, these lines vanish in images with a lower resolution. However, because these structures can give an important hint for separating cropland and grassland and because our algorithm should be easily applicable to other image data (such as high-resolution images), we use structural features. The structural features are derived from a semi-variogram. Features derived from a semi-variogram have been successfully used for the classification of different agricultural areas in (Balaguer et al., 2010). In total, 14 features (Balaguer et al., 2010) are used in the classification process. The dimension N_struc of the feature vector of the structural features x_struc per band is 14.
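The empirical semi-variogram underlying these features can be sketched as below; computing it along the row direction only is a simplification, and the 14 descriptors of Balaguer et al. (2010) would then be derived from this curve:

```python
import numpy as np

def semivariogram(band, max_lag):
    """Empirical semi-variogram gamma(h) along the row direction.

    gamma(h) = mean of (z(x) - z(x + h))^2 / 2 over all pixel pairs
    at lag h. The structural features (slopes, ratios, curvature
    near the origin) are descriptors of this curve.
    """
    gamma = np.empty(max_lag)
    for h in range(1, max_lag + 1):
        d = band[:, h:] - band[:, :-h]
        gamma[h - 1] = 0.5 * np.mean(d ** 2)
    return gamma

band = np.tile(np.arange(8.0), (8, 1))  # linear ramp: gamma(h) = h^2 / 2
g = semivariogram(band, max_lag=3)
```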
3.1.2. Feature vector The feature vector x_feat of a pixel per band is built by concatenation of the feature vectors:

x_feat = (x_spec^T, x_tex^T, x_struc^T)^T    (1)

The dimension N_feat is

N_feat = N_spec + N_tex + N_struc = 40    (2)
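The per-band concatenation of Eq. (1) and the resulting dimension of Eq. (2) can be checked directly (the zero vectors stand in for actual feature values):

```python
import numpy as np

# Placeholder per-band feature vectors with the dimensions from the text.
x_spec = np.zeros(2)    # median, variance
x_tex = np.zeros(24)    # 8 Haralick features x 3 directions
x_struc = np.zeros(14)  # semi-variogram descriptors

x_feat = np.concatenate([x_spec, x_tex, x_struc])  # Eq. (1)
N_feat = x_feat.shape[0]                           # Eq. (2): 2 + 24 + 14
```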
To maintain flexibility with respect to various image acquisition systems and sensors, an arbitrary number of input channels N_ch is supported. All input channels are sub-sampled equally by a factor, leading to an image pyramid for every channel, where N_lev is the number of pyramid levels. Each pyramid level is handled equally. The set is passed on to the feature extraction module, where spectral, textural and structural features are calculated for each pixel within an N_s × N_s neighbourhood as described before. Features extracted at the same pixel position build up one feature vector x_feat_total. The dimension d_feat_total of the feature vector x_feat_total is

d_feat_total = N_ch · N_lev · N_feat    (3)
For instance, using multispectral and multi-temporal information of one five-band image and two three-band images simultaneously, the number of bands is N_ch = 11. Assuming N_lev = 2 resolution levels, we get 22 bands for which a feature vector with N_feat = 40 has to be calculated. Thus, the dimension of the feature vector is d_feat_total = 880.
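The worked example above reduces to the arithmetic of Eq. (3):

```python
# Dimensions for the multi-temporal example in the text:
# one five-band image plus two three-band images, two pyramid levels.
N_ch = 5 + 3 + 3   # number of input channels
N_lev = 2          # pyramid levels
N_feat = 40        # per-band feature vector length, Eq. (2)

d_feat_total = N_ch * N_lev * N_feat  # Eq. (3)
```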
3.2 Pixel-based Classification
The classification is carried out using SVM. The SVM classifier
is a supervised learning method used for classification and
regression. Given a set of training examples, each marked as
belonging to one of two classes, SVM training builds a model
that predicts whether a new example falls into one class or the
other. The two classes are separated by a hyperplane in feature
space so that the distance of the nearest training sample from
the hyperplane is maximised; hence, SVM belong to the class of
max-margin classifiers (Vapnik, 1998). Since most classes are
not linearly separable in feature space, a feature space mapping
is applied: the original feature space is mapped into another
space of higher dimension so that in the transformed feature
space, the classes become linearly separable. Both training and classification basically require the computation of inner products of the form Φ(x_i)^T · Φ(x_j), where x_i and x_j are feature vectors of two samples in the original feature space and Φ(x_i) and Φ(x_j) are the transformed features. These inner products can be replaced by a kernel function K(x_i, x_j), which means that the actual feature space mapping Φ is never explicitly applied (kernel trick). In our application we use the Gaussian kernel K(x_i, x_j) = exp(−γ ‖x_i − x_j‖²). The concept of SVM has been expanded to allow for outliers in the training data to avoid over-fitting. This requires a parameter ν that corresponds to the fraction of training points considered to be outliers.
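The Gaussian kernel used here is a simple scalar function of two feature vectors; a minimal sketch:

```python
import numpy as np

def rbf_kernel(xi, xj, gamma):
    """Gaussian kernel K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2).

    Replaces the inner product Phi(x_i)^T . Phi(x_j) without ever
    applying the feature-space mapping Phi explicitly (kernel trick).
    """
    d = np.asarray(xi) - np.asarray(xj)
    return np.exp(-gamma * np.dot(d, d))

k_same = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5)  # identical points
```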
Furthermore, classical SVM can only separate two classes, and SVM do not scale well to multi-class problems. A common way to tackle this problem is the one-versus-one strategy (Chang and Lin, 2001), in which all pairs of the C classes are tested against each other; in total, C(C−1)/2 binary classifiers are evaluated. The pixel is assigned to the class with the most wins (winner-takes-all strategy).
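The one-versus-one vote can be sketched as follows; the classifier interface (a mapping from a class pair to a decision function) is a hypothetical stand-in for the trained pairwise SVMs:

```python
from itertools import combinations
from collections import Counter

def one_vs_one_predict(x, binary_classifiers, classes):
    """Winner-takes-all vote over all C(C-1)/2 pairwise classifiers."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        # each pairwise classifier returns the winning class for sample x
        votes[binary_classifiers[(a, b)](x)] += 1
    return votes.most_common(1)[0][0]

# Toy example with C = 3 classes -> 3 pairwise classifiers.
clfs = {
    ("crop", "grass"): lambda x: "crop",
    ("crop", "forest"): lambda x: "crop",
    ("grass", "forest"): lambda x: "grass",
}
label = one_vs_one_predict(None, clfs, ["crop", "grass", "forest"])
```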
For our approach, the SVM algorithm needs to learn the
properties of the different classes. These are not only the classes cropland and grassland but also classes which describe a typical site, e.g. settlement, industrial area and forest. The training is done
using a set of image patches with known class labels. The image
patches and the class labels are assigned to the training objects
interactively by a human operator. Each feature is normalised so
that its value is between 0 and 1. Then, all feature vectors are
used to train the SVM classifier required for the one-versus-one
strategy. The result of the classification process is a labelled map, which represents the class membership for each pixel.
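The per-feature normalisation to [0, 1] described above can be sketched as a min-max scaling over the training matrix (the guard against constant features is an added assumption):

```python
import numpy as np

def minmax_normalise(X):
    """Scale each feature column of the training matrix X to [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard constant features
    return (X - lo) / span

X = np.array([[2.0, 10.0],
              [4.0, 30.0],
              [6.0, 20.0]])
Xn = minmax_normalise(X)
```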
3.3 Transferring the Pixel-based Classification Results to GIS Objects
After the pixel-based classification has been carried out, the result for all pixels inside an object must be transferred to the GIS object. This is done using the approach from Busch et al. (2004). Pixels that match the class of the GIS object are considered correct, while pixels belonging to another class are considered incorrect. Two criteria are chosen for the assessment of the GIS objects. First, the ratio q of incorrect pixels in relation to all pixels that cover the object is calculated using

q = incorrect / (correct + incorrect)    (4)
If q is larger than the pre-defined threshold t_q, the GIS object will be labelled as rejected/incorrect. The threshold t_q depends on how many incorrect pixels are likely to appear in a cropland or grassland object.
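The first assessment criterion, Eq. (4) with its threshold, amounts to a few lines (the pixel counts and threshold value here are illustrative):

```python
def assess_object(correct, incorrect, t_q):
    """Eq. (4): flag a GIS object whose error ratio q exceeds t_q."""
    q = incorrect / (correct + incorrect)
    return ("rejected" if q > t_q else "accepted"), q

# Hypothetical object: 900 matching pixels, 100 mismatching, t_q = 0.2.
status, q = assess_object(correct=900, incorrect=100, t_q=0.2)
```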
As incorrect pixels can be distributed more or less equally in the object due to noise or inhomogeneously textured regions, a second criterion considering the compactness of an error region is applied.