Full text: Technical Commission IV (B4)

International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B4, 2012 
XXII ISPRS Congress, 25 August — 01 September 2012, Melbourne, Australia 
the classification spectral, textural and structural features are 
3.1.1. Feature extraction Information about vegetation is 
contained in the bands of multispectral images and in features 
derived from them (Ruiz et al., 2004; Hall et al., 2008; Itzerott 
and Kaden, 2007). Similar to the cited works, we use the 
median value of a local neighbourhood of each channel for the 
classification. Furthermore, the variance is used as an additional 
feature in the classification process. The dimension $N_{spec}$ of the
feature vector for the spectral features $x_{spec}$ per band is two.
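The two spectral features can be illustrated with a minimal sketch. The window size, the clipping at the image border and the function name `spectral_features` are assumptions made for this illustration; the paper does not specify them.

```python
from statistics import median, pvariance

def spectral_features(band, row, col, half=1):
    """Return (median, variance) of the window centred on (row, col).

    The window spans (2*half+1) x (2*half+1) pixels and is clipped at the
    image border. Both choices are assumptions for this sketch.
    """
    rows = range(max(0, row - half), min(len(band), row + half + 1))
    cols = range(max(0, col - half), min(len(band[0]), col + half + 1))
    window = [band[r][c] for r in rows for c in cols]
    return median(window), pvariance(window)

# Toy 3 x 3 band with one outlier in the centre
band = [
    [10, 12, 11],
    [13, 50, 12],
    [11, 12, 10],
]
features = spectral_features(band, 1, 1)  # N_spec = 2 values per band
```

The median is barely affected by the single outlier in the centre, while the variance reacts strongly to it, which is one reason the two values complement each other.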
Textural features derived from the grey level co-occurrence
matrix (GLCM) can give important hints for separating different
agricultural classes (Haralick et al., 1973; Rengers and Prinz,
2009). We use eight Haralick features in our classification
approach: energy, entropy, correlation, difference moment,
inertia (contrast), cluster shade, cluster prominence, and
Haralick correlation (Haralick et al., 1973). Using three
directions, the dimension $N_{tex}$ of the feature vector for the
textural features $x_{tex}$ is 24 per band (8 features in each of
3 directions).
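As an illustration of how such features are obtained, the following sketch builds a normalised, symmetric GLCM for one pixel offset and derives three of the eight features from it. The offsets, the number of grey levels and the exact feature definitions (energy as the sum of squared entries, entropy with the natural logarithm) are assumptions; definitions vary between implementations, and the paper does not state which variant it uses.

```python
from math import log

def glcm(image, dr, dc, levels):
    """Normalised, symmetric grey level co-occurrence matrix for offset (dr, dc)."""
    counts = [[0.0] * levels for _ in range(levels)]
    n_rows, n_cols = len(image), len(image[0])
    for r in range(n_rows):
        for c in range(n_cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < n_rows and 0 <= c2 < n_cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count both orders so the matrix is symmetric
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def texture_features(p):
    """Energy, entropy and inertia (contrast) of a normalised GLCM p."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    energy = sum(v * v for _, _, v in cells)
    entropy = -sum(v * log(v) for _, _, v in cells if v > 0)
    inertia = sum((i - j) ** 2 * v for i, j, v in cells)
    return energy, entropy, inertia

# Toy patch with 4 grey levels (0..3)
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
directions = [(0, 1), (1, 0), (1, 1)]  # three example offsets (assumed)
x_tex = [f for dr, dc in directions
         for f in texture_features(glcm(patch, dr, dc, 4))]
```

With three features and three directions the sketch yields 9 values; with all eight features the same scheme gives the 24 values per band stated above.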
In addition, structural features can give an important hint for the
classification of the agricultural classes cropland and grassland
(Helmholz, 2010), although the usefulness of these features
mainly depends on the resolution of the available images. While
parallel straight lines caused by agricultural machines are
visible in cropland GIS-objects in images with a higher
resolution, these lines vanish in images with a lower resolution.
However, because these structures can help to separate cropland
and grassland, and because our algorithm should be easily
applicable to other image data (such as high resolution images),
we use structural features. The structural features are derived
from a semi-variogram. Features derived from a semi-variogram
have been successfully used for the classification of different
agricultural areas by Balaguer et al. (2010). In total, 14 features
(Balaguer et al., 2010) are used in the classification process.
The dimension $N_{struc}$ of the feature vector of the structural
features $x_{struc}$ per band is 14.
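To show why a semi-variogram responds to regular line structures, here is a minimal sketch of an empirical semi-variogram computed along image rows. It does not reproduce the 14 features of Balaguer et al. (2010); the function name, the row-wise direction and the synthetic striped patch are assumptions made for illustration.

```python
def semivariogram(image, max_lag):
    """Empirical semi-variogram along rows for lags 1..max_lag.

    gamma(h) = sum of squared grey value differences at lag h,
    divided by twice the number of pixel pairs.
    max_lag must be smaller than the row length.
    """
    gamma = []
    for h in range(1, max_lag + 1):
        sq_diffs = [(row[c] - row[c + h]) ** 2
                    for row in image for c in range(len(row) - h)]
        gamma.append(sum(sq_diffs) / (2 * len(sq_diffs)))
    return gamma

# Synthetic patch: vertical stripes, two pixels wide, period of four pixels
patch = [[10 if (c // 2) % 2 == 0 else 0 for c in range(12)] for _ in range(4)]
gamma = semivariogram(patch, 4)
```

For this striped patch the semi-variogram peaks at a lag of half the stripe period and drops to zero at the full period, whereas an unstructured texture would not show such a periodic pattern.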
3.1.2. Feature vector The feature vector $x_{feat}$ of a pixel per
band is built by concatenating the three feature vectors:
$$x_{feat}^T = (x_{spec}^T, x_{tex}^T, x_{struc}^T) \tag{1}$$
Its dimension $N_{feat}$ is
$$N_{feat} = N_{spec} + N_{tex} + N_{struc} = 40 \tag{2}$$
To maintain flexibility with respect to various image acquisition
systems and sensors, an arbitrary number of input
channels $N_{ch}$ is supported. All input channels are sub-sampled
equally by a factor, leading to an image pyramid for every
channel, where $N_{res}$ is the number of pyramid levels. Each
pyramid level is handled equally. The set is passed on to the
feature extraction module, where spectral, textural and
structural features are calculated for each pixel within an N, x N; 
neighbourhood as described before. Features extracted at the 
same pixel position build up one feature vector $x_{feat\_total}$.
The dimension $d_{feat\_total}$ of the feature vector $x_{feat\_total}$ is
$$d_{feat\_total} = N_{ch} \cdot N_{res} \cdot N_{feat} \tag{3}$$
For instance, using multispectral and multi-temporal
information of one five-band image and two three-band images
simultaneously, the number of bands is $N_{ch} = 11$. Assuming
$N_{res} = 2$ resolution levels, we get 22 bands for which a feature
vector with $N_{feat} = 40$ has to be calculated. Thus, the dimension
of the feature vector is $d_{feat\_total} = 880$.
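The arithmetic of this example can be checked with a few lines of code. The per-band dimensions 2, 24 and 14 are taken from the text; the function and constant names are hypothetical.

```python
# Per-band feature dimensions as stated in the text
N_SPEC, N_TEX, N_STRUC = 2, 24, 14

def feature_dimension(bands_per_image, n_res):
    """Total feature vector dimension: channels x pyramid levels x features per band."""
    n_feat = N_SPEC + N_TEX + N_STRUC  # 40 features per band
    n_ch = sum(bands_per_image)        # total number of input channels
    return n_ch * n_res * n_feat

# One five-band image and two three-band images, two resolution levels
d_feat_total = feature_dimension([5, 3, 3], n_res=2)
print(d_feat_total)  # 880
```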
3.2 Pixel-based Classification 
The classification is carried out using SVM. The SVM classifier 
is a supervised learning method used for classification and 
regression. Given a set of training examples, each marked as 
belonging to one of two classes, SVM training builds a model 
that predicts whether a new example falls into one class or the 
other. The two classes are separated by a hyperplane in feature 
space so that the distance of the nearest training sample from 
the hyperplane is maximised; hence, SVM belong to the class of 
max-margin classifiers (Vapnik, 1998). Since most classes are 
not linearly separable in feature space, a feature space mapping 
is applied: the original feature space is mapped into another 
space of higher dimension so that in the transformed feature 
space, the classes become linearly separable. Both training and 
classification basically require the computation of inner
products of the form $\Phi(x_i)^T \cdot \Phi(x_j)$, where $x_i$ and $x_j$ are feature
vectors of two samples in the original feature space and $\Phi(x_i)$
and $\Phi(x_j)$ are the transformed features. These inner products can
be replaced by a Kernel function $K(x_i, x_j)$, which means that the
actual feature space mapping $\Phi$ is never explicitly applied
(Kernel Trick). In our application we use the Gaussian Kernel
$K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$. The concept of SVM has been
expanded to allow for outliers in the training data to avoid
over-fitting. This requires a parameter $\nu$ that corresponds to the
fraction of training points considered to be outliers.
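The Gaussian kernel is straightforward to evaluate directly on the original feature vectors, as the following sketch shows. The function name and the example value of the kernel width `gamma` are assumptions; the paper does not report the value used.

```python
from math import exp

def gaussian_kernel(x_i, x_j, gamma):
    """Gaussian kernel K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return exp(-gamma * sq_dist)

# Identical vectors give 1; the value decays with increasing distance
k_same = gaussian_kernel([0.2, 0.7], [0.2, 0.7], gamma=0.5)
k_far = gaussian_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5)
```

Only the squared distance between the two samples is needed, so the mapping into the higher-dimensional space never has to be computed.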
Furthermore, classical SVM can only separate two classes, and
SVM do not scale well to a multi-class problem. A common
way to tackle this problem is the one-versus-one strategy
(Chang and Lin, 2001), where all pairs of the $C$ classes are
tested against each other. In total, $C(C-1)/2$ combinations are
calculated. The pixel is assigned to the class with the most wins.
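The voting scheme can be sketched as follows. The binary classifiers here are simple hypothetical threshold rules on a single feature standing in for trained two-class SVMs, and the tie-breaking behaviour (the first class to reach the highest count wins) is an assumption of this sketch.

```python
from collections import Counter
from itertools import combinations

def one_versus_one(x, classes, binary_classifiers):
    """Assign x to the class that wins most of the pairwise decisions."""
    votes = Counter()
    for pair in combinations(classes, 2):  # C(C-1)/2 class pairs
        votes[binary_classifiers[pair](x)] += 1
    return votes.most_common(1)[0][0]

classes = ["cropland", "grassland", "forest"]

# Hypothetical stand-ins for trained binary SVMs, one per class pair
classifiers = {
    ("cropland", "grassland"): lambda x: "cropland" if x < 0.5 else "grassland",
    ("cropland", "forest"): lambda x: "cropland" if x < 0.6 else "forest",
    ("grassland", "forest"): lambda x: "grassland" if x < 0.8 else "forest",
}

label = one_versus_one(0.7, classes, classifiers)  # two of three votes: grassland
```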
For our approach, the SVM algorithm needs to learn the
properties of the different classes. These are not only the classes
cropland and grassland but also classes which describe a typical
site, e.g. settlement, industrial area and forest. The training is done
using a set of image patches with known class labels. The image
patches and the class labels are assigned to the training objects
interactively by a human operator. Each feature is normalised so
that its value is between 0 and 1. Then, all feature vectors are
used to train the SVM classifiers required for the one-versus-one
strategy. The result of the classification process is a labelled
map, which represents the class membership for each pixel.
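The text only states that each feature is scaled to the range between 0 and 1. One common way to do this is min-max scaling with the extrema taken from the training samples, sketched below; whether the authors used exactly this scheme is an assumption, and the function names are hypothetical.

```python
def fit_min_max(samples):
    """Per-feature minimum and maximum over all training feature vectors."""
    columns = list(zip(*samples))
    return [min(c) for c in columns], [max(c) for c in columns]

def normalise(x, lo, hi):
    """Scale each feature to [0, 1]; constant features are mapped to 0."""
    return [(v - a) / (b - a) if b > a else 0.0
            for v, a, b in zip(x, lo, hi)]

training = [[2.0, 10.0], [4.0, 30.0], [3.0, 20.0]]
lo, hi = fit_min_max(training)
scaled = [normalise(x, lo, hi) for x in training]
```

Values outside the training range would fall outside [0, 1] when the same scaling is applied to new pixels, so a real implementation may need to clip them.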
3.3 Transfer the pixel-based classification results to GIS- 
After the pixel-based classification has been carried out, the result for all
pixels inside an object must be transferred to the GIS-object. This
is done using the approach from Busch et al. (2004). Pixels that
match the class of the GIS object are considered correct,
while pixels belonging to another class are considered
incorrect. Two criteria are chosen for the assessment of the GIS
objects. First, the ratio $q$ of incorrect pixels in relation to all
pixels that cover the object is calculated using
$$q = \frac{\text{incorrect}}{\text{correct} + \text{incorrect}} \tag{4}$$
If $q$ is larger than the pre-defined threshold $t_q$, the GIS object
will be labelled as rejected/incorrect. The threshold $t_q$ depends
on how many incorrect pixels are likely to appear in a cropland
or grassland object.
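The first criterion amounts to a simple count over the pixels of one object, as in the following sketch. The function name, the label "accepted" for objects that pass the test and the threshold values are assumptions made for illustration.

```python
def assess_gis_object(pixel_labels, object_class, threshold):
    """Return the ratio q of incorrect pixels and the resulting decision."""
    incorrect = sum(1 for label in pixel_labels if label != object_class)
    q = incorrect / len(pixel_labels)  # correct + incorrect = all pixels
    return q, ("rejected" if q > threshold else "accepted")

# Toy cropland object covered by 10 classified pixels, 2 of them grassland
labels = ["cropland"] * 8 + ["grassland"] * 2
q, decision = assess_gis_object(labels, "cropland", threshold=0.3)
```

Here q is 0.2, so the object passes with a threshold of 0.3 and would be rejected with a threshold of 0.1.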
As incorrect pixels can be distributed more or less equally in the 
object due to noise or inhomogeneously textured regions, a 
second criterion considering the compactness of an error region
