Decision fusion can be defined as a strategy for combining
information from different data sources after each individual
source has been classified. Support Vector Machines (SVM),
which are well known in machine learning and pattern
recognition, are another recent development in the field of
remote sensing (Huang et al., 2002; Foody and Mathur, 2004;
Melgani and Bruzzone, 2004).
SVM differentiate between two classes by fitting an optimal
separating hyperplane to the training samples in the
multidimensional feature space (Vapnik, 1998). For classes that
are not linearly separable, the input data are mapped into a
higher-dimensional space by a kernel function.
In several experiments, SVM achieved high classification
accuracies and performed well even when applied to
high-dimensional imagery or multisource data sets. Song et
al. (2005) applied a “conventional” SVM to a multisource data
set consisting of multispectral images and topographical
information. In other studies, the kernel functions were
modified for classifying dissimilar data sources (Halldorsson et
al., 2003; Camps-Valls et al., 2006). In contrast, Fauvel
et al. (2006) used two individual SVM classifiers to combine
different information from hyperspectral imagery. The two SVM
were applied separately to the original image and to a data set
containing spatial information (i.e., extended morphological
profiles). Afterwards, the outputs were combined by different
voting schemes.
These experimental results clearly demonstrate that the
accuracy for multisource classifications can be further increased
by using modified kernel functions or separate SVM classifiers.
Hence, it could be appropriate to handle the different image
sources separately when classifying multisensor data sets
that include multitemporal SAR and multispectral imagery.
In Waske et al. (2007), individual SVM classifiers are trained on
two separate data sets (i.e., multitemporal SAR and
multispectral). The generated pre-classifications are combined
by decision fusion to create the final result. Besides different
voting concepts, such as majority voting and the absolute
maximum, fusion is performed by an additional SVM. The
proposed SVM-based fusion strategy outperforms all other
concepts and improves on the results of a single SVM that is
trained on the whole multisource data set.
In the following experiment, this fusion concept is applied to
multisource imagery from an agricultural region. The
classification results are compared to the results of other
algorithms, such as a maximum likelihood classifier, a decision
tree, a boosted decision tree, and a common SVM.
2. CLASSIFICATION STRATEGIES
2.1 Support Vector Machines
For a binary classification problem in a d-dimensional feature
space, $x_i \in \mathbb{R}^d$, $i = 1, 2, \ldots, L$ denotes a
training data set of L samples with their corresponding class
labels $y_i \in \{1, -1\}$. The hyperplane f(x) is defined by the
normal vector and the bias, denoted by $w \in \mathbb{R}^d$ and
$b \in \mathbb{R}$ respectively, where $|b| / \|w\|$ is the
distance between the hyperplane and the origin, with $\|w\|$ as
the Euclidean norm of w.
$$f(x) = w \cdot x + b \tag{1}$$
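As a minimal sketch of Equation (1), the following Python snippet evaluates the linear decision function and the distance of the hyperplane from the origin. The vector `w`, the bias `b`, and the test samples are hypothetical values chosen only for illustration; they do not come from the study.

```python
import math

def decision_value(w, b, x):
    """Evaluate f(x) = w . x + b for a single sample x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def origin_distance(w, b):
    """Distance |b| / ||w|| between the hyperplane f(x) = 0 and the origin."""
    return abs(b) / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical 2-D hyperplane (illustrative values, not from the paper).
w = [3.0, 4.0]
b = -5.0

f_pos = decision_value(w, b, [3.0, 4.0])  # positive side of the hyperplane
f_neg = decision_value(w, b, [0.0, 0.0])  # negative side (the origin itself)
dist = origin_distance(w, b)
```

The sign of the decision value gives the predicted class label, while its magnitude grows with the distance of the sample from the hyperplane.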
The margin maximization can be defined as the following
optimization problem:
$$\min_{w, b, \xi} \left[ \frac{\|w\|^2}{2} + C \sum_{i=1}^{L} \xi_i \right] \tag{2}$$
with $\xi_i$ as slack variables that are introduced to deal with
misclassified samples in non-separable cases. The constant C is
added as a penalty for cases which are located on the wrong
side of the hyperplane and it controls the shape of the
classification boundary. Therefore, it directly affects the
generalization capability of the SVM.
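The role of the penalty constant C can be illustrated with a short Python sketch. It assumes the standard soft-margin formulation, in which the objective is half the squared norm of w plus C times the sum of the slack variables, and each slack equals max(0, 1 - y_i f(x_i)). The function names, the hyperplane, and the samples are hypothetical.

```python
def hinge_slacks(w, b, samples, labels):
    """Slack of each sample, assuming xi_i = max(0, 1 - y_i * f(x_i))."""
    slacks = []
    for x, y in zip(samples, labels):
        f = sum(wi * xi for wi, xi in zip(w, x)) + b
        slacks.append(max(0.0, 1.0 - y * f))
    return slacks

def soft_margin_objective(w, b, samples, labels, C):
    """Value of 0.5 * ||w||^2 + C * sum(slacks) for a fixed hyperplane."""
    margin_term = 0.5 * sum(wi * wi for wi in w)
    return margin_term + C * sum(hinge_slacks(w, b, samples, labels))

# Hypothetical hyperplane and samples (illustrative values only).
w, b = [1.0, 0.0], 0.0
samples = [[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0], [0.5, 1.0]]
labels = [1, 1, -1, -1]

slacks = hinge_slacks(w, b, samples, labels)
low_penalty = soft_margin_objective(w, b, samples, labels, C=1.0)
high_penalty = soft_margin_objective(w, b, samples, labels, C=10.0)
```

Samples outside the margin on the correct side have zero slack. The last sample lies on the wrong side of the hyperplane, so its slack exceeds 1, and a larger C makes this violation more costly.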
Using so-called kernel methods, the above linear SVM
approach has been extended to cases that are not linearly
separable. For non-linear problems, the data are transformed
into a higher-dimensional Hilbert space using a non-linear
mapping $\Phi$. The kernel trick makes it possible to work
within the newly transformed feature space without knowing
$\Phi$ explicitly, but only the kernel function:
$$\left( \Phi(x_i) \cdot \Phi(x_j) \right) = k(x_i, x_j) \tag{3}$$
The final decision function can be written as:
$$f(x) = \sum_{i=1}^{L} \alpha_i y_i k(x_i, x) + b \tag{4}$$
where $\alpha_i$ denotes the Lagrange multipliers.
[Figure 1. Sch… (diagram label: SAR data)]
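The kernel trick can be checked numerically on a small example. The sketch below uses a homogeneous quadratic kernel, which is not the kernel used in this study; it is chosen only because its feature map can be written out explicitly in two dimensions. The names `phi` and `quadratic_kernel` are hypothetical.

```python
import math

def dot(u, v):
    """Plain dot product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit feature map of the quadratic kernel for 2-D input."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]

def quadratic_kernel(x, z):
    """k(x, z) = (x . z)^2, computed without mapping to feature space."""
    return dot(x, z) ** 2

x = [1.0, 2.0]
z = [3.0, 4.0]

explicit = dot(phi(x), phi(z))     # inner product in the mapped space
implicit = quadratic_kernel(x, z)  # same quantity via the kernel function
```

Both values agree up to floating-point rounding, which shows that the inner product in the transformed space can be obtained from the kernel alone.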
In our study, a Gaussian radial basis function is used (Vapnik,
1998):
$$k(x_i, x_j) = \exp\left( -\frac{\|x_i - x_j\|^2}{2\sigma^2} \right)$$
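A Gaussian kernel and the resulting kernel decision function can be sketched as follows. The parameterization with a width parameter `sigma`, i.e. exp(-||x_i - x_j||^2 / (2 sigma^2)), is an assumption, and the support vectors, multipliers, and bias are hypothetical values rather than the result of any training.

```python
import math

def rbf_kernel(xi, xj, sigma):
    """Gaussian RBF kernel, assuming the 2 * sigma^2 parameterization."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def decision_function(x, support_vectors, alphas, labels, b, sigma):
    """f(x) = sum_i alpha_i * y_i * k(x_i, x) + b over the support vectors."""
    total = 0.0
    for sv, alpha, y in zip(support_vectors, alphas, labels):
        total += alpha * y * rbf_kernel(sv, x, sigma)
    return total + b

# Hypothetical, hand-picked model (not obtained by training).
support_vectors = [[0.0, 0.0], [2.0, 0.0]]
alphas = [1.0, 1.0]
labels = [1, -1]
b, sigma = 0.0, 1.0

near_positive = decision_function([0.0, 0.0], support_vectors, alphas, labels, b, sigma)
near_negative = decision_function([2.0, 0.0], support_vectors, alphas, labels, b, sigma)
midpoint = decision_function([1.0, 0.0], support_vectors, alphas, labels, b, sigma)
```

In this symmetric toy setup the decision value is positive near the first support vector, negative near the second, and zero halfway between them.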
The output of the decision function f(x) provides the distance of
each pixel to the separating hyperplane, giving a rule image,
which is used to determine the final classification result.
The SVM was originally designed as a binary classifier. Hence,
different concepts have been proposed to solve multi-class
(n-class) problems. In the one-against-all (OAA) approach, a set
of classifiers is generated to separate each class from the others,
e.g., forest from the rest, urban from the rest, etc. The maximum
distance value within the n rule images determines the final
class membership. For the one-against-one (OAO) strategy, a
set of n(n-1)/2 individual SVMs is trained, one for each
pairwise classification problem, e.g., forest vs. urban. As in
the OAA approach, each individual SVM classifier provides a
rule image. In contrast to the OAA approach, each pixel is
assigned to the class that receives the highest number of votes.
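The two multi-class rules can be sketched for a single pixel as follows. The class names, the decision values, and the sign convention for the pairwise classifiers (a non-negative value votes for the first class of the pair) are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def classify_oaa(rule_values):
    """OAA: pick the class whose class-vs-rest decision value is largest."""
    return max(rule_values, key=rule_values.get)

def classify_oao(pairwise_values):
    """OAO: each pairwise SVM casts one vote; the most-voted class wins.

    Assumed convention: a value >= 0 for the pair (a, b) votes for a,
    a negative value votes for b. Ties are resolved by first occurrence.
    """
    votes = Counter()
    for (a, b), value in pairwise_values.items():
        votes[a if value >= 0 else b] += 1
    return votes.most_common(1)[0][0]

classes = ["forest", "urban", "water"]
n_pairs = len(list(combinations(classes, 2)))  # n(n-1)/2 pairwise problems

# Hypothetical decision values for one pixel.
oaa_values = {"forest": 0.8, "urban": -0.3, "water": -1.2}
oao_values = {
    ("forest", "urban"): 0.5,
    ("forest", "water"): 1.1,
    ("urban", "water"): -0.2,
}

oaa_label = classify_oaa(oaa_values)
oao_label = classify_oao(oao_values)
```

For three classes, OAA needs three classifiers and OAO also needs three; the number of pairwise classifiers grows quadratically with the number of classes.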
2.2 Decision Fusion Strategy
The strategy for combining the two different sensor sources is
presented in Figure 1. On each image source, a set of SVMs is
trained individually and the corresponding rule images are
generated. The information from the rule images is fused by
applying an additional SVM to a data set containing all rule
images.
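The input to the additional SVM can be sketched as a per-pixel stacking of all rule images. The function name `stack_rule_images`, the flat-list image layout, and the numeric values are hypothetical. Training the fusion SVM on the stacked vectors is not shown here.

```python
def stack_rule_images(sar_rules, optical_rules):
    """Concatenate the rule-image values of both sources for every pixel.

    Each argument is a list of rule images; each rule image is a flat
    list holding one decision value per pixel. The result holds one
    feature vector per pixel, with SAR values first, then optical values.
    """
    all_rules = sar_rules + optical_rules
    n_pixels = len(all_rules[0])
    if any(len(rule) != n_pixels for rule in all_rules):
        raise ValueError("all rule images must cover the same pixels")
    return [[rule[p] for rule in all_rules] for p in range(n_pixels)]

# Hypothetical rule images: two per source, three pixels each.
sar_rules = [[0.9, -0.4, 0.1], [-0.7, 0.6, -0.2]]
optical_rules = [[0.3, -1.1, 0.8], [-0.5, 1.0, -0.9]]

fusion_features = stack_rule_images(sar_rules, optical_rules)
```

Each resulting vector has as many entries as there are rule images in total, so the fusion classifier sees the outputs of all source-specific SVMs for a pixel at once.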
3. D/
The multisource
multitemporal S.
2005. The SAJ
polarization anc
polarizations (T£
Figure 2. S
The nearly flat
region is domir
typical spatial j
of planted cro]
conducted in s
used as referen
data sets.