
2 MATHEMATICAL FOUNDATIONS 
Within this section, the mathematical foundations of the main methods used will be given. A number of profound introductions to the SVM problem (Burges, 1998), (Ivanciuc, 2007), (Zhang, 2001), (Schölkopf and Smola, 2002), (Camps-Valls and Bruzzone, 2009) and valuable reviews on the application of SVM in remote sensing (Mountrakis et al., 2010), (Plaza et al., 2009) have been published. For this reason, the foundations of SVM and state-of-the-art application examples are not presented exhaustively, but strictly focused on the kernel-composition problem.
2.1 Kernel matrices and the SVM problem 
Given a data set X with n data points, kernel matrices are the result of kernel functions applied over all $n^2$ tuples of data (Shawe-Taylor and Cristianini, 2004). The outcome of a kernel function $K_{ij} = f_\delta(x_i, x_j)$ is a similarity measure for the two training data $x_i$ and $x_j$, depending on some distance metric $\delta$. Usually, $\delta$ is the Euclidean distance (Mercier and Lennon, 2003). However, kernel functions can be modified, e.g. by introducing different similarity measures (Amari and Wu, 1999). For instance, (Mercier and Lennon, 2003) and (Honeine and Richard, 2010) use the spectral angle as a similarity measure for hyperspectral SVM classification. To model complex distributions of the training data in the feature space, $f_\delta$ is usually some non-linear function. The most frequently applied family of non-linear functions are Gaussian radial basis functions (RBF) (Schölkopf et al., 1997). The closer two points are found in the feature space, the higher their resulting kernel value. Given these facts, the kernel matrix simply represents the similarity between the points of the training data set.
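To make this concrete, the following minimal numpy sketch computes a Gaussian RBF kernel matrix, $K_{ij} = \exp(-\gamma\,\delta(x_i, x_j)^2)$ with $\delta$ the Euclidean distance, over all $n^2$ tuples of a toy data set (the function name, the value of $\gamma$ and the data are illustrative, not taken from the paper):

```python
import numpy as np

def rbf_kernel_matrix(X, gamma=1.0):
    """n x n Gaussian RBF kernel matrix K_ij = exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances for all n^2 tuples of data points
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.clip(sq_dists, 0.0, None))

# Toy training set: four points with three spectral bands each
X = np.array([[0.1, 0.2, 0.3],
              [0.1, 0.2, 0.4],
              [0.9, 0.8, 0.7],
              [0.9, 0.9, 0.7]])
print(np.round(rbf_kernel_matrix(X, gamma=2.0), 2))
# Points close in feature space yield values near 1, distant points near 0.
```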
To understand how the kernel matrix is used in SVM classification, it is helpful to look not at the primal but at the dual formulation of the SVM problem (Ivanciuc, 2007).
The dual problem is given by Eq.1 
$$\text{maximize:} \quad \sum_i \lambda_i \;-\; \frac{1}{2} \sum_i \sum_j \lambda_i \lambda_j \, y_i y_j \, K(x_i, x_j) \qquad (1)$$

$$\lambda_i > 0 \;\; \forall \;\text{support vectors}, \qquad \lambda_i = 0 \;\; \forall \;\text{other points}$$
The Lagrange multipliers $\lambda_i$ are only greater than zero for the support vectors (SVs). These are usually identified by sequential minimal optimization (Platt, 1998). Hence, only pairs of training data which are both SVs contribute to the solution of Eq.1 (in all other cases, $\lambda_i \lambda_j = 0$ sets the second part of Eq.1 to zero). The class labels $y_i$ are in $\{-1, +1\}$. Since the second part of Eq.1 is subtracted, only points with different class labels can maximize the term (their product $y_i y_j = -1$ renders the second part positive). The problem is therefore maximized if points are chosen as SVs which have different class labels but are found close to each other in the feature space (thus yielding a high value in the kernel matrix $K(x_i, x_j)$). Thus, the similarity values of the kernel matrix are used for finding the training points best suited as SVs. By setting the $\lambda_i$ of all other points to zero, a sparse solution is found which depends only on the SVs.
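The sparsity of this solution can be inspected directly, for instance with scikit-learn's SVC on a precomputed kernel matrix (a sketch with the toy data from above; labels and parameters are illustrative):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X = np.array([[0.1, 0.2, 0.3],
              [0.1, 0.2, 0.4],
              [0.9, 0.8, 0.7],
              [0.9, 0.9, 0.7]])
y = np.array([-1, -1, 1, 1])   # class labels in {-1, +1}

K = rbf_kernel(X, gamma=2.0)   # kernel matrix as in the sketch above
svm = SVC(kernel="precomputed", C=1.0).fit(K, y)

print(svm.support_)    # indices of the support vectors
print(svm.dual_coef_)  # lambda_i * y_i for the SVs; lambda_i = 0 for all other points
```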
2.2 Kernel-Composition 
As can be seen in Eq.1, the training data $x_i$ do not enter the SVM problem directly. Instead, the data are represented by kernels $K(x_i, x_j)$. According to Mercer's theorem (Mercer, 1909), valid kernel functions can be combined, e.g. through addition, to form new valid kernels. Hence, different sources
of information on the same training data can be fused through simple arithmetical operations (Camps-Valls et al., 2006b). For instance, $K_C(x_i, x_j) = K_A(x_i^A, x_j^A) + K_B(x_i^B, x_j^B)$ fuses the information domains A and B on the training data $x_i$, $x_j$ and forms a new kernel $K_C$. Within the original framework on kernel-composition for data fusion (Camps-Valls et al., 2006b), the following fusion approaches have been published.
$$K_C(x_i, x_j) = K_A(x_i^A, x_j^A) + K_B(x_i^B, x_j^B) \qquad (2)$$

$$K_C(x_i, x_j) = \mu \, K_A(x_i^A, x_j^A) + (1 - \mu) \, K_B(x_i^B, x_j^B) \qquad (3)$$

$$K_C(x_i, x_j) = K_A(x_i^A, x_j^A) + K_B(x_i^B, x_j^B) + K_{AB}(x_i^A, x_j^B) + K_{BA}(x_i^B, x_j^A) \qquad (4)$$
Eq.2 is called the direct summation kernel, the simplest form of kernel-composition. Eq.3 is called the weighted summation kernel. Its main advantage is that the weighting parameter $\mu \in (0, 1)$ allows regulating the relevance of the two data sources A and B for the classification problem. Eq.4 is called the cross-information kernel. It consists of four single kernels, of which the last two, $K_{AB}$ and $K_{BA}$, incorporate the mutual information between the data sources A and B (e.g. differences between the values both data sources yield for a particular data point).
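A sketch of how Eqs.2 and 3 translate into code, assuming Gaussian RBF base kernels and purely illustrative feature matrices for the information domains A and B:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X_A = rng.random((20, 6))   # e.g. spectral features of 20 training points
X_B = rng.random((20, 3))   # e.g. textural features of the same points

K_A = rbf_kernel(X_A, gamma=1.0)
K_B = rbf_kernel(X_B, gamma=1.0)

K_direct = K_A + K_B                     # direct summation kernel (Eq.2)
mu = 0.7                                 # relevance of source A
K_weighted = mu * K_A + (1 - mu) * K_B   # weighted summation kernel (Eq.3)
```

Both composites remain valid Mercer kernels and can be passed to an SVM as a precomputed kernel, exactly as in the sketch of Section 2.1.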
Based on these basic composition approaches, (Camps-Valls et al., 2008) and (Camps-Valls et al., 2006a) extend the kernel-composition framework to the field of multitemporal classification and change detection. The key idea is to use images of the same landscape but from different points in time as input data for kernel-composition and SVM classification. Given two points in time $t_0$ and $t_1$, two kernels $K_{t0}$ and $K_{t1}$ are built. These kernels only incorporate the spectral information given at each point in time. Then, a new kernel can be built using one of the Eqs.2 to 4. For instance, $K_{change}(x_i, x_j) = K_{t0}(x_i^{t0}, x_j^{t0}) + K_{t1}(x_i^{t1}, x_j^{t1})$ represents a direct summation kernel which implicitly incorporates information about the change of the spectral responses of pixels. Although the basic composite kernels can be used for multitemporal classification as well, the authors developed specialized kernels in order to combine traditional change detection techniques with kernel-composition. For instance, the image difference kernel is introduced in Eq.5.
$$K_{\Delta}(x_i, x_j) = K_{t1}(x_i^{t1}, x_j^{t1}) + K_{t0}(x_i^{t0}, x_j^{t0}) - K_{t1,t0}(x_i^{t1}, x_j^{t0}) - K_{t0,t1}(x_i^{t0}, x_j^{t1}) \qquad (5)$$
Note that Eq.5 is a particular case of the cross-information kernel (Eq.4) that performs the change detection technique of image differencing in the reproducing kernel Hilbert space (RKHS).
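The image difference kernel of Eq.5 can be sketched the same way; rbf_kernel(X, Z) evaluates the cross kernels $K_{t1,t0}$ and $K_{t0,t1}$ between the two acquisition dates (variable names and data are illustrative, not from the paper):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(1)
X_t0 = rng.random((20, 6))   # spectral responses of 20 pixels at time t0
X_t1 = rng.random((20, 6))   # spectral responses of the same pixels at t1

gamma = 1.0
K_diff = (rbf_kernel(X_t1, gamma=gamma)
          + rbf_kernel(X_t0, gamma=gamma)
          - rbf_kernel(X_t1, X_t0, gamma=gamma)
          - rbf_kernel(X_t0, X_t1, gamma=gamma))   # image difference kernel (Eq.5)
```

With the same base kernel at both dates, $K_{\Delta}$ is the Gram matrix of the differences $\varphi(x_i^{t1}) - \varphi(x_i^{t0})$ in the RKHS and therefore remains a valid positive semi-definite kernel.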
3 RELATED WORK 
Within this section, an overview of relevant contributions from the fields of change detection and kernel-composition will be given. Since kernel-composition was introduced only in 2006, far less research has been dedicated to it than to change detection in general.
3.1 Change detection and multitemporal classification
Herein, a short outline of important reviews and state-of-the-art papers in change detection is presented. A very comprehensive introduction to multitemporal classification is given by (Gillanders et al., 2008). (Singh, 1989) and (Coppin et al., 2004) present reviews with emphasis on signal processing. (Wang and Xu, 2010) compare change detection methods, emphasising particular aspects of different applications. (Holmgren and Thuresson, 1998) and (Wulder et al., 2006) present reviews