nonlinear methods, and kernel methods provide a new approach to feature extraction. Several researchers have studied kernel-based feature extraction methods for hyperspectral data, such as kernel principal component analysis (KPCA) and kernel Bhattacharyya feature extraction (KBFE) (Lu, 2005).
In the feature space H, the Fisher discriminant function can be defined as

J_\phi(w) = \frac{w^T S_b^\phi w}{w^T S_w^\phi w}    (4)

where w is a nonzero vector.
In 2000, Generalized Discriminant Analysis (GDA) was put forward by Baudat (Baudat et al., 2000). It is the nonlinear extension of Linear Discriminant Analysis and has been successfully applied to face recognition (Gao et al., 2004) and mechanical fault classification (Li, 2003). In this paper, we first introduce the mathematical model and the solution of GDA and apply this method to extract features from hyperspectral images. Then we carry out experiments with two groups of hyperspectral images acquired by different kinds of hyperspectral imaging systems. Finally, the results are analyzed. The main contents are described in detail as follows.
2. GENERALIZED DISCRIMINANT ANALYSIS
By mapping samples from the input space to a high-dimensional feature space, we can carry out linear feature extraction methods in that feature space. Because the dimension of the feature space is very large, and may even be infinite, we use kernel functions to compute the inner products in the feature space, so that the mapped samples never need to be handled explicitly.
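As a minimal sketch of this kernel trick (assuming a Gaussian radial basis function kernel and numpy, both illustrative choices rather than part of the method described here), the following snippet evaluates the matrix of pairwise feature-space inner products directly from the input samples:

import numpy as np

def rbf_kernel_matrix(X, sigma=1.0):
    """Gram matrix K[i, j] = k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2*sigma^2)).

    X is an (N, n) array of N samples with n spectral bands. The inner
    products <phi(x_i), phi(x_j)> in the feature space H are obtained from
    the kernel alone; the mapping phi is never computed explicitly.
    """
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma ** 2))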
2.1 Theory of Feature Extraction Based on GDA
Suppose there are C classes of samples, belonging to ω_1, ω_2, ..., ω_C, and each original sample x has n dimensions, so x ∈ R^n. We map the sample x into a higher-dimensional feature space H by the mapping φ, so that in the feature space x becomes φ(x) ∈ H. If all the samples are mapped into the feature space H, the intraclass scatter matrix S_w^φ, the interclass scatter matrix S_b^φ and the total scatter matrix S_t^φ of the training samples are described as follows:
S_w^\phi = \frac{1}{N} \sum_{i=1}^{C} \sum_{j=1}^{N_i} \left( \phi(x_j^i) - m_i^\phi \right) \left( \phi(x_j^i) - m_i^\phi \right)^T    (1)

S_b^\phi = \sum_{i=1}^{C} P(\omega_i) \left( m_i^\phi - m_0^\phi \right) \left( m_i^\phi - m_0^\phi \right)^T    (2)

S_t^\phi = \frac{1}{N} \sum_{j=1}^{N} \left( \phi(x_j) - m_0^\phi \right) \left( \phi(x_j) - m_0^\phi \right)^T    (3)
where N_i is the number of training samples belonging to class ω_i and N is the number of all training samples. In the feature space H, φ(x_j^i) is sample j (j = 1, ..., N_i) of class i (i = 1, ..., C), φ(x_j) is sample j (j = 1, ..., N) of all the samples, m_i^φ = E{φ(x) | ω_i} is the mean of the samples in class i, and m_0^φ = Σ_{i=1}^{C} P(ω_i) m_i^φ is the mean of all the samples. S_w^φ, S_b^φ and S_t^φ are all nonnegative matrices.
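To make Equations (1)-(3) concrete, the sketch below computes the three scatter matrices in the input space R^n (an illustration only; in the feature space H they are never formed explicitly, and the class priors P(ω_i) are assumed here to be estimated as N_i/N):

import numpy as np

def scatter_matrices(X, y):
    """Intraclass, interclass and total scatter matrices of Eqs. (1)-(3),
    evaluated in the input space for illustration.

    X: (N, n) training samples; y: (N,) integer class labels.
    """
    N, n = X.shape
    m0 = X.mean(axis=0)                                   # overall mean
    Sw = np.zeros((n, n))
    Sb = np.zeros((n, n))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)                              # class mean m_i
        D = Xc - mc
        Sw += D.T @ D / N                                 # Eq. (1) contribution
        Sb += (len(Xc) / N) * np.outer(mc - m0, mc - m0)  # Eq. (2), P(omega_i) = N_i/N
    D0 = X - m0
    St = D0.T @ D0 / N                                    # Eq. (3); equals Sw + Sb here
    return Sw, Sb, St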
In the feature space H, Generalized Discriminant Analysis (GDA) seeks a group of discriminant vectors (w_1, ..., w_d) that maximize the Fisher discriminant function (4) and are mutually orthogonal:

w_i^T w_j = 0, \quad \forall i \neq j; \; i, j = 1, \ldots, d

The first discriminant vector w_1 of GDA is also the Fisher discriminant vector, i.e. the eigenvector corresponding to the maximal eigenvalue of the eigen-equation S_b^φ w = λ S_w^φ w. If the first r discriminant vectors w_1, ..., w_r are known, the (r+1)-th discriminant vector w_{r+1} can be obtained by solving the following optimization problem:

Model I:    \max_{w \in H} J_\phi(w), \quad \text{s.t. } w_j^T w = 0, \; j = 1, \ldots, r    (5)
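As a rough sketch of this step (not the paper's algorithm), the leading discriminant vectors can be obtained from the eigen-equation S_b^φ w = λ S_w^φ w with a generalized symmetric eigensolver; the ridge term is a numerical assumption, and the extra orthogonality constraints of Model I (which would require a further deflation step) are not enforced here:

import numpy as np
from scipy.linalg import eigh

def fisher_directions(Sb, Sw, d, reg=1e-6):
    """Eigenvectors of Sb w = lambda Sw w for the d largest eigenvalues.

    `reg` adds a small ridge so that Sw is positive definite (an assumption
    made only for numerical stability, not part of the model).
    """
    n = Sw.shape[0]
    vals, vecs = eigh(Sb, Sw + reg * np.eye(n))   # generalized eigenproblem
    order = np.argsort(vals)[::-1]
    return vecs[:, order[:d]]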
According to the theory of the reproducing kernel Hilbert space, the eigenvectors are linear combinations of the mapped training samples in H, so w can be expressed as

w = \sum_{i=1}^{N} \alpha^i \phi(x_i) = \phi \alpha    (6)

where φ = (φ(x_1), ..., φ(x_N)), α = (α^1, ..., α^N)^T, and α is the optimal kernel discriminant vector. The projection of a sample φ(x) in the feature space onto the direction w is

w^T \phi(x) = \alpha^T \phi^T \phi(x) = \alpha^T \xi_x    (7)

where ξ_x = (k(x_1, x), k(x_2, x), ..., k(x_N, x))^T. For a sample x ∈ R^n, ξ_x is the kernel sample vector relating x to x_1, x_2, ..., x_N, so the kernel matrix is

K = (\xi_{x_1}, \xi_{x_2}, \ldots, \xi_{x_N})
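For instance, Equation (7) can be evaluated as below once α is known; the kernel function passed in is an illustrative assumption, not fixed by the method:

import numpy as np

def project_sample(x, X_train, alpha, kernel):
    """Projection of phi(x) onto the direction w via Eq. (7):
    w^T phi(x) = alpha^T xi_x, with xi_x = (k(x_1, x), ..., k(x_N, x))^T.
    """
    xi_x = np.array([kernel(x_i, x) for x_i in X_train])  # kernel sample vector
    return float(alpha @ xi_x)

Here `kernel` can be any symmetric two-argument kernel function, for example a Gaussian kernel consistent with the Gram matrix sketch above.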
In the feature space H, the mean of each class and the mean of all the samples can also be projected onto the direction w:

w^T m_i^\phi = \alpha^T \phi^T \left( \frac{1}{N_i} \sum_{k=1}^{N_i} \phi(x_k^i) \right) = \alpha^T p_i    (8)

w^T m_0^\phi = \alpha^T \phi^T \left( \frac{1}{N} \sum_{k=1}^{N} \phi(x_k) \right) = \alpha^T p_0    (9)

where

p_i = \frac{1}{N_i} \sum_{k=1}^{N_i} \xi_{x_k^i}    (10)

p_0 = \frac{1}{N} \sum_{k=1}^{N} \xi_{x_k}    (11)
According to Equations (8), (10) and (11), we have

w^T S_b^\phi w = \alpha^T K_b \alpha    (12)

w^T S_w^\phi w = \alpha^T K_w \alpha    (13)