$w^T S_t w = a^T K_t a$  (14)

where

$K_b = \Phi^T S_b \Phi$  (15)

$K_w = \Phi^T S_w \Phi$  (16)

$K_t = \Phi^T S_t \Phi$  (17)

and $\Phi = (\phi(x_1), \ldots, \phi(x_N))$. Here $K_b$ is the kernel between-class scatter matrix, $K_w$ is the kernel within-class scatter matrix, and $K_t$ is the kernel total scatter matrix. All three matrices are nonnegative definite, and their size is $N \times N$.
From Equations (12) and (13), the Fisher discriminant function (4) can be expressed as

$J_1(a) = \frac{a^T K_b a}{a^T K_w a}$  (18)
where a is a nonzero vector. The orthogonal constraint
condition can be expressed as
$w_i^T w_j = a_i^T \Phi^T \Phi a_j = a_i^T K a_j = 0, \quad \forall i \neq j;\ i, j = 1, \ldots, d$
So Model I can be expressed in terms of kernel matrices as

Model II: $\begin{cases} \max(J_1(a)) \\ a_j^T K a = 0, \quad j = 1, \ldots, r \\ a \in \mathbb{R}^N \end{cases}$  (19)
That is to say, if the first $r$ discriminant vectors $a_1, \ldots, a_r$ are known, the $(r+1)$-th discriminant vector $a_{r+1}$ can be obtained by solving the above optimization problem; $a_1$ is the eigenvector corresponding to the maximal eigenvalue of the eigenequation $K_b a = \lambda K_w a$. If $\{a_1, a_2, \ldots, a_d\}$ is obtained from Model II and $\{w_1, w_2, \ldots, w_d\}$ from Model I, the relationship between them is
$w_i = \sum_{k=1}^{N} a_{ik} \phi(x_k) = \Phi a_i, \quad i = 1, \ldots, d$  (20)

where $\Phi = (\phi(x_1), \ldots, \phi(x_N))$.
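For the first vector, Equation (19) reduces to the unconstrained generalized eigenproblem $K_b a = \lambda K_w a$ above. A brief sketch, assuming SciPy; the small ridge term is a practical assumption (not part of the derivation), since $K_w$ is typically singular:

import numpy as np
from scipy.linalg import eig

def first_discriminant_vector(K_b, K_w, K, reg=1e-6):
    # Solve K_b a = lambda K_w a and keep the eigenvector with the
    # largest eigenvalue; the ridge keeps K_w numerically invertible.
    vals, vecs = eig(K_b, K_w + reg * np.eye(K_w.shape[0]))
    a1 = np.real(vecs[:, np.argmax(np.real(vals))])
    return a1 / np.sqrt(a1 @ K @ a1)   # enforce w^T w = a^T K a = 1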
In Baudat's work (Baudat et al., 2000), instead of $J_1(w)$, the criterion $J_2(w)$ is used:

$J_2(w) = \frac{w^T S_b w}{w^T S_t w}$
Correspondingly, Model I of GDA can be rewritten as

Model I: $\begin{cases} \max(J_2(w)) \\ w_j^T w = 0, \quad j = 1, \ldots, r \\ w \in H \end{cases}$  (21)

and Model II of GDA can be rewritten as

Model II: $\begin{cases} \max(J_2(a)) \\ a_j^T K a = 0, \quad j = 1, \ldots, r \\ a \in \mathbb{R}^N \end{cases}$  (22)
For Model I with $J_2(w)$, if the first $r$ ($r \geq 1$) discriminant vectors are known, $a_{r+1}$ can be obtained by solving the following eigenequation:

$P K_b a_{r+1} = \lambda K_t a_{r+1}$  (23)

where $P = I - K A^T (A K K_t^{-1} K A^T)^{-1} A K K_t^{-1}$, $I$ is an identity matrix, and $A = (a_1, a_2, \ldots, a_r)^T$. Because $w$ is a unit vector in Model I, $w^T w = a^T K a = 1$; once $a$ has been obtained, it should therefore be normalized by dividing it by $\sqrt{a^T K a}$.
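A numerical sketch of this deflation step, assuming NumPy/SciPy; the pseudo-inverses and the small ridge inside the eigen-solver are practical safeguards (assumptions, since $K_t$ is often singular), not part of Equation (23):

import numpy as np
from scipy.linalg import eig

def next_discriminant_vector(K_b, K_t, K, A, reg=1e-6):
    # A stacks the r known vectors a_1..a_r as rows (r x N).
    N = K.shape[0]
    Kt_inv = np.linalg.pinv(K_t)                      # stands in for K_t^{-1}
    D = A @ K                                         # rows are a_j^T K
    P = np.eye(N) - D.T @ np.linalg.pinv(D @ Kt_inv @ D.T) @ D @ Kt_inv
    vals, vecs = eig(P @ K_b, K_t + reg * np.eye(N))  # P K_b a = lambda K_t a
    a = np.real(vecs[:, np.argmax(np.real(vals))])
    return a / np.sqrt(a @ K @ a)                     # enforce a^T K a = 1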
In the feature space $H$, if a group of discriminant vectors $\{w_1, w_2, \ldots, w_d\}$ is known, the discriminant feature of a sample $\phi(x)$ is

$w_i^T \phi(x) = \sum_{k=1}^{N} a_{ik} \phi(x_k)^T \phi(x) = \sum_{k=1}^{N} a_{ik} k(x_k, x) = a_i^T \xi_x$  (24)

where $\xi_x = (k(x_1, x), \ldots, k(x_N, x))^T$ is the kernel vector of the input sample $x$.
The transformation function of GDA is

$y = W^T \phi(x) = [w_1, w_2, \ldots, w_d]^T \phi(x) = [a_1, a_2, \ldots, a_d]^T \xi_x$

where $y$ is the feature extracted by GDA, which has $d$ dimensions.
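In code, extracting the GDA feature of a new sample therefore only requires its kernel vector against the training samples. A minimal sketch assuming NumPy (names illustrative):

import numpy as np

def gda_transform(x, X_train, A_d, kernel):
    # Equation (24): y = [a_1, ..., a_d]^T xi_x, where
    # xi_x = (k(x_1, x), ..., k(x_N, x))^T.
    xi_x = np.array([kernel(x_k, x) for x_k in X_train])
    return A_d.T @ xi_x   # A_d is N x d, columns a_1..a_d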
2.2 Kernel Function

According to kernel function theory, once a kernel function $k(x, y)$ satisfies Mercer's theorem, it corresponds to an inner product, a mapping function, and a feature space. In fact, changing the kernel parameters implicitly changes the mapping function, and thereby the complexity of the sample distribution in the mapped subspace. Three kinds of kernel are commonly used.
(1) Polynomial kernel of degree $d$

$k(x, y) = [(x \cdot y) + p]^d$

where $p$ and $d$ are user-defined parameters. If $p = 0$ and $d = 1$, it is called the linear kernel.
(2) Radial basis function (RBF) kernel

$k(x, y) = \exp\left(-\frac{\|x - y\|^2}{\sigma^2}\right)$

where $\sigma^2 > 0$.
(3) Neural network kernel

$k(x, y) = \tanh(\mu (x \cdot y) + v)$

where $\mu$ and $v$ are parameters. Unlike the polynomial and RBF kernels, the neural network kernel satisfies Mercer's theorem only for certain values of $(\mu, v)$.
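For reference, the three kernels can be written directly as follows (a sketch assuming NumPy; parameter defaults are illustrative):

import numpy as np

def poly_kernel(x, y, p=1.0, d=2):
    # polynomial kernel; p = 0, d = 1 gives the linear kernel
    return (np.dot(x, y) + p) ** d

def rbf_kernel(x, y, sigma=1.0):
    # RBF kernel, with sigma^2 > 0
    return np.exp(-np.linalg.norm(x - y) ** 2 / sigma ** 2)

def nn_kernel(x, y, mu=1.0, v=-1.0):
    # neural-network (sigmoid) kernel; satisfies Mercer's theorem
    # only for certain (mu, v)
    return np.tanh(mu * np.dot(x, y) + v)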
2.3 Flow of Feature Extraction based on GDA

Following Baudat's work (Baudat et al., 2000), we select $J_2(w)$ as the Fisher discriminant function. Based on the analysis above, the steps of feature extraction based on generalized discriminant analysis are described as follows.

(1) Select the kernel function $k(\cdot, \cdot)$ and its parameters, and the number $d$ of features to be extracted.