Generally speaking, $n$ is not larger than $m$; when it is smaller, principal component analysis is used to reduce the dimension from $m$ to $n$. Then, if we arrange the observed variables $v_i$ and the component variables $s_i$ into vectors, a matrix form of (1) can be given by
$v = A s$  (2)
where $v = (v_1, v_2, v_3, \dots, v_m)^T$, $s = (s_1, s_2, s_3, \dots, s_n)^T$, and $A$ is an unknown constant matrix, which is called the mixing matrix. Then we can define a demixing matrix $W$, which gives:
$y = W v$  (3)
Our target is to estimate the independent components $s$ from the observed signals alone, which is equivalent to finding the optimum matrix $W$ in (3), the one that makes $y$, the estimate of the variables $s$, as statistically independent as possible. Because a linear mixture of Gaussian variables is still Gaussian, at most one source in the mixture model is allowed to be of Gaussian type.
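As a concrete illustration of the mixing model (2) and the demixing step (3), the following NumPy sketch is our own example (the sizes, source choices, and the use of the pseudo-inverse are illustrative assumptions, not part of the method):

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, T = 2, 3, 1000                              # n sources, m observed channels, T samples
s = np.vstack([rng.uniform(-1, 1, T),             # non-Gaussian (uniform) source
               np.sign(rng.standard_normal(T))])  # non-Gaussian (binary) source
A = rng.standard_normal((m, n))                   # unknown mixing matrix
v = A @ s                                         # observed mixtures, v = A s  (2)

# Given a demixing matrix W, the source estimate is y = W v  (3).
# Here W is simply the pseudo-inverse of A, for illustration only;
# ICA must estimate W without knowing A.
W = np.linalg.pinv(A)
y = W @ v
print(v.shape, y.shape)                           # (3, 1000) (2, 1000)
```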
In detail, the algorithm falls into three steps (2006). Firstly, a preprocessing step is employed to whiten the mixed data in order to remove the correlation between variables and, if necessary, reduce the data dimension. Secondly, an objective function $L(W)$, taking the demixing matrix $W$ as its argument, is defined to measure the independence of the output variables $y$. Finally, an optimization algorithm is used to find the optimal estimate $\hat{W}$ at which $L(W)$ attains its extremum, and $W$ is set equal to $\hat{W}$. The implementation of ICA can therefore be seen as the combination of an objective function and an optimization algorithm (Hyvärinen, 1999).
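The whitening step can be sketched as follows. This is an illustrative PCA-based whitening written by us, not necessarily the exact implementation used in the experiments: the data are centered, projected onto the principal components, and rescaled so the covariance becomes the identity; dropping the smallest eigenvalues reduces the dimension at the same time.

```python
import numpy as np

def whiten(v, n_keep=None):
    """PCA whitening: return z with identity covariance, optionally reduced to n_keep rows."""
    v = v - v.mean(axis=1, keepdims=True)    # center each channel
    cov = v @ v.T / v.shape[1]               # sample covariance (m x m)
    eigval, eigvec = np.linalg.eigh(cov)     # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1]         # re-sort descending
    eigval, eigvec = eigval[order], eigvec[:, order]
    if n_keep is not None:                   # optional dimension reduction
        eigval, eigvec = eigval[:n_keep], eigvec[:, :n_keep]
    return np.diag(eigval ** -0.5) @ eigvec.T @ v
```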
definition of statistical impendence. In general, there are three
kinds of objective functions: the maximum of non-gaussianity,
the minimum of mutual information and maximum likelihood.
It has been improved that three criterions are equal to each
other in the term of information’s(Hyvarinen, 1997),the
difference between them is laying in the optimization algorithm,
which means different calculating complexity. For non-
gaussianity, kurtosis and negentropy are often used as
performance criterion. For a random variable y with zero
mean, its kurtosis can be defined as follows:
$\mathrm{kurt}(y) = E\{y^4\} - 3\,(E\{y^2\})^2$  (4)
Meanwhile, the negentropy of $y$ is presented in (6): it is the difference between the entropy of a Gaussian random variable with the same standard deviation as $y$ and the entropy of $y$, where the entropy $H$ is defined in (5).
$H(x) = -\sum_i P(x = a_i) \log P(x = a_i)$  (5)

$\mathrm{ne}(y) = H(y_{\mathrm{gauss}}) - H(y)$  (6)
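To make the kurtosis criterion (4) concrete, the short check below (our own illustration, with arbitrary sample sizes) estimates the kurtosis of a Gaussian sample and of a uniform sample; the Gaussian value is near zero, while the sub-Gaussian uniform sample gives a clearly negative value:

```python
import numpy as np

def kurt(y):
    """Sample version of (4): kurt(y) = E{y^4} - 3 (E{y^2})^2 for zero-mean y."""
    y = y - y.mean()
    return np.mean(y ** 4) - 3 * np.mean(y ** 2) ** 2

rng = np.random.default_rng(0)
print(kurt(rng.standard_normal(100_000)))  # ~ 0: Gaussian
print(kurt(rng.uniform(-1, 1, 100_000)))   # ~ -0.13: negative, sub-Gaussian
```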
A large collection of optimization algorithms has been proposed in the last ten years. Among them, the fast and robust ICA algorithm (FastICA) proposed by Hyvärinen (Hyvärinen, 1997) is widely used; it is a fixed-point algorithm based on the maximization of negentropy. Initially, this algorithm was introduced using (6) as the criterion function, and it was subsequently extended to the approximation (7) in order to reduce the computational complexity.
$\mathrm{ne}(y) \approx \left[E\{G(y)\} - E\{G(\nu)\}\right]^2$  (7)

$y = w^T x$  (8)
where $\nu$ is a Gaussian variable of zero mean and unit variance, and $y$ is likewise a variable of zero mean and unit variance. The function $G(\cdot)$ can be practically any non-quadratic function. Two commonly used functions are listed below:
$G_1(y) = \frac{1}{a_1} \log \cosh(a_1 y), \quad a_1 \in [1, 2]$

$G_2(y) = -\exp(-y^2/2)$
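The approximation (7) can be evaluated directly from samples. The sketch below is our own illustration using $G_2$: the negentropy estimate is near zero for a Gaussian sample and clearly positive for a strongly non-Gaussian binary sample.

```python
import numpy as np

rng = np.random.default_rng(0)

def G2(y):
    return -np.exp(-y ** 2 / 2)

def negentropy_approx(y):
    """Approximation (7): ne(y) ~ [E{G(y)} - E{G(nu)}]^2 for unit-variance y."""
    y = (y - y.mean()) / y.std()
    nu = rng.standard_normal(y.size)   # Gaussian reference, zero mean, unit variance
    return (np.mean(G2(y)) - np.mean(G2(nu))) ** 2

print(negentropy_approx(rng.standard_normal(100_000)))           # ~ 0 for a Gaussian
print(negentropy_approx(np.sign(rng.standard_normal(100_000))))  # ~ 0.01, clearly positive
```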
In our experiment, $G_2(y) = -\exp(-y^2/2)$ is used for the iteration. Our target is to find a vector $w$ such that $E\{G(w^T x)\}$ reaches a maximum, which is equivalent to requiring the derivative of $E\{G(w^T x)\}$ with respect to $w$ to be zero. In formula (9), $g(\cdot)$ denotes the derivative of $G(\cdot)$:

$\frac{\partial}{\partial w} E\{G(w^T x)\} = E\{x\, g(w^T x)\} = 0$  (9)
According to the Newton iteration method, the update simplifies to:

$w^{+} = w - \frac{E\{x\, g(w^T x)\}}{E\{g'(w^T x)\}}$
Then we get the following fixed-point algorithm for ICA:
1. Take a random initial vector $w$ of norm 1.
2. Let $w^{+} = E\{x\, g(w^T x)\} - E\{g'(w^T x)\}\, w$.
3. Divide $w^{+}$ by its norm to obtain the new $w$.
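Putting the three steps together, a minimal one-unit FastICA loop might look as follows. This is an illustrative sketch under our own assumptions: the data x are already whitened, the function names are ours, and convergence is tested by the angle between successive w vectors, a detail not spelled out in the text above.

```python
import numpy as np

def g2(y):
    """g(y) = G2'(y) = y * exp(-y^2 / 2)."""
    return y * np.exp(-y ** 2 / 2)

def g2_prime(y):
    """g'(y) = (1 - y^2) * exp(-y^2 / 2)."""
    return (1 - y ** 2) * np.exp(-y ** 2 / 2)

def fastica_one_unit(x, max_iter=200, tol=1e-8, seed=0):
    """One-unit fixed-point FastICA on whitened data x of shape (n, T)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(x.shape[0])
    w /= np.linalg.norm(w)                         # step 1: random w of norm 1
    for _ in range(max_iter):
        y = w @ x
        # step 2: w+ = E{x g(w^T x)} - E{g'(w^T x)} w
        w_new = (x * g2(y)).mean(axis=1) - g2_prime(y).mean() * w
        w_new /= np.linalg.norm(w_new)             # step 3: renormalize
        if 1 - abs(w_new @ w) < tol:               # stop when the direction is stable
            return w_new
        w = w_new
    return w
```

Applied to the whitened output of the earlier sketch, each extracted $w$ gives one estimated independent component $y = w^T x$; extracting several components additionally requires decorrelating the successive $w$ vectors, a step outside this excerpt.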