network model is proposed (Figure 1). With the PCA part of the model, the Principal Component (PC) images are decorrelated, and consequently the redundant information shared between the PC images is removed. With the ICA part of the model, we show that the mutual information in the Independent Component (IC) images is reduced compared to the PC images. This implies that the zones of transition are detected and made to emerge while, at the same time, the zones of vegetation temporal evolution are preserved in the produced IC images.
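To make the overall pipeline concrete, the following minimal Python sketch chains a PCA decorrelation step with an ICA unmixing step on a stack of co-registered images. It uses scikit-learn's PCA and FastICA as generic stand-ins for the neural implementations described below; the band count, image size, and component count are illustrative assumptions, not values from this paper.

    import numpy as np
    from sklearn.decomposition import PCA, FastICA

    # Assumed input: a stack of N co-registered multitemporal bands,
    # each of size rows x cols (hypothetical dimensions, random placeholder data).
    N, rows, cols = 6, 256, 256
    stack = np.random.rand(N, rows, cols)

    X = stack.reshape(N, -1).T                 # pixels as samples, bands as features

    # PCA part: decorrelate the bands into PC images.
    pca = PCA(n_components=4, whiten=True)
    Y = pca.fit_transform(X)                   # PC images (pixels x components)

    # ICA part: reduce the mutual information between the PC images.
    ica = FastICA(n_components=4, random_state=0)
    Z = ica.fit_transform(Y)                   # IC images

    ic_images = Z.T.reshape(4, rows, cols)     # back to image form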
2.1 Principal Component Extraction
The PCA-based part (Figure 2) is devoted to extracting the PC images. It is based on the simultaneous diagonalization of the two matrices Σ_x (the covariance matrix of the input images) and Σ_n (the covariance matrix of the noise) via one orthogonal matrix A (Chitroub et al., 2004). This means that the PC images (vector Y) are uncorrelated and carry an additive noise of unit variance. This processing step makes our application consistent with the theoretical development of ICA (Lee et al., 2000).
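As a reference point for the neural implementation below, the simultaneous diagonalization itself can be computed in closed form from the two covariance matrices. The following NumPy sketch uses a standard congruence-based construction (not necessarily the orthogonal matrix of the paper's own algorithm): it first whitens the noise and then diagonalizes the noise-whitened signal covariance, so the resulting transform decorrelates the PC images while giving the noise unit variance. The synthetic covariance matrices are assumptions for illustration.

    import numpy as np

    def simultaneous_diagonalization(cov_x, cov_n):
        """Return A with A.T @ cov_n @ A = I and A.T @ cov_x @ A diagonal."""
        # Step 1: whiten the noise; F maps cov_n to the identity.
        evals_n, evecs_n = np.linalg.eigh(cov_n)
        F = evecs_n @ np.diag(evals_n ** -0.5)
        # Step 2: diagonalize the noise-whitened signal covariance.
        _, evecs_s = np.linalg.eigh(F.T @ cov_x @ F)
        return F @ evecs_s

    # Synthetic example (assumed data, for illustration only).
    rng = np.random.default_rng(0)
    M = rng.standard_normal((4, 4))
    cov_x = M @ M.T + 4 * np.eye(4)            # signal covariance
    cov_n = np.diag([0.5, 1.0, 1.5, 2.0])      # noise covariance

    A = simultaneous_diagonalization(cov_x, cov_n)
    assert np.allclose(A.T @ cov_n @ A, np.eye(4), atol=1e-10)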
Based on well-developed aspects of matrix theory and computation, the existence of A is proved in (Chitroub et al., 2004), where a statistical algorithm for obtaining it is proposed. Here, we propose a neural implementation of this algorithm (Chitroub et al., 2001) with some modifications (Figure 2). It is composed of two PCA neural networks that have the same topology. The lateral weights c_i^1, respectively c_i^2, forming the vector C_1, respectively C_2, connect all of the first m-1 neurons with the m-th one. These connections play a very important role in the model, since they work toward the orthogonalization of the synaptic vector of the m-th neuron with the vectors of the previous m-1 neurons. The solid lines denote the weights w_i^1, c_i^1, respectively w_i^2, c_i^2, which are trained at the m-th stage, while the dashed lines correspond to the weights of the already trained neurons. Note that the lateral weights asymptotically converge to zero, so they do not appear between the already trained neurons. The first network of Figure 2 is devoted to whitening the noise, while the second one maximizes the variance given that the noise has already been whitened. Let X be the input vector of the first network. After convergence, the vector X is transformed into the new vector X' via the matrix U = W_1.Λ^(-1/2), where W_1 is the weight matrix of the first network, Λ is the diagonal matrix of eigenvalues of Σ_n, and Λ^(-1/2) is the inverse of its square root. Next, X' is the input vector of the second network. It is connected to M outputs, with M < N, corresponding to the intermediate output vector noted X_2. Once this network has converged, the PC images to be extracted (vector Y) are obtained as Y = A^T.X = W_2.U.X, where W_2 is the weight matrix of the second network. The activation of each neuron in the two parts of the network is a linear function of its inputs. The k-th iteration of the learning algorithm, for both networks, is:
w(k+1) = w(k) + β(k)[q(k)P(k) − q²(k)w(k)]
c(k+1) = c(k) + β(k)[q(k)Q(k) − q²(k)c(k)]        (1)
where P and Q are, respectively, the input and output vectors of the network, q(k) is the output of the neuron being trained, and β(k) is a sequence of positive learning parameters. The global convergence of the PCA-based part of the model depends strongly on the parameter β. The optimal choice of this parameter is studied in detail in (Chitroub et al., 2001).
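A minimal NumPy sketch of this Hebbian/anti-Hebbian iteration at the m-th training stage might look as follows. The variable names (p, q_prev, beta), the decay schedule, and the synthetic correlated inputs are assumptions for illustration; the update itself follows Eq. (1), with the lateral vector c inhibiting the m-th output by the outputs of the m-1 already trained neurons.

    import numpy as np

    rng = np.random.default_rng(1)
    N, m = 8, 3                                # input dimension, stage index (assumed)
    L = rng.standard_normal((N, N))            # mixing factor giving correlated inputs
    W_prev = rng.standard_normal((m - 1, N))   # stand-in for the m-1 trained neurons
    w = 0.1 * rng.standard_normal(N)           # synaptic vector of the m-th neuron
    c = np.zeros(m - 1)                        # lateral weights, initialized at zero

    for k in range(20000):
        p = L @ rng.standard_normal(N)         # input vector P(k) (synthetic data)
        q_prev = W_prev @ p                    # outputs Q(k) of the trained neurons
        q = w @ p - c @ q_prev                 # output of the m-th neuron
        beta = 1.0 / (1000.0 + k)              # decaying positive learning rate beta(k)
        w += beta * (q * p - q**2 * w)         # Eq. (1): Hebbian update of w
        c += beta * (q * q_prev - q**2 * c)    # Eq. (1): anti-Hebbian update of c
    # After convergence the lateral weights c tend to zero, as noted in the text.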
2.2 Independent Component Extraction
The M inputs of the ICA network model (Figure 3) are the PC images. The M output neurons correspond to the IC images (vector Z), so that Z = B.Y, where B is the separating (or de-mixing) matrix that we want to determine.
ICA can be carried out using many different methods (Chitroub et al., 2004; Cardoso, 1999; Karhunen and Joutsensalo, 1994; Lee et al., 1999; Hyvärinen, 1999). In this paper, we use the Infomax algorithm to learn the matrix B. Using the concept of differential entropy and the invertible transformation Z = B.Y, the mutual information between the outputs is minimized: as shown below, finding an invertible transformation B that minimizes the mutual information is approximately equivalent to finding directions in which the sum of the non-Gaussianities of the outputs is maximized. The weight update rule is then a gradient ascent toward maximum joint entropy. The mathematical details of the learning process are beyond the scope of this paper; for more details, the reader may consult (Chitroub et al., 2004; Karhunen and Joutsensalo, 1994; Lee et al., 1999; Lee et al., 2000).
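As a concrete illustration of this learning step, the following sketch applies the Bell-Sejnowski Infomax rule in its natural-gradient form with a logistic nonlinearity, for which the score function is 1 − 2g(v). The batch size, learning rate, and synthetic super-Gaussian PC data are assumptions; this is a generic Infomax update, not the exact implementation of the paper.

    import numpy as np

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    rng = np.random.default_rng(2)
    M, T = 4, 10000
    S = rng.laplace(size=(M, T))           # synthetic super-Gaussian sources (assumed)
    Y = rng.standard_normal((M, M)) @ S    # stand-in for the PC images (mixed sources)

    B = np.eye(M)                          # separating matrix to be learned
    lr = 0.01
    for _ in range(200):
        V = B @ Y                          # V = B.Y
        phi = 1.0 - 2.0 * sigmoid(V)       # score function of the logistic nonlinearity
        # Natural-gradient Infomax update: ascend the joint entropy H(g(V)).
        grad = (np.eye(M) + (phi @ V.T) / T) @ B
        B += lr * grad
    Z = B @ Y                              # estimated IC images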
Using the concept of differential entropy and the invertible transformation Z = B.Y, the mutual information between the outputs is:

I(Z) = Σ_{i=1}^{M} H(z_i) − H(Z) = Σ_{i=1}^{M} H(z_i) − H(Y) − log(|det B|)        (2)

where the H(z_i) are the marginal entropies of the outputs and H(Z) is the joint entropy of Z. Constraining the z_i to be uncorrelated and of unit variance implies that det E{z.z^T} = 1. The negentropy is a measure of non-Gaussianity:

J(z) = H(z_gauss) − H(z)        (3)

where z_gauss is a Gaussian variable with the same covariance as z. So the mutual information and the negentropy differ only by a constant that does not depend on B, and by the sign:

I(z) = C − Σ_i J(z_i)        (4)
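Negentropy itself is hard to compute exactly, but it can be approximated from sample moments. The sketch below uses Hyvärinen's contrast-based approximation J(z) ≈ (E{G(z)} − E{G(ν)})², with G(u) = log cosh(u) and ν a standard Gaussian variable, to check that a Gaussian sample has near-zero negentropy while a super-Gaussian (Laplacian) sample does not. The proportionality constant is omitted, so the values are only relative; the choice of G and the sample sizes are assumptions.

    import numpy as np

    def negentropy_approx(z, n_ref=100000, seed=0):
        """Relative negentropy estimate (E[G(z)] - E[G(nu)])^2, G(u) = log cosh(u)."""
        rng = np.random.default_rng(seed)
        z = (z - z.mean()) / z.std()           # enforce zero mean, unit variance
        nu = rng.standard_normal(n_ref)        # standard Gaussian reference
        G = lambda u: np.log(np.cosh(u))
        return (G(z).mean() - G(nu).mean()) ** 2

    rng = np.random.default_rng(3)
    print(negentropy_approx(rng.standard_normal(100000)))   # ~ 0 (Gaussian)
    print(negentropy_approx(rng.laplace(size=100000)))      # clearly > 0 (super-Gaussian)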
which means that finding an invertible transformation B that minimizes the mutual information is approximately equivalent to finding directions in which the sum of the non-Gaussianities of the z_i is maximized. Maximizing the joint entropy H(Z) can approximately minimize the mutual information among the output components:

z_i = g_i(v_i)        (5)

where g_i is an invertible monotonic non-linearity and V = B.Y. If the mutual information among the outputs is zero, the mutual information before the non-linearity must be zero as well, since the non-linear transfer function does not introduce any dependencies. Thus, the relation between z_i, v_i, and g_i(v_i) is such that:

p(z_i) = p(v_i) / |∂g_i(v_i)/∂v_i|        (6)
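Equation (6) is the usual change-of-variables formula for densities, and it explains why maximizing the joint entropy of Z works: each marginal entropy H(z_i) is maximal when z_i is uniform, which happens exactly when g_i matches the cumulative distribution function of v_i. The following small sketch illustrates this intuition with a logistic g applied to logistically distributed inputs, an assumed example chosen because the logistic sigmoid is then the exact CDF:

    import numpy as np

    rng = np.random.default_rng(4)
    v = rng.logistic(size=100000)          # v with standard logistic density
    z = 1.0 / (1.0 + np.exp(-v))           # g(v) = logistic sigmoid = CDF of v

    # p(z) = p(v) / |dg/dv| is then flat on (0, 1): z is uniform,
    # i.e. the marginal entropy H(z) is at its maximum.
    hist, _ = np.histogram(z, bins=20, range=(0.0, 1.0), density=True)
    print(hist)                            # all bins close to 1.0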