network model is proposed (Figure 1). With the PCA part of the model, the Principal Component (PC) images are decorrelated, and consequently the redundant information shared between the PC images is removed. With the ICA part of the model, we show that the mutual information in the Independent Component (IC) images is reduced compared to the PC images. This implies that the zones of transition are detected and made to emerge while, at the same time, the zones of vegetation temporal evolution are preserved in the produced IC images.
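To make the overall pipeline concrete, the following minimal Python sketch chains a PCA decorrelation step with an ICA unmixing step on a stack of co-registered images. It uses scikit-learn's PCA and FastICA as generic stand-ins for the neural implementations described below; the band count, image size, and component count are illustrative assumptions, not values from this paper.

    import numpy as np
    from sklearn.decomposition import PCA, FastICA

    # Assumed input: a stack of N co-registered multitemporal bands,
    # each of size rows x cols (hypothetical dimensions, random placeholder data).
    N, rows, cols = 6, 256, 256
    stack = np.random.rand(N, rows, cols)

    X = stack.reshape(N, -1).T                 # pixels as samples, bands as features

    # PCA part: decorrelate the bands into PC images.
    pca = PCA(n_components=4, whiten=True)
    Y = pca.fit_transform(X)                   # PC images (pixels x components)

    # ICA part: reduce the mutual information between the PC images.
    ica = FastICA(n_components=4, random_state=0)
    Z = ica.fit_transform(Y)                   # IC images

    ic_images = Z.T.reshape(4, rows, cols)     # back to image form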
2.1 Principal Component Extraction
The PCA-based part (Figure 2) is devoted to extracting the PC images. It is based on the simultaneous diagonalization of the two matrices Σ_x (the covariance matrix of the input images) and Σ_n (the covariance matrix of the noise) via one orthogonal matrix A (Chitroub et al., 2004). This means that the PC images (vector Y) are uncorrelated and carry an additive noise of unit variance. This processing step makes our application consistent with the theoretical development of ICA (Lee et al., 2000).
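As a reference point for the neural implementation below, the simultaneous diagonalization itself can be computed in closed form from the two covariance matrices. The following NumPy sketch uses a standard congruence-based construction (not necessarily the orthogonal matrix of the paper's own algorithm): it first whitens the noise and then diagonalizes the noise-whitened signal covariance, so the resulting transform decorrelates the PC images while giving the noise unit variance. The synthetic covariance matrices are assumptions for illustration.

    import numpy as np

    def simultaneous_diagonalization(cov_x, cov_n):
        """Return A with A.T @ cov_n @ A = I and A.T @ cov_x @ A diagonal."""
        # Step 1: whiten the noise; F maps cov_n to the identity.
        evals_n, evecs_n = np.linalg.eigh(cov_n)
        F = evecs_n @ np.diag(evals_n ** -0.5)
        # Step 2: diagonalize the noise-whitened signal covariance.
        _, evecs_s = np.linalg.eigh(F.T @ cov_x @ F)
        return F @ evecs_s

    # Synthetic example (assumed data, for illustration only).
    rng = np.random.default_rng(0)
    M = rng.standard_normal((4, 4))
    cov_x = M @ M.T + 4 * np.eye(4)            # signal covariance
    cov_n = np.diag([0.5, 1.0, 1.5, 2.0])      # noise covariance

    A = simultaneous_diagonalization(cov_x, cov_n)
    assert np.allclose(A.T @ cov_n @ A, np.eye(4), atol=1e-10)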
Based on well-developed aspects of matrix theory and computation, the existence of A is proved in (Chitroub et al., 2004), where a statistical algorithm for obtaining it is proposed. Here, we propose a neural implementation of this algorithm (Chitroub et al., 2001) with some modifications (Figure 2). It is composed of two PCA neural networks that have the same topology. The lateral weights c_i^1, respectively c_i^2, forming the vector C_1, respectively C_2, connect all of the first m-1 neurons with the m-th one. These connections play a very important role in the model, since they work toward the orthogonalization of the synaptic vector of the m-th neuron with the vectors of the previous m-1 neurons. The solid lines denote the weights w_i^1, c_i^1, respectively w_i^2, c_i^2, which are trained at the m-th stage, while the dashed lines correspond to the weights of the already trained neurons. Note that the lateral weights asymptotically converge to zero, so they do not appear between the already trained neurons. The first network of Figure 2 is devoted to whitening the noise, while the second one maximizes the variance given that the noise has already been whitened. Let X be the input vector of the first network. After convergence, the vector X is transformed into the new vector X' via the matrix U = W_1.Λ^(-1/2), where W_1 is the weight matrix of the first network, Λ is the diagonal matrix of eigenvalues of Σ_n, and Λ^(-1/2) is the inverse of its square root. Next, X' is the input vector of the second network. It is connected to M outputs, with M < N, corresponding to the intermediate output vector noted X_2. Once this network has converged, the PC images to be extracted (vector Y) are obtained as Y = A^T.X = W_2.U.X, where W_2 is the weight matrix of the second network. The activation of each neuron in the two parts of the network is a linear function of its inputs. The k-th iteration of the learning algorithm, for both networks, is:
w(k+1) = w(k) + β(k)[q(k)P(k) − q²(k)w(k)]
c(k+1) = c(k) + β(k)[q(k)Q(k) − q²(k)c(k)]        (1)
where P and Q are, respectively, the input and output vectors of the network, q(k) is the output of the neuron being trained, and β(k) is a sequence of positive learning parameters. The global convergence of the PCA-based part of the model depends strongly on the parameter β. The optimal choice of this parameter is studied in detail in (Chitroub et al., 2001).
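A minimal NumPy sketch of this Hebbian/anti-Hebbian iteration at the m-th training stage might look as follows. The variable names (p, q_prev, beta), the decay schedule, and the synthetic correlated inputs are assumptions for illustration; the update itself follows Eq. (1), with the lateral vector c inhibiting the m-th output by the outputs of the m-1 already trained neurons.

    import numpy as np

    rng = np.random.default_rng(1)
    N, m = 8, 3                                # input dimension, stage index (assumed)
    L = rng.standard_normal((N, N))            # mixing factor giving correlated inputs
    W_prev = rng.standard_normal((m - 1, N))   # stand-in for the m-1 trained neurons
    w = 0.1 * rng.standard_normal(N)           # synaptic vector of the m-th neuron
    c = np.zeros(m - 1)                        # lateral weights, initialized at zero

    for k in range(20000):
        p = L @ rng.standard_normal(N)         # input vector P(k) (synthetic data)
        q_prev = W_prev @ p                    # outputs Q(k) of the trained neurons
        q = w @ p - c @ q_prev                 # output of the m-th neuron
        beta = 1.0 / (1000.0 + k)              # decaying positive learning rate beta(k)
        w += beta * (q * p - q**2 * w)         # Eq. (1): Hebbian update of w
        c += beta * (q * q_prev - q**2 * c)    # Eq. (1): anti-Hebbian update of c
    # After convergence the lateral weights c tend to zero, as noted in the text.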
2.2 Independent Component Extraction
The M inputs of the ICA network model (Figure 3) are the PC images. The M output neurons correspond to the IC images (vector Z), so that Z = B.Y, where B is the separating (or de-mixing) matrix that we want to determine.
ICA can be carried out using many different methods (Chitroub et al., 2004; Cardoso, 1999; Karhunen and Joutsensalo, 1994; Lee et al., 1999; Hyvärinen, 1999). In this paper, we use the Infomax algorithm to learn the matrix B. Using the concept of differential entropy and the invertible transformation Z = B.Y, the mutual information between the outputs is minimized: as shown below, finding an invertible transformation B that minimizes the mutual information is approximately equivalent to finding directions in which the sum of the non-Gaussianities of the outputs is maximized. The weight update rule is then a gradient ascent toward maximum joint entropy. The mathematical details of the learning process are beyond the scope of this paper; for more details, the reader may consult (Chitroub et al., 2004; Karhunen and Joutsensalo, 1994; Lee et al., 1999; Lee et al., 2000).
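As a concrete illustration of this learning step, the following sketch applies the Bell-Sejnowski Infomax rule in its natural-gradient form with a logistic nonlinearity, for which the score function is 1 − 2g(v). The batch size, learning rate, and synthetic super-Gaussian PC data are assumptions; this is a generic Infomax update, not the exact implementation of the paper.

    import numpy as np

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    rng = np.random.default_rng(2)
    M, T = 4, 10000
    S = rng.laplace(size=(M, T))           # synthetic super-Gaussian sources (assumed)
    Y = rng.standard_normal((M, M)) @ S    # stand-in for the PC images (mixed sources)

    B = np.eye(M)                          # separating matrix to be learned
    lr = 0.01
    for _ in range(200):
        V = B @ Y                          # V = B.Y
        phi = 1.0 - 2.0 * sigmoid(V)       # score function of the logistic nonlinearity
        # Natural-gradient Infomax update: ascend the joint entropy H(g(V)).
        grad = (np.eye(M) + (phi @ V.T) / T) @ B
        B += lr * grad
    Z = B @ Y                              # estimated IC images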
Using the concept of differential entropy and the invertible transformation Z = B.Y, the mutual information between the outputs is:

I(Z) = Σ_{i=1}^{M} H(z_i) − H(Z) = Σ_{i=1}^{M} H(z_i) − H(Y) − log(|det B|)        (2)

where the H(z_i) are the marginal entropies of the outputs and H(Z) is the joint entropy of Z. Constraining the z_i to be uncorrelated and of unit variance implies that det E{z.z^T} = 1. The negentropy is a measure of non-Gaussianity:

J(z) = H(z_gauss) − H(z)        (3)

where z_gauss is a Gaussian variable with the same covariance as z. So the mutual information and the negentropy differ only by a constant that does not depend on B, and by the sign:

I(z) = C − Σ_i J(z_i)        (4)
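Negentropy itself is hard to compute exactly, but it can be approximated from sample moments. The sketch below uses Hyvärinen's contrast-based approximation J(z) ≈ (E{G(z)} − E{G(ν)})², with G(u) = log cosh(u) and ν a standard Gaussian variable, to check that a Gaussian sample has near-zero negentropy while a super-Gaussian (Laplacian) sample does not. The proportionality constant is omitted, so the values are only relative; the choice of G and the sample sizes are assumptions.

    import numpy as np

    def negentropy_approx(z, n_ref=100000, seed=0):
        """Relative negentropy estimate (E[G(z)] - E[G(nu)])^2, G(u) = log cosh(u)."""
        rng = np.random.default_rng(seed)
        z = (z - z.mean()) / z.std()           # enforce zero mean, unit variance
        nu = rng.standard_normal(n_ref)        # standard Gaussian reference
        G = lambda u: np.log(np.cosh(u))
        return (G(z).mean() - G(nu).mean()) ** 2

    rng = np.random.default_rng(3)
    print(negentropy_approx(rng.standard_normal(100000)))   # ~ 0 (Gaussian)
    print(negentropy_approx(rng.laplace(size=100000)))      # clearly > 0 (super-Gaussian)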
which means that finding an invertible transformation B that minimizes the mutual information is approximately equivalent to finding directions in which the sum of the non-Gaussianities of the z_i is maximized. Maximizing the joint entropy H(Z) can approximately minimize the mutual information among the output components:

z_i = g_i(v_i)        (5)

where g_i is an invertible monotonic non-linearity and V = B.Y. If the mutual information among the outputs is zero, the mutual information before the non-linearity must be zero as well, since the non-linear transfer function does not introduce any dependencies. Thus, the relation between z_i, v_i, and g_i(v_i) is such that:

p(z_i) = p(v_i) / |∂g_i(v_i)/∂v_i|        (6)
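Equation (6) is the usual change-of-variables formula for densities, and it explains why maximizing the joint entropy of Z works: each marginal entropy H(z_i) is maximal when z_i is uniform, which happens exactly when g_i matches the cumulative distribution function of v_i. The following small sketch illustrates this intuition with a logistic g applied to logistically distributed inputs, an assumed example chosen because the logistic sigmoid is then the exact CDF:

    import numpy as np

    rng = np.random.default_rng(4)
    v = rng.logistic(size=100000)          # v with standard logistic density
    z = 1.0 / (1.0 + np.exp(-v))           # g(v) = logistic sigmoid = CDF of v

    # p(z) = p(v) / |dg/dv| is then flat on (0, 1): z is uniform,
    # i.e. the marginal entropy H(z) is at its maximum.
    hist, _ = np.histogram(z, bins=20, range=(0.0, 1.0), density=True)
    print(hist)                            # all bins close to 1.0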