where J is the set of neurons of the previous layer and
S is usually a sigmoid function. The connection weights
are adjusted by the back-propagation algorithm until the
node values in the input and output layers are
approximately the same. The corresponding values in the
middle layer can then be regarded as an effective
compression of the original information. Finally, the
new features are computed for all pixels by presenting
the original feature values to the input layer of the
trained network. The synthetic images obtained in this
way may be used as the R, G, B components of an additive
color composite. The technical details of the adaptation
process are beyond the scope of this paper and have been
discussed by several authors (Fahlman, 1988).
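For illustration only, a minimal sketch of such a bottleneck (auto-associative) network is given below in Python. The function name compress_to_three_bands, the use of PyTorch with the Adam optimizer, and the training hyper-parameters are assumptions made for the example; the original method only requires a sigmoid network trained by back-propagation until input and output agree.

```python
import torch
import torch.nn as nn

# pixels: N x B array of original multispectral feature values in [0, 1];
# B is the number of input bands (assumed name for this sketch).
def compress_to_three_bands(pixels, epochs=200, lr=1e-2):
    x = torch.as_tensor(pixels, dtype=torch.float32)
    n_bands = x.shape[1]

    # Bottleneck network: input and output layers have one node per band,
    # the middle layer has three nodes whose activations become R, G, B.
    model = nn.Sequential(
        nn.Linear(n_bands, 3), nn.Sigmoid(),   # encoder -> middle layer
        nn.Linear(3, n_bands), nn.Sigmoid(),   # decoder -> reconstruction
    )
    opt = torch.optim.Adam(model.parameters(), lr=lr)

    # Adjust weights until input and output values are approximately equal.
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), x)
        loss.backward()
        opt.step()

    # New features = middle-layer values of the trained network.
    with torch.no_grad():
        rgb = model[1](model[0](x))            # encoder output in [0, 1]
    return rgb.numpy()
```

The three returned columns can then be scaled to 0-255 and displayed directly as an additive color composite.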
2.2 Interpretation of clustering results
The ISODATA method produces clusters that can be bounded
by a hypersphere or a hyperellipsoid. It is therefore
necessary to group the data into more clusters than the
number of spectral classes, so some of the classes are
broken into several clusters. A higher number of
clusters, however, complicates the subsequent
interpretation of the classification results.
Information theory provides an efficient tool for
solving this problem (Charvat, 1990).
If P(x) denotes the probability distribution of a random
variable X on some discrete space, the Shannon entropy
H(P_X) is defined as follows:

H(P_X) = - \sum_{x} P(x) \log P(x).    (2)
When P(x,y) is the probability distribution of a composed
variable (X,Y) and P(x), P(y) are the marginal
distributions, then the mutual information between the
variables X and Y is defined as follows:

I(X,Y) = \sum_{x,y} P(x,y) \log \frac{P(x,y)}{P(x)\,P(y)}.    (3)
The mutual information I(X,Y) can be considered as
a general dependency measure between the variables
X and Y.
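As a concrete illustration, the following Python sketch computes the entropy (2) and the mutual information (3) from a joint probability table; the array names and the example numbers are assumptions made for the illustration.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (2) of a discrete distribution p (zero terms skipped)."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def mutual_information(joint):
    """Mutual information (3) from a joint probability table P(x, y)."""
    px = joint.sum(axis=1)            # marginal P(x)
    py = joint.sum(axis=0)            # marginal P(y)
    outer = np.outer(px, py)          # product P(x) * P(y)
    mask = joint > 0
    return np.sum(joint[mask] * np.log(joint[mask] / outer[mask]))

# Example: a 3 x 3 joint distribution estimated from co-occurrence counts.
counts = np.array([[30.0, 5.0, 1.0],
                   [4.0, 25.0, 3.0],
                   [2.0, 2.0, 28.0]])
joint = counts / counts.sum()
print(mutual_information(joint), entropy(joint.sum(axis=1)))
```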
The result of unsupervised classification may be
interpreted easily using the mutual information. The
number of resulting clusters, even after removing the
nonsignificant ones, is usually high, and it is
necessary to join several classes in the resulting
image. Let P(ω_i, ω_j) be the probability that the
classes ω_i and ω_j occur in neighbouring pixels and let
P(ω_i), P(ω_j) be the a posteriori probabilities of the
classes (their areal extents). Then the spatial
dependency between the individual classes may be
described by the mutual information:
I = \sum_{i,j} P(\omega_i, \omega_j) \log \frac{P(\omega_i, \omega_j)}{P(\omega_i)\,P(\omega_j)}.    (4)
This is the mutual information computed in the image
space. For every pair of classes, the loss of this
mutual information that would result from joining them
is computed. The system recommends joining the two
classes for which this loss of information is
minimized. The procedure is repeated until a
satisfactory result is reached.
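A hedged sketch of this greedy merging step in Python is given below; it assumes a pre-computed table of neighbouring-pixel co-occurrence probabilities and is only one possible way of implementing the procedure described above, with all names invented for the example.

```python
import numpy as np

def spatial_mi(joint):
    """Spatial mutual information (4) from the co-occurrence table P(w_i, w_j)."""
    marg = joint.sum(axis=1)                     # areal extents P(w_i)
    mask = joint > 0
    return np.sum(joint[mask] * np.log(joint[mask] / np.outer(marg, marg)[mask]))

def merge(joint, i, j):
    """Co-occurrence table after joining classes i and j (class j folded into i)."""
    m = joint.copy()
    m[i, :] += m[j, :]
    m[:, i] += m[:, j]
    return np.delete(np.delete(m, j, axis=0), j, axis=1)

def recommend_join(joint):
    """Return the pair of classes whose joining loses the least information."""
    base = spatial_mi(joint)
    k = joint.shape[0]
    losses = {(i, j): base - spatial_mi(merge(joint, i, j))
              for i in range(k) for j in range(i + 1, k)}
    return min(losses, key=losses.get)
```

Calling recommend_join on the current table, merging the returned pair, and repeating the call reproduces the iterative procedure until the analyst judges the result satisfactory.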
A preliminary unsupervised classification and
interpretation yields the approximate areal extents of
the cover classes. These can be used as estimates of the
a priori class probabilities when supervised
classification is applied. The resulting cluster domains
and the color composite map are used to guide the
terrestrial investigations when the main landcover
classes are delineated.
3. SUPERVISED CLASSIFICATION
3.1 Verification of training samples
When supervised classification is used for the
interpretation of satellite image data, the gathering of
suitable training samples is the main problem. It is
necessary to test the separability of the classes and to
verify the labeling of the training polygons. Some
methods solving these problems for normally distributed
data have already been investigated (Charvat, 1987b).
They are based on statistical comparisons of mean
vectors and covariance matrices.
The mutual information can also characterize the
separability between classes. Let training sets be
collected for every class ω_i ∈ Ω, i = 1,...,M (M is the
number of classes), and let x be a random vector of the
feature space X which represents the multispectral
image. The probability distributions P(ω_i), P(x) and
P(x, ω_i) can be estimated from the training samples. In
the case of absolutely separable classes the mutual
information I(X,Ω) and the entropy H(P_Ω) of the class
probability distribution are equal. It follows that

I(X,\Omega) / H(P_\Omega) = 1,    (5)
where
I(X,\Omega) = \sum_{x,i} P(x, \omega_i) \log \frac{P(x, \omega_i)}{P(x)\,P(\omega_i)}    (6)
and
H(P_\Omega) = - \sum_{i} P(\omega_i) \log P(\omega_i).    (7)
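The quantities in (5)-(7) can be estimated directly from discretized training data. The short Python sketch below uses assumed variable names and a simple histogram estimate of P(x, ω_i) instead of the Parzen-window estimate mentioned later; it only illustrates the separability ratio.

```python
import numpy as np

def separability_ratio(joint_xw):
    """I(X, Omega) / H(P_Omega) from a joint table P(x, w_i):
    rows index discretized feature vectors x, columns index classes w_i."""
    px = joint_xw.sum(axis=1)                    # P(x)
    pw = joint_xw.sum(axis=0)                    # P(w_i)
    mask = joint_xw > 0
    mi = np.sum(joint_xw[mask] *
                np.log(joint_xw[mask] / np.outer(px, pw)[mask]))   # (6)
    h = -np.sum(pw[pw > 0] * np.log(pw[pw > 0]))                   # (7)
    return mi / h                                                  # (5)
```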
The algorithm for verification of training samples is
based on the following idea (a code sketch is given
after the list):
1) Class identifiers are assigned to every
   training polygon; every polygon is considered
   a temporary spectral class.
2) The mutual information I(X,Ω) and the entropy
   H(P_Ω) are computed using the estimates of
   P(x), P(ω_i), P(x, ω_i). The method of Parzen
   windows, which will be described later, is used
   for this purpose.
3) If 1 − I(X,Ω)/H(P_Ω) < ε, the algorithm stops.
4) For every pair of temporary classes the loss of
   I(X,Ω) that would result from joining them is
   computed.
5) The two classes with the minimal loss are found
   and joined; the current set of features probably
   cannot be used to discriminate between these
   classes. The algorithm returns to step 2).
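As referenced above, a minimal sketch of this verification loop follows. It again uses a simple histogram estimate of P(x, ω_i) in place of the Parzen-window estimate, and all function and variable names are illustrative assumptions, not part of the original description.

```python
import numpy as np

def _mi(joint):
    """Mutual information of a joint table (rows: x, columns: classes)."""
    px, pw = joint.sum(axis=1), joint.sum(axis=0)
    mask = joint > 0
    return np.sum(joint[mask] * np.log(joint[mask] / np.outer(px, pw)[mask]))

def _entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def verify_training_classes(joint_xw, eps=0.05):
    """Greedy verification of temporary training classes (steps 1-5).
    joint_xw: estimate of P(x, w_i); rows index discretized feature
    vectors, columns index temporary classes (one per training polygon)."""
    groups = [[i] for i in range(joint_xw.shape[1])]            # step 1
    while joint_xw.shape[1] > 1:
        mi = _mi(joint_xw)                                      # step 2
        h = _entropy(joint_xw.sum(axis=0))
        if 1.0 - mi / h < eps:                                  # step 3
            break
        # step 4: loss of I(X, Omega) for every candidate pair (i < j)
        k = joint_xw.shape[1]
        def joined(i, j):
            m = np.delete(joint_xw, j, axis=1)
            m[:, i] = joint_xw[:, i] + joint_xw[:, j]
            return m
        losses = {(i, j): mi - _mi(joined(i, j))
                  for i in range(k) for j in range(i + 1, k)}
        # step 5: join the pair with minimal loss and repeat from step 2
        i, j = min(losses, key=losses.get)
        groups[i] += groups.pop(j)
        joint_xw = joined(i, j)
    return groups
```

The returned groups indicate which training polygons cannot be discriminated with the current feature set and should therefore be treated as a single spectral class or relabeled.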