THEMATIC MAP COMPILATION USING NONPARAMETRIC
CLASSIFICATION METHODS.
Vladimir Cervenka
Institut of Surveying and Mapping, Prague
Czechoslovakia
Commission No.: III/3
ABSTRACT:
This paper describes a method that combines unsupervised clustering with supervised nonparametric classification of multispectral
image data. Creating sufficiently representative training sets for supervised
classification can be a serious problem: it is difficult to find training samples that cover the whole
feature space. The results of unsupervised classification are therefore used to complete the terrestrial
investigation. The training data are then verified using a generalized entropy measure and mutual
information. Finally, the principles of nonparametric Bayesian decision based on Parzen windows are applied.
So far, nonparametric methods have been shown to yield excellent results in applications other than remote sensing.
These methods are especially suitable when little is known about the real
probability densities or about their functional form. Unfortunately, they require storage and computation
proportional to the number of samples in the training set.
KEY WORDS: Algorithm, Artificial Intelligence, Classification, Feature Extraction, Image Interpretation,
Thematic, Training
1. INTRODUCTION
Gathering information on land use is one of
the main goals of remote sensing. This task
is of special importance in regions with
complicated structural zoning, e.g. in urban
agglomerations and their surroundings. At present,
Thematic Mapper (TM) data are frequently exploited
for these purposes. Great attention has also been
paid to the development of their automatic
interpretation (classification). There are two
principal approaches to classification:
supervised and unsupervised.
Any computer classification that leads to
a ground-cover thematic map is based on ground
truth data gathered from a selected area. The choice
of training samples has to be representative, but
random. However, creating sufficiently
representative training sets may be a serious
problem. Satellite images cover some hundreds of km²;
nevertheless, it is difficult to find suitable
training samples that cover the whole feature
space. Therefore the results of unsupervised
classification are used to complete the
terrestrial investigation when significantly
different spectral classes are determined.
Unsupervised classification makes it possible to reduce the
extent of the subsequent supervised classification to
a selected subset of spectral classes.
The notion of unsupervised classification will be
presented in Section 2. The interpretation of
clustering results in terms of mutual information
will be proposed in Section 2.1. The practical
aspects of nonparametric classification methods and
various approaches are discussed in Section 3.
2. UNSUPERVISED CLASSIFICATION
The ISODATA clustering method has been used to
analyze satellite data (Charvat, 1987a). With this
method, a sample of approximately 50 % of the pixels in the
scene is clustered. In the k-means ISODATA method
the pixels are placed in k groups (clusters)
according to the similarity of their digital features.
The cluster centres are established during the
iterative clustering. All pixels are then mapped
back onto the original spatial domain using the nearest
neighbour classifier. To avoid the excessive CPU
time requirements, a three-dimensional histogram is
used, in which all samples in the feature space with the
same feature values are represented by a single
histogram cell. The clustering process is carried out
in the reduced feature space only. A feature
reduction technique is therefore necessary;
usually three new synthetic features (images) are
computed.
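
The histogram idea above can be illustrated with a minimal sketch. It assumes plain k-means rather than full ISODATA (no splitting or merging of clusters), integer feature triples, and a simple evenly spaced initialisation; the function names `histogram_kmeans`, `nearest` and `classify` are hypothetical and are not taken from the cited implementation.

```python
from collections import Counter


def nearest(point, centres):
    """Index of the centre with the smallest squared Euclidean distance."""
    return min(
        range(len(centres)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centres[j])),
    )


def histogram_kmeans(pixels, k, iterations=20):
    """Weighted k-means over the occupied cells of a sparse 3-D histogram."""
    # Every distinct feature triple is one histogram cell with a pixel count,
    # so identical pixels are visited only once per iteration.
    hist = Counter(pixels)
    cells = sorted(hist)
    # Assumed initialisation: k cells spread evenly over the sorted cell list.
    step = max(1, len(cells) // k)
    centres = [
        tuple(float(v) for v in cells[min(i * step, len(cells) - 1)])
        for i in range(k)
    ]
    for _ in range(iterations):
        sums = [[0.0, 0.0, 0.0] for _ in range(k)]
        weights = [0] * k
        for cell in cells:
            j = nearest(cell, centres)
            w = hist[cell]  # number of pixels falling into this cell
            weights[j] += w
            for d in range(3):
                sums[j][d] += w * cell[d]
        # An empty cluster keeps its previous centre.
        new_centres = [
            tuple(s / weights[j] for s in sums[j]) if weights[j] else centres[j]
            for j in range(k)
        ]
        if new_centres == centres:
            break
        centres = new_centres
    return centres


def classify(pixels, centres):
    """Map every pixel back to its nearest cluster centre."""
    return [nearest(p, centres) for p in pixels]
```

Pixels are passed as tuples of three feature values, for example `classify(pixels, histogram_kmeans(pixels, k=2))`. The cost of one iteration then depends on the number of occupied histogram cells rather than on the number of pixels.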
2.1 Feature reduction
There are two basic reasons for incorporating a
feature reduction procedure into the classification
process. First, the ISODATA method uses a three-dimensional
histogram, so the maximum number of features is
three. Second, producing a color composite requires
transforming all available
spectral bands into three. A color
composite created from three
uncorrelated features preserves a great deal of the
spectral information from all original spectral
bands. The method used improves the contrast of the
color composite significantly. Color composites
seem to be a useful tool for the collection and
verification of training samples as well as for the
visual verification of classification results.
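
As an illustration of reducing all bands to three uncorrelated features, the following minimal sketch uses the principal component method, computed by power iteration with deflation. This is an assumption chosen for illustration and not necessarily the transformation used for the results reported here; the function names are hypothetical, and the sketch assumes at least three bands and well-separated leading eigenvalues.

```python
import math


def covariance(rows):
    """Return the band means and the sample covariance matrix of the rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[i] for r in rows) / n for i in range(d)]
    centred = [[r[i] - mean[i] for i in range(d)] for r in rows]
    cov = [
        [sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    return mean, cov


def top_eigenvector(m, iterations=500):
    """Dominant eigenpair of a symmetric matrix by power iteration."""
    d = len(m)
    # Arbitrary start vector; it must not be orthogonal to the sought direction.
    v = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the eigenvalue estimate.
    eigenvalue = sum(
        v[i] * sum(m[i][j] * v[j] for j in range(d)) for i in range(d)
    )
    return eigenvalue, v


def reduce_to_three(rows):
    """Project each pixel (one value per band) onto the first three components."""
    mean, cov = covariance(rows)
    d = len(cov)
    components = []
    for _ in range(3):
        lam, v = top_eigenvector(cov)
        components.append(v)
        # Deflation: remove the component just found from the matrix.
        cov = [
            [cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)
        ]
    return [
        [sum((r[i] - mean[i]) * c[i] for i in range(d)) for c in components]
        for r in rows
    ]
```

Each input row holds the values of one pixel in all spectral bands; the three output values per pixel are mutually uncorrelated over the sample and could be scaled to the three channels of a color composite.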
The use of the "Tasseled Cap" transformation (Crist,
1984a) or the principal component method for this
purpose has been described. A method based on
neural networks can also be used successfully
(Charvat, 1990) when the back propagation algorithm
(Hinton, 1987) is used. The proposed neural net
consists of three layers. The input and output layers
have the same number of nodes (neurons), equal to the
number of spectral bands; the middle layer has
three nodes in our case. Each node in the middle
layer is connected with all nodes in the preceding and
succeeding layers. The neural net can be described by
a directed graph in which the nodes (neurons) bear
some value. A certain weight is assigned to every
connection. In the course of adaptation, the feature
values of selected samples are assigned to the
nodes in the input layer, and the values x_i in the
middle and output layers are computed according to
the expression
$$x_i = f\Big(\sum_{j} w_{ij}\, x_j\Big) \qquad (1)$$