Usually a number of well-separable spectral
categories is obtained. It is possible to compare
the results with the real identifiers of the target
classes (considering the order in which temporary
classes are joined) and to correct the imperfectly
labeled training polygons.
3.2 Nonparametric supervised classification
The principles of Bayesian decision are frequently
applied to the classification of remotely sensed
multispectral data. The use of the Bayesian strategy
presupposes the identification of the probability
density function of each spectral class in order to
determine the decision function that minimizes the
probability of misclassification.
Classified pixels (vectors x) are assigned to one
of the classes according to P(ω_j|x), the probability
that class ω_j occurs at vector x. Using Bayes'
theorem we find ω_i so that

$$ P(\omega_i) \, P(x \mid \omega_i) = \max_j \, P(\omega_j) \, P(x \mid \omega_j), \qquad (8) $$

where P(ω_j) is the a priori probability of class
ω_j (the extent of the class in the image) and
P(x|ω_j) is a conditional probability density.
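As a minimal sketch of decision rule (8), assuming the class priors and per-class density estimates are already available (the function name and argument layout below are illustrative, not taken from the paper):

```python
import numpy as np

def bayes_classify(x, priors, densities):
    """Assign vector x to the class maximizing P(w_i) * P(x | w_i), as in (8).

    priors    -- list of a priori class probabilities P(w_i)
    densities -- list of callables, densities[i](x) estimates P(x | w_i)
    """
    scores = [p * d(x) for p, d in zip(priors, densities)]
    return int(np.argmax(scores))   # index of the winning class
```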
A set Γ_j = {x_1j, ..., x_n_j j} of n_j observations of
X is available for every class ω_j. Let Γ = ∪_j Γ_j
be the set of all training samples. To estimate the
density P(x|ω_j) at a random vector x the analyst can
use parametric or nonparametric methods.
Parametric classifiers are based on the assumption
that all vectors come from a certain type of
statistical distribution. The assumption of a
Gaussian distribution is very often used. Then the
mean vectors and covariance matrices may be
estimated from the sets Γ_j. The parametric methods
are very time consuming when applied to large areas.
Some improvements are possible (Feiveson, 1983).
But, what is more important, the data do not fulfil
the assumption of normality. The land-use classes
usually have complex decision boundaries, especially
in high-dimensional feature space. However, the
classifier decision is influenced most heavily by
the samples close to the decision boundary.
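For comparison, a minimal sketch of the parametric (Gaussian) density estimate described above, assuming the training samples of one class are stored as rows of a NumPy array (the helper name is hypothetical):

```python
import numpy as np

def gaussian_density(train_j):
    """Fit a multivariate normal to the training set of one class and return
    a callable estimating P(x | w_j) under the normality assumption."""
    mean = train_j.mean(axis=0)               # mean vector of the class
    cov = np.cov(train_j, rowvar=False)       # covariance matrix of the class
    inv = np.linalg.inv(cov)
    norm = 1.0 / np.sqrt((2.0 * np.pi) ** train_j.shape[1] * np.linalg.det(cov))
    return lambda x: norm * np.exp(-0.5 * (x - mean) @ inv @ (x - mean))
```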
That is why many authors suggest nonparametric
density estimation (Skidmore, 1988), (Cervenka,
1990). Nonparametric classifiers make no assumption
about the shape of the data distribution. Such
techniques are based on the concept that the value
of the density P(x|ω_j) at the point x can be estimated
using the training samples located within a small
region around the point x. Parzen windows
(Parzen, 1962) are very often used for the density
estimation:
$$ P(x \mid \omega_j) = \frac{1}{n_j} \sum_{k=1}^{n_j} h_n^{-N} \, F\!\left(\frac{x - x_{kj}}{h_n}\right), \qquad N = \dim(X), \qquad (9) $$

where the function F(Y) is widely used in this
functional form (the so-called uniform kernel):

$$ F(Y/h_n) = \begin{cases} 2^{-N} & \text{if } |y_i|/h_n \le 1, \; i = 1, \dots, N, \\ 0 & \text{otherwise.} \end{cases} \qquad (10) $$
Usually, h_n = n_j^{-C/N}, where C lies within the
interval (0,1). In fact, the numbers of samples from
Γ_j within a hypercube centered at x are computed
in practical applications. Such a function can be
evaluated easily: individual features can be
tested one by one, and many samples can be
eliminated quickly. Then the classification using
(8) can be applied.
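A sketch of this hypercube counting under the same assumption (class training samples as rows of an array); the half-width h plays the role of h_n, and the features are tested one by one so that most samples are rejected after a few comparisons:

```python
def parzen_uniform_density(x, train_j, h):
    """Estimate P(x | w_j) with the uniform-kernel Parzen window (9)-(10):
    count the training samples falling into the hypercube of half-width h
    centred at x; each such sample contributes 1 / (2h)^N to the estimate."""
    n_j, N = train_j.shape
    count = 0
    for sample in train_j:
        inside = True
        for i in range(N):                  # features checked one by one
            if abs(x[i] - sample[i]) > h:   # outside the window in feature i
                inside = False              # -> sample eliminated early
                break
        if inside:
            count += 1
    return count / (n_j * (2.0 * h) ** N)
```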
The nonparametric methods require a large number of
computations. Common classification problems
consist in the classification of millions of vectors
into 10 - 20 classes using 3 - 7 features. However,
the decision of nonparametric classifiers is based
on a small subregion of the total feature space.
Several authors propose efficient solutions to this
problem. Fukunaga (Fukunaga, 1975) suggests a
decomposition of the original training set into
hierarchically arranged subsets. Then the whole
training set is represented by a tree structure,
where the successor nodes form a decomposition of
the preceding node. The root corresponds to the
whole set Γ. A clustering method is used for the
decomposition of the training samples, and the cluster
analysis is subsequently applied to the individual
nodes. The following information is recorded for
every node p: the mean vector M_p of all samples in the
node p (this set is denoted S_p), the minimal and
maximal values of the individual features, and the value
r_p of the maximum distance between M_p and x_i ∈ S_p. The
distances of all samples to the corresponding sample
mean are saved at the final tree level. The
classification of any vector x corresponds to a
tree search: all vectors sufficiently near to the
vector x are sought. With the help of the information
saved in the tree nodes, most of the nodes
can be eliminated. The given tree structure can be
used for nonparametric classification methods.
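A sketch of such a decomposition tree, under stated assumptions: the text does not fix the clustering method or the branching factor, so a few k-means iterations and a branching of four are used below as placeholders; every node records the statistics listed above (mean M_p, per-feature minima and maxima, radius r_p):

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class Node:
    """One node of the decomposition tree (Fukunaga, 1975): the statistics
    recorded for the set S_p of samples belonging to the node."""
    samples: np.ndarray                            # S_p, one sample per row
    mean: np.ndarray = field(init=False)           # M_p
    fmin: np.ndarray = field(init=False)           # per-feature minima
    fmax: np.ndarray = field(init=False)           # per-feature maxima
    radius: float = field(init=False)              # r_p = max ||x_i - M_p||
    children: list = field(default_factory=list)

    def __post_init__(self):
        self.mean = self.samples.mean(axis=0)
        self.fmin = self.samples.min(axis=0)
        self.fmax = self.samples.max(axis=0)
        self.radius = float(np.linalg.norm(self.samples - self.mean, axis=1).max())

def split(node, branching=4, min_size=32, rng=np.random.default_rng(0)):
    """Recursively decompose a node by clustering its samples (a few k-means
    iterations stand in here for the unspecified clustering method)."""
    if len(node.samples) <= min_size:
        return node
    centres = node.samples[rng.choice(len(node.samples), branching, replace=False)]
    for _ in range(10):                            # crude k-means refinement
        labels = np.argmin(((node.samples[:, None] - centres) ** 2).sum(-1), axis=1)
        centres = np.array([node.samples[labels == c].mean(axis=0)
                            if np.any(labels == c) else centres[c]
                            for c in range(branching)])
    for c in range(branching):
        part = node.samples[labels == c]
        if len(part) == 0:
            continue
        child = Node(part)
        if len(part) < len(node.samples):          # guard against degenerate splits
            child = split(child, branching, min_size, rng)
        node.children.append(child)
    return node
```

The tree is then built as split(Node(all_samples)), where all_samples stacks the whole training set Γ.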
When using Parzen windows with the uniform kernel,
a test is performed at every tree level whether there
is an intersection between the window and the
parallelepiped which contains all samples of the
tested node. The minimal and maximal feature values in
the node are exploited for this test. The tests
are repeated for the successor nodes only in the
affirmative case. At the final level, the individual
training samples are tested whether they fall within
the window centered at the classified sample. The features
can be checked one by one. In most cases (when the
training sample falls outside the window), it is not
necessary to check all features. From this point of
view it is advantageous to check the features with
greater entropy first.
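The pruning test itself reduces to checking, feature by feature, whether the hypercube window around the classified pixel overlaps the axis-aligned box spanned by the node's recorded minima and maxima (fmin/fmax in the previous sketch); a minimal sketch:

```python
def window_intersects_box(x, h, fmin, fmax):
    """True if the hypercube window [x - h, x + h] intersects the axis-aligned
    box [fmin, fmax] spanned by a node's samples; features are checked one by
    one, so a non-intersecting node is rejected after a few comparisons."""
    for i in range(len(x)):
        if x[i] + h < fmin[i] or x[i] - h > fmax[i]:
            return False            # disjoint in feature i -> prune this node
    return True
```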
3.3 Nearest neighbours method
The k nearest neighbours (k-NN) method is based on
similar (local) principles as nonparametric
Bayesian classifiers. It finds the k nearest
neighbours to a given sample x in the training set Γ.
The sample x is assigned to the class ω_j if the
majority of its k nearest neighbours belongs to
that class ω_j. Ties may be broken arbitrarily or using
some heuristics. These classification methods have
been proposed by many authors (Cover, 1967),
(Tomek, 1976). One of the most important
theoretical results is that these methods have
good behaviour in the asymptotic sense (Wilson,
1972). For large values of n_j the expected
probability of error P_E is bounded as follows:

$$ P^{*} \le P_E \le A \, P^{*}, \qquad (11) $$

(A = 2 for k = 1), where P* denotes the Bayes
(minimum attainable) probability of error.
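A minimal brute-force k-NN sketch (without the tree acceleration discussed above), assuming the training set Γ is given as an array of samples together with a parallel array of class labels:

```python
import numpy as np

def knn_classify(x, train, labels, k=3):
    """Assign x to the class that holds the majority among its k nearest
    training samples (ties are resolved here by the smallest class index)."""
    dist = np.linalg.norm(train - x, axis=1)        # distances to all samples
    nearest = labels[np.argsort(dist)[:k]]          # labels of the k nearest
    values, counts = np.unique(nearest, return_counts=True)
    return int(values[np.argmax(counts)])           # majority vote
```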