Usually a number of well separable spectral categories is obtained. It is possible to compare the results with the real identifiers of the target classes (considering the order in which the temporary classes are joined) and to correct the imperfectly labeled training polygons.
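The following minimal Python sketch illustrates one way such a comparison can be organized: the spectral categories obtained from clustering are cross-tabulated against the target-class identifiers, and training polygons whose dominant spectral category is dominated by a different target class are flagged for correction. The data layout and all names (cluster_label, polygon_id, polygon_class) are illustrative assumptions, not part of the original procedure.

```python
import numpy as np

def flag_suspect_polygons(cluster_label, polygon_id, polygon_class):
    """Compare spectral categories with target-class identifiers.

    cluster_label : spectral category of each training pixel (from clustering)
    polygon_id    : training-polygon number of each pixel
    polygon_class : target class assigned by the analyst to each pixel
    Returns the ids of polygons whose dominant spectral category is
    dominated by a different target class.
    """
    cluster_label = np.asarray(cluster_label)
    polygon_id = np.asarray(polygon_id)
    polygon_class = np.asarray(polygon_class)

    # Majority target class of every spectral category (cross-tabulation).
    majority_class = {}
    for c in np.unique(cluster_label):
        classes, counts = np.unique(polygon_class[cluster_label == c],
                                    return_counts=True)
        majority_class[c] = classes[np.argmax(counts)]

    suspects = []
    for p in np.unique(polygon_id):
        mask = polygon_id == p
        # Dominant spectral category inside the polygon.
        cats, counts = np.unique(cluster_label[mask], return_counts=True)
        dominant = cats[np.argmax(counts)]
        # Each polygon is assumed to carry a single target class.
        if majority_class[dominant] != polygon_class[mask][0]:
            suspects.append(p)      # polygon label looks imperfect
    return suspects
```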
3.2 Nonparametric supervised classification 
  
The principles of Bayesian decision are frequently applied to the classification of remotely sensed multispectral data. The use of the Bayesian strategy presupposes the identification of the probability density function of each spectral class in order to determine the decision function that minimizes the probability of misclassification.
Classified pixels (vectors x) are assigned to one of the classes according to P(w_j|x), the probability that class w_j occurs at vector x. Using the Bayes theorem we find w_i so that

    P(w_i) . P(x|w_i) = max_j  P(w_j) . P(x|w_j) ,        (8)

where P(w_j) is an a priori probability of class w_j (the extent of the class in the image) and P(x|w_j) is a conditional probability density.
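A minimal Python sketch of decision rule (8) follows, assuming the a priori probabilities and the class-conditional densities are already available; the names priors and densities are illustrative.

```python
import numpy as np

def bayes_decision(x, priors, densities):
    """Assign vector x to the class w_i maximizing P(w_i) * P(x | w_i),
    i.e. decision rule (8).

    priors    : array of a priori probabilities P(w_j), one per class
    densities : list of callables, densities[j](x) returns P(x | w_j)
    """
    scores = np.array([p * d(x) for p, d in zip(priors, densities)])
    return int(np.argmax(scores))       # index of the winning class
```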
A set Γ_j = {x_1j, ..., x_njj} of n_j observations of X is available for every class w_j. Let Γ = ∪_j Γ_j be the set of all training samples. To estimate the density P(x|w_j) at a random vector x the analyst can use parametric or nonparametric methods.
Parametric classifiers are based on the assumption that all vectors come from a certain type of statistical distribution. The assumption of a Gaussian distribution is used very often; then the mean vectors and covariance matrices may be estimated from the sets Γ_j. The parametric methods are very time consuming when applied to large areas. Some improvements are possible (Feiveson, 1983). But, what is more important, the data do not fulfil the assumption of normality. The land-use classes usually have complex decision boundaries, especially in high-dimensional feature space, and the classifier decision is influenced most heavily by the samples close to the decision boundary.
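For comparison, the parametric (Gaussian) alternative mentioned above can be sketched as follows: the mean vector and covariance matrix of each class are estimated from its training set Γ_j, and the resulting density is plugged into rule (8). This is an illustrative sketch under the Gaussian assumption, not the implementation used in the original study.

```python
import numpy as np

def gaussian_density(samples):
    """Estimate a Gaussian P(x | w_j) from the training set of one class.

    samples : training vectors of the class, shape (n_j, N), n_j > N
    (A singular covariance matrix would require regularization.)
    """
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)
    inv = np.linalg.inv(cov)
    norm = 1.0 / np.sqrt(((2 * np.pi) ** samples.shape[1]) * np.linalg.det(cov))

    def density(x):
        d = x - mean
        return norm * np.exp(-0.5 * d @ inv @ d)
    return density

# Usage with rule (8): densities = [gaussian_density(G_j) for G_j in training_sets]
```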
That is why many authors suggest nonparametric density estimation (Skidmore, 1988), (Cervenka, 1990). Nonparametric classifiers make no assumption about the shape of the data distribution. Such techniques are based on the concept that the value of the density P(x|w_j) at the point x can be estimated using the training samples located within a small region around the point x. The Parzen windows (Parzen, 1962) are very often used for the density estimation:
    P(x|w_j) = (1/n_j) Σ_{k=1..n_j} h_n^{-N} F( (x - x_kj) / h_n ) ,        (9)

    N = dim(X),
where the function F(Y) is widely used in the following functional form (the so-called uniform kernel):

    F(Y/h_n) = 2^{-N}   if |y_i|/h_n ≤ 1  for i = 1, ..., N,
             = 0        otherwise.                              (10)
Usually, h_n = n_j^{-C/N}, where C lies within the interval (0,1). In fact, in practical applications the numbers of samples from Γ_j falling within a hypercube centered at x are computed. Such a function can be evaluated easily: the individual features can be tested one by one, and many samples can be eliminated quickly. Then the classification using (8) can be applied.
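A minimal sketch of the estimate (9) with the uniform kernel (10): the training samples of each class falling into the hypercube of half-width h_n centered at x are counted, and the counts are combined with rule (8). The function and variable names are illustrative.

```python
import numpy as np

def parzen_uniform_density(x, samples, h):
    """Parzen estimate (9) with the uniform kernel (10).

    x       : classified vector, shape (N,)
    samples : training samples of one class, shape (n_j, N)
    h       : window width h_n
    Each sample inside the hypercube of half-width h centered at x
    contributes h**(-N) * 2**(-N) to the average over n_j samples.
    """
    n_j, N = samples.shape
    inside = np.all(np.abs(samples - x) / h <= 1.0, axis=1)
    return inside.sum() / (n_j * (2.0 * h) ** N)

def classify(x, priors, class_samples, h):
    """Combine the density estimates with decision rule (8)."""
    scores = [p * parzen_uniform_density(x, s, h)
              for p, s in zip(priors, class_samples)]
    return int(np.argmax(scores))
```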
The nonparametric methods require a large number of computations. Common classification problems consist in the classification of millions of vectors into 10 - 20 classes using 3 - 7 features. However, the decision of nonparametric classifiers is based on a small subregion of the total feature space. Several authors propose efficient solutions to this problem. Fukunaga (Fukunaga, 1975) suggests a decomposition of the original training set into hierarchically arranged subsets. The whole training set is then represented by a tree structure, where the successor nodes form a decomposition of the preceding node. The root corresponds to the whole set Γ. A clustering method is used for the decomposition of the training samples, and the cluster analysis is subsequently applied to the individual nodes. The following information is recorded for every node p: the mean vector M_p of all samples in the node p (this set is denoted S_p), the minimal and maximal values of the individual features, and the value r_p of the maximum distance between M_p and x_i ∈ S_p. The distances of all samples to the corresponding sample mean are saved at the final tree level. The classification of any vector x corresponds to a tree search in which all vectors sufficiently near to the vector x are sought. With the help of the information saved in the tree nodes, most of the nodes can be eliminated. The given tree structure can be used for nonparametric classification methods.
When using the Parzen windows with the uniform kernel, a test is performed at every tree level to determine whether there is an intersection between the window and the parallelepiped which contains all samples of the tested node. The minimal and maximal feature values stored in the node are exploited in such a case. These tests are repeated for the successor nodes only in the affirmative case. At the final level the individual training samples are tested to determine whether they fall within the window centered at the classified sample. The features can be checked one by one; in most cases (when the training sample falls outside the window) it is not necessary to check all features. From this point of view it is advantageous to check the features with greater entropy first.
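The following Python sketch illustrates, under simplified assumptions, the hierarchical decomposition and the window test described above. A crude two-cluster split stands in for the clustering method, every node records the mean vector and the per-feature minima and maxima, and nodes whose bounding parallelepiped does not intersect the window are eliminated without inspecting their samples. All names and the splitting procedure are illustrative, not Fukunaga's original algorithm.

```python
import numpy as np

_rng = np.random.default_rng(0)          # fixed seed for reproducibility

class Node:
    """Tree node holding the information described above."""
    def __init__(self, samples):
        self.samples = samples                 # S_p
        self.mean = samples.mean(axis=0)       # M_p
        self.lo = samples.min(axis=0)          # minimal feature values
        self.hi = samples.max(axis=0)          # maximal feature values
        self.children = []

def build_tree(samples, leaf_size=32):
    """Decompose the training set into hierarchically arranged subsets
    using a crude two-cluster split at every node."""
    node = Node(samples)
    if len(samples) <= leaf_size:
        return node
    centers = samples[_rng.choice(len(samples), 2, replace=False)]
    for _ in range(5):                          # a few two-means iterations
        dist = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)
        part = dist.argmin(axis=1)
        if len(np.unique(part)) < 2:
            return node                         # degenerate split: keep as leaf
        centers = np.array([samples[part == i].mean(axis=0) for i in (0, 1)])
    node.children = [build_tree(samples[part == i], leaf_size) for i in (0, 1)]
    return node

def count_in_window(node, x, h):
    """Count the training samples inside the hypercube window |x_i - y_i| <= h.
    Nodes whose bounding parallelepiped does not intersect the window are
    eliminated using the stored minimal and maximal feature values."""
    if np.any(node.lo > x + h) or np.any(node.hi < x - h):
        return 0                                # no intersection: prune the node
    if not node.children:                       # final level: test the samples
        return int(np.all(np.abs(node.samples - x) <= h, axis=1).sum())
    return sum(count_in_window(child, x, h) for child in node.children)
```

Dividing the returned count by n_j (2 h_n)^N reproduces the estimate (9) while most of the tree nodes are skipped.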
3.3 Nearest neighbours method 
The k nearest neighbours (k-NN) method is based on similar (local) principles as the nonparametric Bayesian classifiers. It finds the k nearest neighbours of a given sample x in the training set Γ. The sample x is assigned to the class w_j if the majority of its k nearest neighbours belongs to that class w_j. Ties may be broken arbitrarily or using some heuristics. These classification methods have been proposed by many authors (Cover, 1967), (Tomek, 1976). One of the most important theoretical results is that these methods behave well in the asymptotic sense (Wilson, 1972). For large values of n_j the expected probability of error P is bounded as follows:

    P* ≤ P ≤ A . P* ,        (11)

(A = 2 for k = 1), where P* denotes the Bayes (optimal) probability of error.
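A minimal sketch of the k-NN rule, assuming the training vectors and their class labels are held in NumPy arrays; the names are illustrative.

```python
import numpy as np

def knn_classify(x, samples, labels, k=3):
    """Assign x to the class to which the majority of its k nearest
    neighbours in the training set belong; ties are broken by taking
    the lowest class index (an arbitrary heuristic).

    samples : training vectors, shape (n, N)
    labels  : class index of every training vector, shape (n,)
    """
    dist = np.linalg.norm(samples - x, axis=1)
    nearest = labels[np.argsort(dist)[:k]]
    classes, votes = np.unique(nearest, return_counts=True)
    return int(classes[np.argmax(votes)])
```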
 
	        