When an input vector is presented to the SOM, the distances between the input vector and the weight vectors are computed and used to find the row and column coordinates of the output. Each output coordinate is BMU ± d_1/(d_1 + d_2), where d_1 is the distance from the input vector to the BMU and d_2 the distance to the second closest weight vector. Plus is used, if the coordinate of the second closest weight vector is greater than the coordinate of the BMU.
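As an illustration, the following sketch (not from the paper; the function name, the two-dimensional map layout and Euclidean distances are assumptions) computes the interpolated output coordinates of an input vector:

```python
import numpy as np

def som_output(x, weights, coords):
    """Interpolated SOM output coordinates for input x.

    weights: (n_units, dim) array of weight vectors
    coords:  (n_units, 2)   integer (row, column) positions on the map
    """
    d = np.linalg.norm(weights - x, axis=1)    # distances to all weight vectors
    bmu, second = np.argsort(d)[:2]            # BMU and second closest unit
    offset = d[bmu] / (d[bmu] + d[second])     # interpolation term d1/(d1+d2)
    out = coords[bmu].astype(float)
    for axis in (0, 1):                        # row and column coordinates
        if coords[second, axis] > coords[bmu, axis]:
            out[axis] += offset                # plus: second unit has the greater coordinate
        elif coords[second, axis] < coords[bmu, axis]:
            out[axis] -= offset
    return out
```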
5. EXPERIMENTS
To compare the Karhunen-Loève transformation with the SOM, the following experiments were carried out. Datasets were generated using a random number generator and feature extraction was performed. The transformed datasets were classified and the classification error was estimated. This process was repeated 50 times using different datasets. The criterion for comparing the results was the minimization of the classification error.
5.1 Classification
Classifications were made using the Bayes decision rule for minimum error. The a posteriori probability P(\omega_i \mid x) is calculated from the a priori probability P_i and the conditional density function (CDF) p(x \mid \omega_i) using the Bayes theorem

$$P(\omega_i \mid x) = \frac{p(x \mid \omega_i)\, P_i}{\sum_{j=1}^{c} p(x \mid \omega_j)\, P_j}, \qquad (9)$$
where c is the number of classes. When x is to be
classified, the a posteriori probabilities are determined
for each class and x is assigned to the class with the
maximum a posteriori probability.
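A minimal sketch of this decision rule, assuming the class-conditional densities are available as callables (the names below are illustrative, not from the paper):

```python
import numpy as np

def bayes_classify(x, densities, priors):
    """Assign x to the class with the maximum a posteriori probability.

    densities: list of callables, densities[i](x) estimates p(x | omega_i)
    priors:    array of a priori probabilities P_i
    """
    numerators = np.array([p(x) for p in densities]) * np.asarray(priors)
    posteriors = numerators / numerators.sum()   # eq. (9); the denominator is common
    return int(np.argmax(posteriors))
```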
The value of the CDF p(x \mid \omega_i) determines how closely sample x belongs to class \omega_i. It is estimated using a nonparametric estimation method called k-nearest neighbor estimation. This method estimates the CDFs locally using a small number of neighboring samples. The k-nearest neighbor estimate of the CDF of class i is

$$p(x \mid \omega_i) = \frac{k}{n_i v}, \qquad (10)$$

where k is the number of neighboring samples, n_i is the number of samples in class i, and v is the volume of the hypersphere whose radius is the distance between sample x and its kth nearest neighbor (Devijver, 1982).
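Estimate (10) can be written directly using the closed-form volume of a hypersphere; this sketch assumes Euclidean distances and uses illustrative names:

```python
import math
import numpy as np

def knn_density(x, class_samples, k):
    """k-nearest neighbor estimate of p(x | omega_i), eq. (10)."""
    n_i, dim = class_samples.shape
    # radius = distance from x to its kth nearest neighbor within the class
    r = np.sort(np.linalg.norm(class_samples - x, axis=1))[k - 1]
    # volume of a dim-dimensional hypersphere of radius r
    v = math.pi ** (dim / 2) / math.gamma(dim / 2 + 1) * r ** dim
    return k / (n_i * v)
```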
5.2 Error estimation
The probability of error is the most effective measure of
the performance of a classification system. In practice,
the probability of error must be estimated from the
available samples. First a classifier is designed using
training samples and then it is tested using test
samples. The percentage of misclassified test samples is
taken as an estimate of the probability of error.
The probability of error is estimated using resubstitution
(RES) and leave-one-out (LOO) estimation methods. The
resubstitution method uses the same set of samples to
train and test the classifier. Because the training set and the test set are the same set, the errors estimated using this method
are unreliable, but they can be used together with other error estimation methods such as the leave-one-out method. In the leave-one-out method each sample is used for both training and testing, although not at the same time. The classifier is trained using (n-1) samples and tested on the remaining sample (n is the total number of samples). This is repeated n times with different training sets of size (n-1). The error estimate is the total number of misclassified samples divided by n (Devijver, 1982).
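Both estimators reduce to a few lines. The sketch below assumes the classifier is given as a function classify(train_X, train_y, x) returning a label; this interface is illustrative, not the paper's code:

```python
import numpy as np

def resubstitution_error(classify, X, y):
    """RES: the same samples are used to train and test the classifier."""
    pred = np.array([classify(X, y, x) for x in X])
    return float(np.mean(pred != y))

def leave_one_out_error(classify, X, y):
    """LOO: train on n-1 samples, test on the held-out one, repeated n times."""
    n = len(X)
    mistakes = 0
    for i in range(n):
        mask = np.arange(n) != i
        if classify(X[mask], y[mask], X[i]) != y[i]:
            mistakes += 1
    return mistakes / n    # total misclassified samples divided by n
```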
5.3 Datasets
Three different datasets were used. The original dimension of the datasets was 8 and the number of classes was 2. The datasets were generated using a random number generator. The number of samples per class was equal to the dimension times N, where N = 5, 10 or 100. The generated samples were then classified and the classification errors were estimated. This was repeated 50 times, and each time the samples were generated independently. Finally, the statistical descriptors (mean, median, standard deviation, minimum and maximum) were computed from the classification errors.
In the first dataset, called I-I, the mean of the first class was M_1 = [0 ... 0]^T and the mean of the second class was M_2 = [2.56 0 ... 0]^T. The covariance matrices of both classes were identity matrices I. In other words, the class means differ and the covariances are the same. The Bayes error is about 10%.
In the second dataset, called I-4I, the means of both classes were M_1 = M_2 = [0 ... 0]^T. The covariance matrix of the first class was the identity matrix I and that of the second class was 4I. In other words, the class means are the same and the covariances differ. The Bayes error is about 9%.
In the third dataset, called I-Λ, the mean of the first class was M_1 = [0 ... 0]^T and the mean of the second class was M_2 = [3.86 3.10 0.84 0.84 1.64 1.08 0.26 0.01]^T. The covariance matrix of the first class was the identity matrix I and that of the second class was diagonal with values Λ = [8.41 12.06 0.12 0.22 1.49 1.77 0.35 2.73]. In this case, both the class means and the covariances differ. The Bayes error is about 1.9% (Fukunaga, 1990).
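Under the stated means and covariances, the three datasets can be drawn from Gaussian distributions as follows (a sketch; the function name and string labels are illustrative):

```python
import numpy as np

def generate_dataset(name, N, dim=8, seed=None):
    """Draw dim*N Gaussian samples per class for datasets I-I, I-4I and I-Lambda."""
    rng = np.random.default_rng(seed)
    n = dim * N                                  # samples per class, N = 5, 10 or 100
    m1, c1 = np.zeros(dim), np.eye(dim)
    if name == "I-I":                            # means differ, covariances equal
        m2, c2 = np.r_[2.56, np.zeros(dim - 1)], np.eye(dim)
    elif name == "I-4I":                         # means equal, covariances differ
        m2, c2 = np.zeros(dim), 4 * np.eye(dim)
    else:                                        # "I-Lambda": both differ
        m2 = np.array([3.86, 3.10, 0.84, 0.84, 1.64, 1.08, 0.26, 0.01])
        c2 = np.diag([8.41, 12.06, 0.12, 0.22, 1.49, 1.77, 0.35, 2.73])
    X = np.vstack([rng.multivariate_normal(m1, c1, n),
                   rng.multivariate_normal(m2, c2, n)])
    y = np.r_[np.zeros(n), np.ones(n)].astype(int)
    return X, y
```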
5.4 Parameters of algorithms
The transformation matrix in the Karhunen-Loève transformation was based on the eigenvectors of the covariance matrix of the dataset.
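A minimal sketch of such a transformation, assuming the eigenvectors are ordered by decreasing eigenvalue (the function name is illustrative):

```python
import numpy as np

def kl_transform(X, n_components):
    """Project centered data onto the leading eigenvectors of its covariance matrix."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]            # largest eigenvalues first
    return Xc @ eigvecs[:, order[:n_components]]
```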
The parameters of the SOM were the size of the map, the size of the neighborhood, the number of input vectors presented to the algorithm, and the starting value of the learning rate α and its decreasing method. In these experiments different map sizes were used: 9x9, 11x11, 7x11 and 19x19 processing elements. The size of the neighborhood was in the beginning more than half of the size of the map and decreased linearly until only one weight vector, the BMU, was updated. The number of input vectors presented to the algorithm also varied; it was at least 500 input vectors per processing element. When the size of the map was 9x9 or 7x11 processing elements, the number of input vectors was 50000 or
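The linearly decreasing neighborhood described above can be written as a simple schedule; this sketch assumes a square map, a fixed number of training steps, and illustrative names:

```python
def neighborhood_radius(step, n_steps, map_side):
    """Shrink the radius linearly from more than half the map side down to 0,
    so that in the end only the BMU itself is updated."""
    r0 = map_side / 2 + 1          # initial size: more than half of the map
    return max(0.0, r0 * (1.0 - step / n_steps))
```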