After this procedure has finished, the final classification is performed using the resulting data set and the 1-NN classifier. With the so-called multiediting approach, the number of samples can be further reduced by repeating the above editing procedure until no editing occurs. In /DevKit80/ both theoretical and experimental evidence is given for the most favorable result that the above procedure converges asymptotically (in the number of iterations) to the Bayes decision rule.
After the multiediting algorithm the data is nicely clustered (because all the misclassified samples have been rejected). The 1-NN classifier creates a piecewise linear decision boundary, which is an approximation of the Bayes boundary. This boundary is actually defined by a small subset of samples lying on the outer borders of the clusters. It is clear that only these sample points are needed by the 1-NN classifier. The aim of condensing is to find such representative points.
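The paper does not name the condensing algorithm itself; the sketch below uses Hart-style condensing as one plausible realization and reuses nn1_predict from the sketch above.

    def condense(X, y, seed=0):
        """Keep only the samples a 1-NN classifier needs to reproduce the
        labels of the whole (edited) set (Hart-style condensing, assumed)."""
        rng = np.random.default_rng(seed)
        order = list(rng.permutation(len(X)))
        store = [order.pop(0)]                     # start from a single prototype
        changed = True
        while changed:                             # sweep until no sample is absorbed
            changed = False
            for i in list(order):
                pred = nn1_predict(X[store], y[store], X[i:i + 1])[0]
                if pred != y[i]:                   # misclassified -> add to the store
                    store.append(i)
                    order.remove(i)
                    changed = True
        return X[store], y[store]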
So, after multiediting and condensing, a very powerful 1-NN classifier can be used: it approximates the Bayes decision boundary and is very fast to compute. Our k-NN classifier has been implemented using this technique.
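Put together, and with X_train, y_train and X_test as placeholder arrays rather than data from this project, the classifier could be used roughly as follows.

    # Illustrative pipeline: edit, condense, then classify with plain 1-NN.
    Xe, ye = multiedit(X_train, y_train, n_folds=3)
    Xc, yc = condense(Xe, ye)
    y_pred = nn1_predict(Xc, yc, X_test)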
3.2 The Average Learning Subspace Method
The subspace methods are reported in /Oja83/. A subspace of the n-dimensional pattern space is spanned by p linearly independent, orthonormal basis vectors u_i; the dimension of the subspace is then p. From the point of view of classification, a subspace restricts the possible directions in it, while the lengths remain undetermined. Suppose we have M classes in the classification problem, each represented by a subspace with dimensionality p_i. Usually p_i is much lower than the original dimension. The simple classification rule then states that if the distance between the pattern vector x and subspace i is smaller than that between the pattern vector and subspace j, then x is classified into class i. The distance from a subspace can be computed as follows:
d(x, L) = |x|^2 - \sum_{j=1}^{p} (u_j^T x)^2 ,

where u_j are the orthonormal basis vectors.
Since |x|^2 is the same for each class, it can be dropped, and the classification rule then consists only of inner products. So, if the subspace dimensions are small, the classifier is very fast.
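A minimal sketch of this rule, assuming each class subspace is given as a matrix of orthonormal column vectors (the function name subspace_classify is ours): maximizing the sum of squared inner products is equivalent to minimizing d(x, L_i).

    import numpy as np

    def subspace_classify(x, bases):
        """bases[i] is an (n, p_i) matrix whose orthonormal columns span
        the subspace of class i; return the index of the closest subspace."""
        scores = [np.sum((U.T @ x) ** 2) for U in bases]   # sum_j (u_j^T x)^2
        return int(np.argmax(scores))                      # largest projection wins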
The essential question in the subspace method is how to actually construct the class subspaces to obtain optimal performance. The CLAFIC method /Watana69/ forms each subspace by minimizing the mean square error of the distances of a training set. This can be shown to be equivalent to maximizing
\sum_{j=1}^{p} u_j^T C u_j ,

where C is the autocorrelation matrix.
The eigenvalue decomposition solves this problem, and usually the first few eigenvectors span the subspace.
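A sketch of this construction under the above definitions (function name and interface are our own):

    def clafic_subspace(X_class, p):
        """Span the class subspace with the p leading eigenvectors of the
        class autocorrelation matrix C = (1/N) * sum_k x_k x_k^T."""
        C = (X_class.T @ X_class) / len(X_class)
        w, V = np.linalg.eigh(C)                      # eigenvalues in ascending order
        return V[:, np.argsort(w)[::-1][:p]]          # p leading eigenvectors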
A serious drawback of the CLAFIC method is that each class subspace, although depending on the statistics of the class, is formed independently of the other classes. So, two classes overlapping each other may be very hard to discriminate. This leads to the ALSM method, which adapts itself better to this situation by learning. In the ALSM method, the autocorrelation matrices are updated according to misclassifications. This means that if either a sample vector of class i is misclassified into another class (j), or a sample vector of another class (k) is misclassified into class i, the learning phase of the classifier updates the autocorrelation matrices of each class by:
R_m = R_{m-1} + \sum \alpha S_m^{(i,j)} - \sum \beta S_m^{(k,i)} .
Here α and β are small constants, and should be small enough to avoid overshooting. In this learning phase, the subspaces are iterated sufficiently long that the classification of the training set becomes stable. The algorithm can be proven to converge /Oja83/. The constants α and β were both set to 0.005 in this project.
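The sketch below implements one reading of this update rule: for every misclassified sample, α times the outer product x xᵀ is added to the matrix of the true class and β times it is subtracted from the matrix of the wrongly winning class, and the subspaces are recomputed after each pass. The epoch structure and the function names are our assumptions, not details from the paper.

    def alsm_train(X, y, dims, alpha=0.005, beta=0.005, max_epochs=100):
        """ALSM sketch: start from the class autocorrelation matrices and
        update them on misclassifications until the training set is stable."""
        X, y = np.asarray(X, float), np.asarray(y)
        classes = sorted(set(y))
        R = {c: (X[y == c].T @ X[y == c]) / np.sum(y == c) for c in classes}
        for _ in range(max_epochs):
            bases = {c: leading_eigvecs(R[c], dims[c]) for c in classes}
            stable = True
            for x, true in zip(X, y):
                scores = {c: np.sum((bases[c].T @ x) ** 2) for c in classes}
                pred = max(scores, key=scores.get)
                if pred != true:
                    xxT = np.outer(x, x)
                    R[true] += alpha * xxT            # reinforce the true class
                    R[pred] -= beta * xxT             # penalize the wrong winner
                    stable = False
            if stable:                                # training set classified correctly
                break
        return {c: leading_eigvecs(R[c], dims[c]) for c in classes}

    def leading_eigvecs(R, p):
        """p leading eigenvectors of a (possibly updated) correlation matrix."""
        w, V = np.linalg.eigh(R)
        return V[:, np.argsort(w)[::-1][:p]]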
4. SUMMARY OF THE RESULTS
The test material consists of two 1:15 000 aerial images, both digitized in 4500*4500 format, one SPOT scene from a rural area, and one SPOT scene from an urban area. Two independent test sites were created. One was used as a training sample and consisted of approximately 1300 samples for each class. The other was used as an independent test set and consisted of approximately 5000 samples for each class. The classification tried to separate five classes: urban or residential areas, two types of forest areas, pasture land or parks, and fields. In the case of multichannel imagery, the textural descriptors were computed from channel 3 (infrared).
The practical classification was performed using a window of 31*31 pixels in the aerial images and a window of 13*13 pixels in the SPOT images.
Subspaces vs. extracted ad hoc features
Some preliminary runs were made to get an idea of whether there is any difference between using the traditional feature descriptors of the co-occurrence statistics and the Fourier power spectrum, and the strategy of using these descriptors as such. These tests were performed on the basis of the aerial images only.
The dimensional reduction achieved by an orthogonal transformation was surprisingly high. The dimension of the final subspace was usually 5-10, although the original dimension was much higher.