of optimality and significance-weighted supervised feature extraction, and selects a set of appropriate features from hyper-dimensional data. We applied this method to data that were obtained from an imaging spectrometer, and the extracted features were found to give better results than those of a conventional method.
1. INTRODUCTION

Hyper-dimensional data have improved the accuracy of classification, since they capture fine differences between classes. Principal component analysis is widely used for reducing the dimensionality of such data; a lot of computation is, however, required for data with hundreds of channels, and the resulting features are not selected for any specific discrimination purpose. Canonical analysis takes class information into account and gives better results, since it extracts the features that maximize the separability among classes. In this paper we propose a method for significance-weighted feature extraction, which derives the features as linear combinations of the components of the data. Each feature is determined by considering the distance between the classes of interest, so that the distance satisfies a criterion for evaluating the discrimination for a specific purpose. The effectiveness of the method is examined by comparing the extracted features.
2. FEATURE EXTRACTION

2.1 Description of Data

We can get training data from an image to derive the characteristics of most classes included in the image.
We denote hyper-dimensional data (of dimension N) by a vector y = (y_1, ..., y_N)' (': transpose), and suppose that they are classified into one of, say, n classes. Then y can be decomposed into the class mean y_a and the within-class dispersion y_e; that is, y is written as
y_ij = y_a,i + y_e,ij    (i = 1~n, j = 1~m_i)    (1)
(see Fig. 1), where y_ij is the j-th datum of class i. We write the covariance matrices of y, y_a and y_e as C_yy, C_a and C_e, respectively, and call C_a and C_e the between-class and within-class covariance matrices. Here, we assume that the covariance matrix of each class is identical. This assumption is rather reasonable from the viewpoint of the generality of training data (Fujimura, 1981).
Fig. 1  Description of data
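As an illustration of these quantities, the class means and the between-class and within-class covariance matrices could be estimated from labelled training data roughly as in the following numpy sketch (the function and variable names are hypothetical and not from the original implementation; the simple pooled estimate reflects the identical-covariance assumption above):

```python
import numpy as np

def class_statistics(samples, labels):
    """Estimate class means, the between-class covariance C_a,
    and the pooled within-class covariance C_e from labelled data.

    samples: (num_samples, N) array of N-dimensional spectra
    labels:  (num_samples,) array of integer class indices
    """
    classes = np.unique(labels)
    overall_mean = samples.mean(axis=0)
    n_dim = samples.shape[1]

    c_a = np.zeros((n_dim, n_dim))   # between-class covariance
    c_e = np.zeros((n_dim, n_dim))   # within-class (pooled) covariance
    means = {}

    for c in classes:
        x = samples[labels == c]
        mu = x.mean(axis=0)
        means[c] = mu
        # dispersion of the class mean y_a around the overall mean
        d = (mu - overall_mean)[:, None]
        c_a += (len(x) / len(samples)) * (d @ d.T)
        # within-class dispersion y_e = y - y_a for this class
        r = x - mu
        c_e += r.T @ r

    c_e /= len(samples)   # simple (biased) pooled estimate
    return means, c_a, c_e
```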
2.2 Feature Extraction
Here, for simplicity, we consider two cases in which one or two most important classes should be discriminated from all the other classes.
In general, classification accuracy increases as the separability* of classes increases. We use separability to evaluate the performance of the extracted features, and we extract the features which maximize the separability of a particular pair of classes that we wish to discriminate.
The method proposed here consists of two processing steps: pre-processing and feature extraction.
In the pre-processing, the hyper-dimensional data y = (y_1, ..., y_N)' are reduced and normalized to m (m << N) components z = (z_1, ..., z_m)' by a linear transformation z = A'y. From the assumption on C_e, the within-class dispersion of each class in the original space has the same ellipsoidal shape, as shown in Fig. 1. After the transformation, it is normalized into an m-dimensional sphere. This makes the space uniform; that is, the distance measured in terms of variance has no directionality in the space.
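The paper does not state how the matrix A is obtained; one common way to realize such a reduction and normalization, sketched below under the assumption that A whitens the within-class covariance C_e through an eigen-decomposition, is:

```python
import numpy as np

def whitening_transform(c_e, m):
    """Build an N x m matrix A such that z = A.T @ y has unit
    within-class covariance in the m retained components.

    c_e: (N, N) within-class covariance matrix
    m:   number of components to keep (m << N)
    """
    # eigen-decomposition of the symmetric within-class covariance
    eigval, eigvec = np.linalg.eigh(c_e)
    # keep the m largest eigenvalues/vectors and scale the vectors so that
    # the within-class dispersion becomes a unit sphere (uniform space)
    idx = np.argsort(eigval)[::-1][:m]
    A = eigvec[:, idx] / np.sqrt(eigval[idx])
    return A

# usage: z = A.T @ y for one spectrum y, or Z = Y @ A for rows of Y
```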
In the second step, features are successively extracted (Kiyasu, 1993; Fujimura, 1994) until there remains no class whose distance from the particular classes is less than the minimum distance obtained so far.

*We used the divergence (Kullback, 1959) as a measure of separability; we refer to it as distance in the rest of this paper.
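For reference, with the identical-covariance assumption made in Section 2.1, the divergence between two Gaussian classes reduces to a Mahalanobis-type quadratic form, and in the whitened space to the squared Euclidean distance between class centers; a minimal sketch (not taken from the paper) is:

```python
import numpy as np

def divergence(mu_p, mu_q, c_e):
    """Divergence between two Gaussian classes sharing covariance c_e:
    (mu_p - mu_q)' C_e^{-1} (mu_p - mu_q)."""
    d = mu_p - mu_q
    return float(d @ np.linalg.solve(c_e, d))

def divergence_whitened(z_mean_p, z_mean_q):
    """Same quantity in the pre-processed space, where the within-class
    covariance is the identity matrix."""
    d = z_mean_p - z_mean_q
    return float(d @ d)
```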
Feature extraction is done by determining a sub-space of the feature space: that is, by forming a linear combination of z as a'z, where a is an m-dimensional weight vector which we call a feature vector here. Thus, feature extraction is nothing other than the determination of a feature vector. As the space is now uniform, the direction of the optimal feature vector which discriminates between two classes is obtained simply by connecting the centers of these classes. The feature vectors obtained are orthogonalized to make them independent.
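A minimal sketch of this step (hypothetical names; the unit normalization is an assumption, since only the direction matters):

```python
import numpy as np

def optimal_feature_vector(z_mean_p, z_mean_q):
    """Feature vector discriminating classes p and q: the direction
    connecting the two class centers in the whitened feature space."""
    a = z_mean_p - z_mean_q
    return a / np.linalg.norm(a)

# the scalar feature for a pre-processed sample z is then a @ z
```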
The procedure for determining successive feature vectors is as follows:
(1) First, we set an optimal feature vector a_1 between the two nearest classes among the prescribed classes.

(2) Next, we evaluate the separability on a_1 for all the combinations of the prescribed classes.

(3) If there is any pair of prescribed classes which does not have enough separability, we set an additional feature vector a_2 between them. We orthonormalize the new vector a_2 with a_1, as shown in Fig. 2, so that this feature is independent of the first one.

(4) Features are successively extracted in the same way until all the distances among the prescribed classes are larger than the minimum distance obtained so far.

(5) Then, we apply procedures (2)~(4) to the distances between the prescribed classes and the other classes.
When only one class is prescribed, the procedure starts by setting a feature vector between that class and its nearest class in the feature space.
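Steps (1)~(4) above might be sketched as follows; this is an illustrative reading only, using a fixed separability threshold in place of the "minimum distance obtained so far" criterion, and Gram-Schmidt orthonormalization for the successive vectors:

```python
import numpy as np

def extract_feature_vectors(z_means, prescribed, threshold):
    """Successively extract orthonormal feature vectors for the
    prescribed classes in the whitened feature space.

    z_means:    dict mapping class label -> class center (m-dim array)
    prescribed: list of labels of the classes to be discriminated
    threshold:  required squared distance between projected class centers
    """
    vectors = []

    def orthonormalize(a):
        # Gram-Schmidt against the feature vectors already selected
        for v in vectors:
            a = a - (a @ v) * v
        n = np.linalg.norm(a)
        return a / n if n > 1e-12 else None

    def projected_distance(p, q):
        # squared distance between class centers in the selected sub-space
        d = z_means[p] - z_means[q]
        proj = np.array([v @ d for v in vectors])
        return float(proj @ proj)

    pairs = [(p, q) for i, p in enumerate(prescribed)
             for q in prescribed[i + 1:]]
    # (1) start from the two nearest prescribed classes
    pairs.sort(key=lambda pq: np.linalg.norm(z_means[pq[0]] - z_means[pq[1]]))

    for p, q in pairs:
        # (2)-(4) add an orthonormalized vector only for pairs that are
        # not yet separable enough with the vectors selected so far
        if vectors and projected_distance(p, q) >= threshold:
            continue
        a = orthonormalize(z_means[p] - z_means[q])
        if a is not None:
            vectors.append(a)
    return vectors
```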
A feature a_i'z is equivalent to (A a_i)'y expressed in terms of the original data y, because z = A'y; here (A a_i) gives the weighting factors for the spectral data.
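In code, this back-projection is a single matrix-vector product (continuing the hypothetical names used in the sketches above):

```python
import numpy as np

def spectral_weights(A, a):
    """Express a feature vector a (defined on z) as weights on the
    original N spectral channels: a'z = a'A'y = (A a)'y."""
    return A @ a
```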
Fig. 2  Feature vectors discriminating between two classes
3. EXPERIMENTAL RESULTS OF
FEATURE EXTRACTION
We acquired data for five growth states of tree leaves (A~E: from young to fallen), soil, stone and concrete by using an imaging spectrometer which we developed. We obtained 411-dimensional data from the sensor and used them for the experiments. For estimating the mean and the variance of each class, 45 training data were used per class.