Full text: XVIIIth Congress (Part B3)

of most classes included in the image. 
We denote hyper-dimensional data (N dimensions) by a
vector $y = (y_1, \ldots, y_N)'$ (': transpose), and suppose
that each datum is classified into one of, say, n classes. Then
y can be decomposed into a class mean $y_a$ and a within-class
dispersion $y_e$; that is, y is written as
$$y_{ij} = y_{a_i} + y_{e_{ij}} \tag{1}$$
$(i = 1, \ldots, n;\ j = 1, \ldots, m_i)$
(see Fig. 1), where $y_{ij}$ is the j-th datum of class i. We write
the covariance matrices of $y$, $y_a$ and $y_e$ as $C_{yy}$, $C_a$ and $C_e$,
respectively. We call $C_a$ and $C_e$ the between-class and
within-class covariance matrices, respectively. Here, we assume
that the covariance matrix of each class is identical. This
assumption is fairly reasonable from the viewpoint of the
generality of training data (Fujimura, 1981).
Fig. 1 Description of data
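As an illustration of the decomposition in Eq. (1), the following minimal sketch estimates the class means and the between-class and within-class covariance matrices from labelled training data. The function name `class_statistics`, the pooled (n-corrected) normalization of the within-class covariance, and the unweighted averaging of class means are assumptions made for this example; the paper does not specify the estimators it used.

```python
import numpy as np

def class_statistics(samples_by_class):
    """Estimate class means and covariances (hypothetical helper).

    samples_by_class: list of arrays, one per class, each of shape (m_i, N).
    Returns (means, C_a, C_e).
    """
    # Class means y_a, one row per class.
    means = np.array([s.mean(axis=0) for s in samples_by_class])
    # Within-class residuals y_e = y - y_a, stacked over all classes.
    residuals = np.vstack([s - mu for s, mu in zip(samples_by_class, means)])
    n_classes = len(samples_by_class)
    total = residuals.shape[0]
    # Pooled within-class covariance (assumes identical covariance per class).
    C_e = residuals.T @ residuals / (total - n_classes)
    # Between-class covariance of the class means (assumed unweighted).
    centered = means - means.mean(axis=0)
    C_a = centered.T @ centered / n_classes
    return means, C_a, C_e
```

Pooling the residuals over all classes is only meaningful under the identical-covariance assumption stated above.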
2.2 Feature Extraction 
Here, for simplicity, we consider two cases, in which one or
two most important classes must be discriminated from
all the other classes.
In general, classification accuracy increases as the
separability* of classes increases. We use separability to evaluate
the performance of the extracted features. We extract the
features that maximize the separability of a particular pair
of classes that we wish to discriminate.
The proposed method consists of two processing steps:
pre-processing and feature extraction.
In the pre-processing step, the hyper-dimensional data
$y = (y_1, \ldots, y_N)'$ are reduced and normalized to m (m ≪ N)
components $z = (z_1, \ldots, z_m)'$ by a linear transformation
$z = A'y$. From the assumption on $C_e$, the within-class
dispersion of each class in the original space has the same
ellipsoidal shape, as shown in Fig. 1. After the transformation,
the dispersions are normalized into an m-dimensional sphere. This
makes the space uniform: the distance measured in terms of
variance has no directionality in the space.
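One way to obtain a transformation with this property is sketched below. It whitens the within-class covariance so that $A' C_e A = I$, then keeps the m directions with the largest between-class variance. The function name `whitening_transform` and the rule for choosing the m retained directions are assumptions for illustration; the text only states that the data are reduced and normalized by a linear map $z = A'y$.

```python
import numpy as np

def whitening_transform(C_a, C_e, m):
    """Return A of shape (N, m) with A' C_e A = I (hypothetical helper).

    Requires C_e to be symmetric positive definite.
    """
    # Eigen-decomposition C_e = E diag(lam) E'.
    lam, E = np.linalg.eigh(C_e)
    # Scale each eigenvector by 1/sqrt(eigenvalue), so that W' C_e W = I.
    W = E / np.sqrt(lam)
    # Between-class covariance expressed in the whitened space.
    B = W.T @ C_a @ W
    # eigh returns eigenvalues in ascending order; reverse to take the largest.
    _, V = np.linalg.eigh(B)
    V_m = V[:, ::-1][:, :m]
    # Because V_m has orthonormal columns, A' C_e A stays the identity.
    return W @ V_m
```

With few training samples relative to N, the estimated $C_e$ may be singular, in which case some regularization or prior dimensionality reduction would be needed before this step.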
In the second step, features are successively extracted
(Kiyasu, 1993; Fujimura, 1994) until no class remains
whose distance from the particular classes is less than the
minimum distance obtained so far. Feature extraction is
done by determining a sub-space in the feature space, that
is, by forming a linear combination of z as $a'z$, where a is
an m-dimensional weight vector, which we call here a feature
vector. Thus, feature extraction is nothing other than the
determination of a feature vector. As the space is now uniform,
the direction of an optimal feature vector that discriminates
between two classes is obtained simply by connecting
the centers of these classes. The feature vectors obtained
are orthogonalized to make them independent.
*We used the divergence (Kullback, 1959) as a measure of
separability. We refer to it as distance in the rest of this paper.
International Archives of Photogrammetry and Remote Sensing. Vol. XXXI, Part B3. Vienna 1996
The procedure for determining successive feature vectors
is as follows:
(1) First, we set an optimal feature vector $a_1$ between the
two nearest classes among the prescribed classes.
(2) Next, we evaluate the separability on $a_1$ for all
combinations of the prescribed classes.
(3) If there is any pair of prescribed classes that does
not have enough separability, we set an additional
feature vector $a_2$ between them. We ortho-normalize
the new vector $a_2$ with $a_1$, as shown in Fig. 2, so that
this feature is independent of the first one.
(4) Features are successively extracted in the same way
until all the distances among the prescribed classes are
larger than the minimum distance obtained so far.
(5) Then, we apply procedures (2) to (4) to the distances
between the prescribed classes and the other classes.
When only one class is prescribed, the procedure starts 
from setting a feature vector between the class and its 
nearest class in the feature space. 
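Steps (1) to (4) can be sketched as follows for the prescribed classes only. This is one possible reading of the procedure, not the authors' implementation. It assumes that the class means are already expressed in the normalized space, that the distance between two classes is the squared Euclidean distance between their means (which equals the divergence for Gaussian classes with identity covariance), and that "enough separability" means a projected distance no smaller than the distance of the nearest pair. The function name `extract_features` and the tolerance `tol` are hypothetical. Step (5) is omitted.

```python
import numpy as np
from itertools import combinations

def extract_features(Z, prescribed, tol=1e-9):
    """Z: (n_classes, m) class means in the normalized space.
    prescribed: indices of the classes to discriminate (at least two).
    Returns an array whose rows are orthonormal feature vectors.
    """
    pairs = list(combinations(prescribed, 2))
    # Full-space squared distance of every prescribed pair.
    full = {p: float(np.sum((Z[p[0]] - Z[p[1]]) ** 2)) for p in pairs}
    d_min = min(full.values())  # assumed separability threshold
    features = []

    def projected(p):
        # Squared distance of pair p measured in the extracted sub-space.
        d = Z[p[0]] - Z[p[1]]
        return sum((a @ d) ** 2 for a in features)

    while True:
        if not features:
            target = min(pairs, key=full.get)       # step (1): nearest pair
        else:
            short = [p for p in pairs if projected(p) < d_min - tol]  # step (2)
            if not short:
                break                               # step (4): all pairs separated
            target = min(short, key=projected)      # step (3): worst pair
        a = Z[target[0]] - Z[target[1]]
        for b in features:                          # Gram-Schmidt orthogonalization
            a = a - (a @ b) * b
        norm = np.linalg.norm(a)
        if norm < tol:
            break                                   # defensive: nothing new to add
        features.append(a / norm)
    return np.array(features)
```

Because each new vector restores the full distance of the pair it was built from, the loop adds at most one feature per prescribed pair.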
A feature $a_i' z$ is equivalent to $(A a_i)' y$ when expressed in terms of the
original data y, because $z = A'y$ and hence $a_i' z = a_i' A' y = (A a_i)' y$;
$(A a_i)$ is therefore the weighting factor for the spectral data.
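The equivalence can be checked numerically with a small sketch. The helper name `spectral_weights` is hypothetical, and the random matrices stand in for a real transformation A and feature vector a.

```python
import numpy as np

def spectral_weights(A, a):
    """Map a feature vector a (length m) back to weights on the
    original N spectral components: w = A a (hypothetical helper)."""
    return A @ a

# Illustrative check with random stand-in values.
rng = np.random.default_rng(0)
N, m = 8, 3
A = rng.normal(size=(N, m))
a = rng.normal(size=m)
y = rng.normal(size=N)
z = A.T @ y                      # z = A'y
w = spectral_weights(A, a)
feature_from_z = a @ z           # a'z
feature_from_y = w @ y           # (A a)'y
```

Both `feature_from_z` and `feature_from_y` give the same value up to floating-point rounding, so a feature can be computed directly from the spectral data without forming z.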
Fig. 2 Feature vectors discriminating between two classes
We acquired data for five growth states of tree leaves (A~E:
from young to fallen), soil, stone and concrete by using an
imaging spectrometer that we developed. We obtained
411-dimensional data from the sensor and used them for the
experiments. For estimating the mean and the variance
of each class, 45 training samples were used per class.

