flowing down the tree.
DATA COMPRESSION
First, we select training data for the classification by
assigning training areas. Because these training data contain
noise, we reduce the noise by compressing them: we average
the pixel densities of 4 neighboring pixels. This averaging
both reduces processing time and stabilizes the boundaries
used for data division. After averaging, we assign a category
number to every training datum as an identifier and merge
all the data into one group.
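The averaging step can be sketched as follows. This is a minimal illustration, assuming that "neighboring 4 pixels" means non-overlapping 2x2 blocks of one spectral band; the text does not state the block layout, and the function name `average_2x2` is hypothetical.

```python
def average_2x2(band):
    """Average non-overlapping 2x2 blocks of one spectral band (a list of rows).

    Assumption: the 4 neighboring pixels form a 2x2 block; a trailing odd
    row or column is dropped.
    """
    rows = len(band) // 2 * 2
    cols = len(band[0]) // 2 * 2
    out = []
    for r in range(0, rows, 2):
        out_row = []
        for c in range(0, cols, 2):
            total = (band[r][c] + band[r][c + 1]
                     + band[r + 1][c] + band[r + 1][c + 1])
            out_row.append(total / 4.0)
        out.append(out_row)
    return out


# Toy 4x4 band: each 2x2 block collapses to one averaged value.
band = [
    [10, 12, 50, 52],
    [14, 16, 54, 56],
    [20, 20, 80, 84],
    [20, 20, 88, 92],
]
compressed = average_2x2(band)
```

Each output value replaces four input pixels, so the training set shrinks to a quarter of its size while pixel-level noise is smoothed.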
PROJECTING DATA ONTO 1D SUBFEATURE SPACE
In the boundary search for binary division of the training
data, efficiency falls as the number of spectral bands
increases. To reduce the quantity of data with minimum
loss of information, we apply principal component analysis
(PCA) to the merged training data and obtain the first two
principal components. Suppose that the image data have p
spectral bands. Using the variance-covariance matrix X, the
PCA process is written as
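$$B^{T} X B = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p) \qquad (2)$$
where $\lambda_i$ (i = 1, 2, ..., p) are the eigenvalues, ordered so
that $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_p$, and $b_i$ (i = 1, 2, ..., p),
the columns of B, are the corresponding eigenvectors. The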
first two principal components $P_1$ and $P_2$ are obtained from
the inner products between the spectral density vector
assigned to a pixel and the eigenvectors $b_i$ (i = 1, 2),
respectively. For brevity, we define the PCA vector P as
$$P = P_1 + jP_2 \qquad (3)$$
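The PCA step can be illustrated with a small standard-library sketch. It is not the authors' implementation: power iteration with deflation stands in for a full eigen-decomposition, the sample covariance uses the n - 1 divisor (an assumption), and all function names are hypothetical. The starting vector must not be orthogonal to the eigenvector being sought.

```python
import math


def covariance(samples):
    """Variance-covariance matrix (p x p) of a list of p-band samples."""
    n, p = len(samples), len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for s in samples:
        d = [s[i] - mean[i] for i in range(p)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += d[i] * d[j] / (n - 1)  # assumption: n - 1 divisor
    return cov


def leading_eigenpair(mat, iterations=200):
    """Power iteration: largest eigenvalue and unit eigenvector of a symmetric matrix."""
    p = len(mat)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(mat[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v gives the eigenvalue for the unit vector v.
    lam = sum(v[i] * sum(mat[i][j] * v[j] for j in range(p)) for i in range(p))
    return lam, v


def first_two_components(samples):
    """Return (lambda_1, b_1), (lambda_2, b_2) and the (P_1, P_2) pair of every sample."""
    cov = covariance(samples)
    p = len(cov)
    lam1, b1 = leading_eigenpair(cov)
    # Deflation: remove the first component so the second becomes dominant.
    deflated = [[cov[i][j] - lam1 * b1[i] * b1[j] for j in range(p)]
                for i in range(p)]
    lam2, b2 = leading_eigenpair(deflated)
    # P_1, P_2: inner products of each spectral vector with b_1 and b_2.
    scores = [(sum(x * b for x, b in zip(s, b1)),
               sum(x * b for x, b in zip(s, b2))) for s in samples]
    return (lam1, b1), (lam2, b2), scores


# Toy 3-band data whose variance is largest along band 1, then band 2.
samples = [(2, 0, 0), (-2, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 0.5), (0, 0, -0.5)]
(lam1, b1), (lam2, b2), scores = first_two_components(samples)
```

For real imagery a library eigen-solver would normally be preferred; the sketch only shows how the two components follow from the covariance matrix.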
After compressing the training data into 2D PCA vectors,
we produce 8 histograms from the inner products between the
PCA vector P and the projection vectors
$$W_k = \cos(k\pi/8) + j\sin(k\pi/8) \quad (k = 0, \ldots, 7). \qquad (4)$$
The merged training data are thus projected onto 1D
subfeature spaces with minimum loss of information about
the data distribution, and we obtain 8 histograms.
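The projection onto the 8 directions can be sketched as below. The sketch treats the PCA vector as the pair (P_1, P_2) and each projection vector as the unit vector at angle k*pi/8, so the inner product is P_1 cos(k*pi/8) + P_2 sin(k*pi/8). The number of histogram bins is not given in the text; the value used here, like the function names, is an assumption.

```python
import math


def project(p1, p2, k, directions=8):
    """Inner product of the PCA vector (p1, p2) with the unit vector at angle k*pi/8."""
    angle = k * math.pi / directions
    return p1 * math.cos(angle) + p2 * math.sin(angle)


def histogram(values, bins=16):
    """Equal-width histogram of 1D values; returns (counts, low, bin_width)."""
    low, high = min(values), max(values)
    width = (high - low) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - low) / width), bins - 1)  # the maximum falls in the last bin
        counts[idx] += 1
    return counts, low, width


# Toy PCA vectors (P_1, P_2); real input would come from the PCA step.
pca_vectors = [(1.0, 0.0), (0.0, 1.0), (3.0, 4.0), (-2.0, 1.0)]
projections = [[project(p1, p2, k) for (p1, p2) in pca_vectors] for k in range(8)]
histograms = [histogram(values, bins=4) for values in projections]
```

Direction k = 0 reproduces the first principal component and k = 4 the second; the other six directions are rotations in between.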
SELECTION OF DIVISION BOUNDARY
We first select a candidate for the optimum boundary for
the binary division in each of the 8 histograms, then
determine the optimum boundary among the candidates.
Generally speaking, the optimum boundary in clustering
minimizes the ratio of the within-group sum of squares to
the inter-group sum of squares. We adopt this clustering
criterion for the selection of the optimum boundary.
Fig. 1 Valleys and boundary selected in a histogram (Group 1
lies on one side of the boundary, Group 2 on the other).
Suppose that the number of training data is $l$ and that a
histogram has total sum of squares $S_T$, and assume that the
histogram is divided into two groups that have $l_1$ and $l_2$
data and within-group sums of squares $S_1$ and $S_2$,
respectively, as shown in Fig. 1. The total sum of squares
$S_T$ is written as
$$S_T = S_1 + S_2 + S_B \qquad (5)$$
where $S_B$ is the inter-group sum of squares. We select the
candidate among the valleys in the histogram that minimizes
the index
$$R = S_1 + S_2 \qquad (6)$$
Because each histogram has its own dispersion along the
abscissa, we use the normalized index $R / S_T$ to select the
optimal boundary among the candidates. The boundary in a
one-dimensional subfeature space corresponds to a hyperplane
in the full feature space. These division procedures are
applied recursively until every group at a terminal node
contains a single category number. The coefficient vector
that projects the spectral density vector onto the histogram
in which the optimal boundary is selected, together with the
threshold (the position of the boundary), is stored at the
node of the binary division tree.
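The boundary search in one histogram can be sketched as follows. The sketch assumes that the index R is the pooled within-group sum of squares S_1 + S_2 and that a "valley" is an interior bin whose count is a local minimum; both are interpretations of the text, and the function names are hypothetical.

```python
def sum_of_squares(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def valley_thresholds(counts, low, width):
    """Centres of interior bins whose count is a local minimum (a valley)."""
    thresholds = []
    for i in range(1, len(counts) - 1):
        not_above = counts[i] <= counts[i - 1] and counts[i] <= counts[i + 1]
        strictly_below = counts[i] < counts[i - 1] or counts[i] < counts[i + 1]
        if not_above and strictly_below:
            thresholds.append(low + (i + 0.5) * width)
    return thresholds


def best_boundary(values, counts, low, width):
    """Return (threshold, normalized index) of the valley minimizing S_1 + S_2.

    Assumption: R = S_1 + S_2, normalized by the total sum of squares S_T.
    Returns None when the histogram has no valley.
    """
    s_total = sum_of_squares(values)
    best = None
    for t in valley_thresholds(counts, low, width):
        group1 = [v for v in values if v < t]
        group2 = [v for v in values if v >= t]
        r = sum_of_squares(group1) + sum_of_squares(group2)
        score = r / s_total if s_total > 0 else 0.0
        if best is None or score < best[1]:
            best = (t, score)
    return best


# Toy projected values with two clear clusters, binned with width 1 from 0.
values = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.9, 5.1]
counts = [1, 3, 0, 0, 1, 3]
threshold, normalized_r = best_boundary(values, counts, low=0.0, width=1.0)
```

Running this on each of the 8 histograms and keeping the smallest normalized index would give the boundary stored at the tree node.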
CLASSIFICATION OF WHOLE IMAGE
After producing the binary division tree from the training
data, we classify the whole image by passing pixel data down
the tree. At a non-terminal node, we compute the inner
product between the pixel data and the coefficient vector
assigned to the node, compare it with the assigned threshold,
and choose the division path according to the result.
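The traversal can be sketched as below. The node layout, the field names and the rule that a value below the threshold goes to the `below` branch are assumptions made for illustration; the text only states that a coefficient vector and a threshold are stored at each non-terminal node.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """One node of a binary division tree (hypothetical layout)."""
    coeff: Optional[Sequence[float]] = None   # coefficient vector (non-terminal only)
    threshold: Optional[float] = None         # boundary position (non-terminal only)
    below: Optional["Node"] = None            # branch taken when the product < threshold
    above: Optional["Node"] = None            # branch taken otherwise
    category: Optional[int] = None            # category number (terminal only)


def classify(root, pixel):
    """Pass one pixel's spectral vector down the tree and return its category."""
    node = root
    while node.category is None:
        value = sum(c * x for c, x in zip(node.coeff, pixel))
        node = node.below if value < node.threshold else node.above
    return node.category


# Toy tree over 2 bands with three categories.
tree = Node(
    coeff=(1.0, 0.0), threshold=3.0,
    below=Node(category=1),
    above=Node(coeff=(0.0, 1.0), threshold=2.0,
               below=Node(category=2), above=Node(category=3)),
)
labels = [classify(tree, px) for px in [(1.0, 5.0), (4.0, 1.0), (4.0, 3.0)]]
```

Each pixel costs one inner product and one comparison per level of the tree.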
PROCEDURES
The procedures of MLDF are as follows.
1) Select training areas for the categories to be classified.
2) Apply the data compression process to the pixel data in
all training areas.
3) Label all compressed data. We use the category number as
the identifier.
4) Merge all compressed data into a group.
5) Apply the PCA process and obtain the first two components.
6) Produce 8 histograms from the components.
7) Select the optimal boundary for binary division and divide
the data group into two subgroups.
International Archives of Photogrammetry and Remote Sensing. Vol. XXXI, Part B4. Vienna 1996
8)If