2.2 Test Site Data
To assess the classification accuracy, land use test site data were prepared. These data cover a 2 km x 10 km area within the area covered by the image, and each 25 m x 25 m pixel of the test site data is assigned a specific land use code. Fig. 2 shows the test site data, and Table 1 lists the categories and the number of pixels in each category.
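As a quick check of the test site size, assuming the 2 km x 10 km area is aligned exactly with the 25 m pixel grid and every pixel carries a land use code (the actual per-category counts are those in Table 1), the number of test pixels is:

```latex
\frac{2000\,\mathrm{m}}{25\,\mathrm{m}} \times \frac{10\,000\,\mathrm{m}}{25\,\mathrm{m}} = 80 \times 400 = 32\,000 \ \text{pixels}
```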
Because these data are based on land use categories, the classification classes do not necessarily coincide with them. To assess classification accuracy with these test site data, the 44 land use categories were therefore merged into 5 major categories, which are also shown in Table 1, and classification accuracies were evaluated in terms of these 5 major categories.
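Merging the 44 land use codes into 5 major categories amounts to a lookup-table remap of the code raster. The sketch below assumes integer land use codes; the partial mapping `MAJOR_OF` and the function `merge_categories` are hypothetical placeholders, since the actual 44-to-5 correspondence is the one given in Table 1.

```python
import numpy as np

# Hypothetical excerpt of the 44 -> 5 correspondence (the real one is in Table 1):
# land use code -> major category index 0..4.
MAJOR_OF = {1: 0, 2: 0, 3: 1, 4: 2, 5: 3, 6: 4}


def merge_categories(code_map: np.ndarray, major_of: dict) -> np.ndarray:
    """Replace every land use code in a raster by its major category index."""
    lut = np.full(max(major_of) + 1, -1, dtype=int)  # -1 marks codes with no mapping
    for code, major in major_of.items():
        lut[code] = major
    merged = lut[code_map]  # codes larger than max(major_of) raise IndexError
    if (merged < 0).any():
        raise ValueError("raster contains land use codes missing from the mapping")
    return merged


print(merge_categories(np.array([[1, 3], [6, 2]]), MAJOR_OF))
```

A lookup table keeps the remap to a single vectorised indexing step, however large the test site raster is.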
In the course of this study, some problems with these test site data were revealed. The largest was that the areas of the land use categories in the test site were unbalanced. To avoid this problem, several areas were added to the test site data. The new test site data are shown in Fig. 3, and the number of pixels in each category is shown in Table 2. Hereafter, the original test site data are called test site a and the modified data test site b.
3 METHODS
3.1 Clustering
The clustering method used in these experiments is hierarchical clustering with Ward's method. C-means clustering was not used because of the difficulty of obtaining optimal parameters for it.
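A minimal sketch of agglomerative clustering with Ward's criterion follows. It is a naive illustration, not the implementation used in the study (which is not stated), and the name `ward_cluster` is hypothetical: each step merges the pair of clusters whose union increases the within-cluster sum of squares the least, and merging stops once the indicated number of clusters remains. In practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage(..., method='ward')` followed by `fcluster(..., criterion='maxclust')` would normally be used.

```python
import numpy as np


def ward_cluster(samples: np.ndarray, n_clusters: int) -> np.ndarray:
    """Naive Ward agglomerative clustering; returns labels 0..n_clusters-1."""
    x = np.asarray(samples, dtype=float)
    members = [[i] for i in range(len(x))]  # sample indices in each cluster
    sizes = np.ones(len(x))
    centroids = x.copy()
    while len(members) > n_clusters:
        # Squared Euclidean distances between all pairs of centroids.
        sq_norms = (centroids ** 2).sum(axis=1)
        sq = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2 * centroids @ centroids.T, 0.0)
        # Ward merge cost n_a*n_b/(n_a+n_b) * ||c_a - c_b||^2, i.e. the
        # increase in within-cluster sum of squares caused by the merge.
        cost = np.outer(sizes, sizes) / (sizes[:, None] + sizes[None, :]) * sq
        np.fill_diagonal(cost, np.inf)
        a, b = sorted(np.unravel_index(np.argmin(cost), cost.shape))
        # Merge cluster b into cluster a with a size-weighted centroid.
        total = sizes[a] + sizes[b]
        centroids[a] = (sizes[a] * centroids[a] + sizes[b] * centroids[b]) / total
        sizes[a] = total
        members[a].extend(members[b])
        del members[b]
        sizes = np.delete(sizes, b)
        centroids = np.delete(centroids, b, axis=0)
    labels = np.empty(len(x), dtype=int)
    for label, idx in enumerate(members):
        labels[idx] = label
    return labels


pts = np.array([[0, 0], [0, 1], [10, 10], [10, 11]])
print(ward_cluster(pts, 2))
```

Recomputing the full cost matrix at every merge keeps the sketch short but costs O(k^2) memory and time per step, which is only reasonable for small sample sets.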
3.2 Classifier
A maximum likelihood classifier was used.
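For reference, a Gaussian maximum likelihood classifier of the usual kind can be sketched as below. It assumes one multivariate normal model per class (here, per retained cluster) and equal prior probabilities; these details, and the hypothetical names `train_ml` and `classify_ml`, are not specified in the text.

```python
import numpy as np


def train_ml(samples, labels):
    """Estimate (mean, inverse covariance, log-determinant) for each class."""
    stats = {}
    for c in np.unique(labels):
        xc = samples[labels == c]
        mean = xc.mean(axis=0)
        cov = np.cov(xc, rowvar=False)  # needs enough pixels to be non-singular
        _, logdet = np.linalg.slogdet(cov)
        stats[c] = (mean, np.linalg.inv(cov), logdet)
    return stats


def classify_ml(pixels, stats):
    """Assign each pixel to the class with the largest Gaussian log-likelihood."""
    classes = list(stats)
    scores = np.empty((len(pixels), len(classes)))
    for j, c in enumerate(classes):
        mean, inv_cov, logdet = stats[c]
        d = pixels - mean
        mahalanobis = np.einsum("ij,jk,ik->i", d, inv_cov, d)
        scores[:, j] = -0.5 * (logdet + mahalanobis)  # constant term dropped
    return np.asarray(classes)[np.argmax(scores, axis=1)]


rng = np.random.default_rng(0)
train = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(10, 1, (50, 3))])
stats = train_ml(train, np.array([0] * 50 + [1] * 50))
print(classify_ml(np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]]), stats))
```

Because each class needs its own covariance estimate, a cluster with very few pixels cannot be modelled reliably, which is consistent with the size restriction mentioned in Section 3.3.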
3.3 Experimental Procedure
Samples for clustering were taken at the grid points of an orthogonal grid. The number of samples in these experiments varied from 900 (30 x 30) to 2500 (50 x 50).
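Sampling on an orthogonal grid can be sketched as follows. The text gives only the grid sizes (30 x 30 to 50 x 50); placing one node at the centre of each of n equal strips along each image axis is an assumption, as is the hypothetical name `grid_sample`.

```python
import numpy as np


def grid_sample(image: np.ndarray, n: int) -> np.ndarray:
    """Take n x n pixels at the nodes of an orthogonal grid over the image.

    image: (rows, cols, bands) array. Returns an (n*n, bands) sample array.
    """
    rows, cols = image.shape[:2]
    # Assumed layout: one node at the centre of each of n equal strips per axis.
    r = ((2 * np.arange(n) + 1) * rows) // (2 * n)
    c = ((2 * np.arange(n) + 1) * cols) // (2 * n)
    return image[np.ix_(r, c)].reshape(n * n, -1)


img = np.zeros((400, 400, 4))
print(grid_sample(img, 30).shape, grid_sample(img, 50).shape)
```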
The final number of clusters was specified for each clustering run. After clustering, clusters with fewer than 6 pixels were eliminated because of a restriction of the maximum likelihood classifier. There are therefore two numbers of clusters: the indicated number and the resulting number. Indicated numbers from 10 to 160 clusters were tried in the experiments.
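Removing under-populated clusters is what turns the indicated number of clusters into the resulting number. A minimal sketch follows; `drop_small_clusters` is a hypothetical name, and it assumes the pixels of an eliminated cluster are simply excluded from classifier training, which the text does not state. The threshold of 6 comes from the text, which attributes it only to a restriction of the maximum likelihood classifier; one plausible reading is that each class covariance matrix must be estimated from enough pixels to be non-singular (at least one more pixel than the number of bands).

```python
import numpy as np


def drop_small_clusters(samples, labels, min_size=6):
    """Discard clusters with fewer than min_size members and relabel the rest."""
    ids, counts = np.unique(labels, return_counts=True)
    keep_ids = ids[counts >= min_size]
    mask = np.isin(labels, keep_ids)
    # keep_ids is sorted, so searchsorted yields consecutive labels 0..k-1.
    new_labels = np.searchsorted(keep_ids, labels[mask])
    return samples[mask], new_labels, len(keep_ids)  # last item: resulting number


labels = np.array([0] * 10 + [1] * 3 + [2] * 6)  # indicated number: 3
samples = np.zeros((len(labels), 4))
_, new_labels, resulting = drop_small_clusters(samples, labels)
print(resulting)  # 2
```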
The process of assigning each cluster to a specific land use category is time consuming and difficult. To avoid variations in classification accuracy caused by human factors in this process, an automatic assignment process was introduced. Two kinds of assignment were used in the experiments. One assigns the target cluster to the category with the largest number of classified pixels of that cluster; the other assigns it to the category with the largest percentage of classified pixels of that cluster. Hereafter, the former is called area assignment and the latter percentage assignment.
4 RESULTS AND DISCUSSIONS
4.1 Results for test site a
Table 3 shows the classification accuracies evaluated with test site a. In this table, rows correspond to the approximate number of clusters used in the classification, and columns correspond to the number of samples used for clustering. In each entry, the left-hand figure in brackets is the indicated number of clusters and the right-hand figure is the resulting number of clusters. Table 3(a) shows area-weighted mean classification accuracies, while Table 3(b) shows arithmetic mean classification accuracies. The following conclusions were obtained from Table 3(a):
(1) The variation in classification accuracy is very small, about 3.6% at the maximum.
(2) Almost no definite dependence of classification accuracy on sample size or number of clusters can be observed.
The classified results were compared to verify conclusion (1). Fig. 4 shows some examples of classified results. Fig. 4(a) shows the result of supervised learning for comparison. Fig. 4(b) shows the result for a sample size of 30 x 30 and 10 clusters, while Fig. 4(c) shows the result for a sample size of 45 x 45 and 74 clusters. Compared with the supervised result, Fig. 4(c) appears far better than Fig. 4(b), and the 3.6% accuracy difference seems too small to reflect this. One reason for this discrepancy may be that the areas of the categories in the test site data are not balanced, so the area-weighted mean is dominated by the largest categories. Instead of the area-weighted mean of the classification accuracies, therefore, an arithmetic mean of the classification accuracies was calculated. The results are shown in Table 3(b). Compared with Table 3(a), the following conclusions can be derived from this result:
(1) The absolute accuracies are about 7% lower than the area-weighted means.
(2) On close inspection of each sample-size column, there appears to be a peak in each column.
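The evaluation steps discussed above, assigning each cluster to a major category and then averaging per-category accuracies, can be sketched on a toy overlay. Several points are assumptions: per-category accuracy is taken as the fraction of a category's test-site pixels classified into that category; percentage assignment is read as the share of each category's total test-site pixels that fall in the cluster (dividing by the cluster's own size instead would always pick the same category as area assignment); and `assign_clusters` and `accuracy_means` are hypothetical names. The toy data are invented for illustration and are not results from this study.

```python
import numpy as np


def assign_clusters(cluster_map, truth, n_categories, rule="area"):
    """Map each cluster id to a major category by overlaying the test site."""
    n_clusters = int(cluster_map.max()) + 1
    # counts[k, c]: pixels classified into cluster k whose test-site category is c
    counts = np.zeros((n_clusters, n_categories))
    np.add.at(counts, (cluster_map.ravel(), truth.ravel()), 1)
    if rule == "area":
        score = counts
    elif rule == "percentage":
        # Assumption: share of each category's total test-site pixels.
        totals = counts.sum(axis=0)
        score = counts / np.where(totals > 0, totals, 1)
    else:
        raise ValueError(f"unknown rule: {rule!r}")
    # Clusters absent from the test site fall back to category 0 here.
    return np.argmax(score, axis=1)


def accuracy_means(classified, truth, n_categories):
    """Per-category accuracy plus its area-weighted and arithmetic means."""
    # Assumes every category occurs at least once in the test site.
    per_cat = np.array([np.mean(classified[truth == c] == c) for c in range(n_categories)])
    areas = np.bincount(truth.ravel(), minlength=n_categories)
    area_weighted = float(np.sum(per_cat * areas) / areas.sum())
    arithmetic = float(per_cat.mean())
    return per_cat, area_weighted, arithmetic


# Toy overlay: category 0 covers 90 pixels, category 1 only 10.
truth = np.array([0] * 90 + [1] * 10)
cluster_map = np.array([0] * 80 + [1] * 16 + [2] * 4)
for rule in ("area", "percentage"):
    assignment = assign_clusters(cluster_map, truth, 2, rule)
    classified = assignment[cluster_map]
    _, aw, am = accuracy_means(classified, truth, 2)
    print(rule, assignment.tolist(), round(aw, 3), round(am, 3))
```

On this unbalanced toy overlay, area assignment gives the higher area-weighted mean (0.94 against 0.90) while percentage assignment gives the much higher arithmetic mean (about 0.94 against 0.70), which illustrates why the two averages can rank the same classifications differently.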