Conclusion (1) can be attributed to the
imbalance in area among the categories.
Conclusion (2) can be explained as
follows.
Generally speaking, a larger number of
clusters yields higher classification
accuracy. However, if the number of
clusters is too large relative to the
sample size, the generated clusters are
statistically unstable and classification
accuracy decreases. If these assumptions
hold, the peaks (the cluster numbers
corresponding to the maximum
classification accuracy) in each sample
size column should move toward larger
numbers of clusters as the sample size
increases, and the maximum classification
accuracy should be obtained at the
maximum sample size.
In fact, the former expectation appears
to be borne out in Table 3(c), but the
latter does not hold: the maximum
classification accuracy is obtained at a
sample size of 45 x 45.
The classified result with the maximum
classification accuracy is shown in
Fig.6. Comparing Fig.3(a) (the
supervised learning case) with Fig.6, it
is evident that high density urban areas
have almost disappeared in Fig.6. This
phenomenon cannot be seen in the
classification accuracies, because high
density urban areas and residential areas
are merged in the classification accuracy
assessment. The reason that high density
urban areas have disappeared is clear.
The spectral signatures of high density
urban areas and residential areas are
very similar, so many pixels are
misclassified between these two
categories. Since residential areas
cover a far larger area than high
density urban areas, most of the
clusters corresponding to high density
urban areas are assigned to residential
areas by the nature of area assignment.
This phenomenon was attributed to the
imbalance of the test site. To improve
the balance of areas between categories,
new test sites were added to test site a.
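The effect of area assignment described
above can be illustrated with a minimal
sketch. It assumes that area assignment
gives each cluster to the category that
holds most of the cluster's test-site
pixels. The function name, category
labels and pixel counts are hypothetical
and are not taken from the experiments.

```python
from collections import Counter


def area_assignment(cluster_ids, category_ids):
    """Assign each cluster to the category with the most pixels in it.

    Assumed rule: a plain majority vote over the test-site pixels.
    """
    counts = {}  # cluster id -> Counter of category pixel counts
    for cluster, category in zip(cluster_ids, category_ids):
        counts.setdefault(cluster, Counter())[category] += 1
    return {cluster: c.most_common(1)[0][0] for cluster, c in counts.items()}


# Hypothetical test site: 150 residential pixels and 25 high density
# urban pixels that share the same two clusters (similar signatures).
cluster_ids = [0] * 90 + [1] * 60 + [0] * 10 + [1] * 15
category_ids = ["residential"] * 150 + ["urban_high"] * 25

assignment = area_assignment(cluster_ids, category_ids)
print(assignment)  # {0: 'residential', 1: 'residential'}
```

In this made-up case, 60% of the high
density urban pixels fall in cluster 1,
yet the cluster is still given to the
residential category, so the smaller
category receives no cluster at all.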
4.2 Results for test site b
Tables 4(a) and (b) show the results for
test site b. Table 4(a) corresponds to
the area-weighted mean and Table 4(b)
to the arithmetic mean. From Table 4(a),
the following conclusions can be drawn:
(1) The absolute accuracies have
increased by about 10% compared with
test site a.
(2) The variation among classification
accuracies has increased slightly (about
10%), but if the very low accuracies (in
the case of 10 clusters) are omitted, the
variation is still very small (about 5%).
(3) The tendency of the peaks seems more
apparent than in the case of test site
a. This tendency is also shown as a
graph in Fig.5. However, it is difficult
to claim that it is statistically
significant.
(4) The maximum classification accuracy
is still obtained at a sample size of
45 x 45.
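The difference between the two averages
reported in Table 4(a) and (b) can be
sketched as follows. The sketch assumes
that both are means of per-category
accuracies and that the weights are the
category areas in pixels. The function
names and all numbers are hypothetical
and are not values from Table 4.

```python
def arithmetic_mean_accuracy(accuracies):
    """Simple average: every category counts equally."""
    return sum(accuracies.values()) / len(accuracies)


def area_weighted_mean_accuracy(accuracies, areas):
    """Average weighted by the area (pixel count) of each category."""
    total_area = sum(areas[cat] for cat in accuracies)
    weighted = sum(accuracies[cat] * areas[cat] for cat in accuracies)
    return weighted / total_area


# Hypothetical per-category accuracies and areas (pixels).
accuracies = {"residential": 0.9, "forest": 0.8, "water": 0.4}
areas = {"residential": 700, "forest": 200, "water": 100}

weighted = area_weighted_mean_accuracy(accuracies, areas)
simple = arithmetic_mean_accuracy(accuracies)
print(round(weighted, 2), round(simple, 2))  # 0.83 0.7
```

With these made-up values the large
category dominates the area-weighted
mean, while the arithmetic mean exposes
the poor accuracy of the small category.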
To obtain more distinct results, the
simple (arithmetic) average was
calculated; it is shown in Table 4(b).
From this result, the following
conclusions were obtained:
(1) The absolute accuracies are almost
the same as in Table 4(a).
(2) The tendency of the peaks is also
present, but it breaks down at a sample
size of 50 x 50.
(3) The variations of classification
accuracies are still very small, and it
is difficult to draw a statistically
meaningful conclusion.
(4) The maximum classification accuracy
is obtained with a sample size of 45 x 45
and 83 clusters, but the accuracy with a
sample size of 30 x 30 and 28 clusters is
almost the same (a difference of only
0.3%).
(5) The maximum classification accuracy
does not increase with the sample size.
To avoid this deficiency, percentage
assignment was applied. The classified
result is shown in Fig.7. In this figure
the opposite effect can be observed:
clusters tend to be assigned to
categories with small areas.
For instance, if a category has only 3
pixels in the test site and 2 pixels of
some cluster belong to that category, the
percentage of that cluster for that
category is about 67%, and the cluster
will be assigned to that category even if
another 100 pixels of the cluster belong
to another category that has 200 pixels
(a percentage of only 50%).
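The example above can be checked with a
short sketch that applies both
assignment rules to one cluster. It
assumes that area assignment scores a
category by the number of the cluster's
pixels in it, and that percentage
assignment divides that number by the
total number of pixels of the category
in the test site. The function and
category names are hypothetical.

```python
def assign_cluster(cluster_counts, category_sizes, rule):
    """Pick the category for one cluster under the given rule.

    cluster_counts: category -> pixels of this cluster in that category
    category_sizes: category -> total pixels of that category in the test site
    rule: "area" (raw pixel count) or "percentage" (count / category size)
    """
    if rule == "area":
        def score(cat):
            return cluster_counts[cat]
    elif rule == "percentage":
        def score(cat):
            return cluster_counts[cat] / category_sizes[cat]
    else:
        raise ValueError("rule must be 'area' or 'percentage'")
    return max(cluster_counts, key=score)


# Numbers from the example in the text: a 3-pixel category with 2 pixels
# in the cluster, and a 200-pixel category with 100 pixels in the cluster.
cluster_counts = {"small": 2, "large": 100}
category_sizes = {"small": 3, "large": 200}

by_area = assign_cluster(cluster_counts, category_sizes, "area")
by_percentage = assign_cluster(cluster_counts, category_sizes, "percentage")
print(by_area, by_percentage)  # large small
```

Under these assumptions, area assignment
favors the large category (100 pixels
against 2), whereas percentage
assignment favors the small one (2/3
against 100/200), which is the opposite
bias.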
From the above results the following
conclusions were obtained:
(1) It is natural to expect that
classification accuracy increases with
the sample size, but this assumption was
not confirmed in these experiments.
(2) There exists an optimal number of
clusters that depends on the sample size.
(3) Variations among classification
accuracies were very small, and the
classified results were worse than in the
supervised learning case, although the
classification accuracies estimated from
the test site data were better.
(4) Most of the above conclusions were
largely determined by the category
assignment procedure, and a better
procedure is necessary to obtain more
distinct conclusions.