AUTOMATIC CATEGORIZATION OF CLUSTERS
IN UNSUPERVISED CLASSIFICATION
Sunpyo HONG, Kiyonari FUKUE, Haruhisa SHIMODA, Toshibumi SAKATA
Tokai University Research and Information Center
2-28-4 Tomigaya, Shibuya-ku, Tokyo 151, JAPAN
ABSTRACT:
A cluster categorization method is necessary when an unsupervised classification is
used for remote sensing image classification. It is desirable that this method be
performed automatically, because manual categorization is a highly time-consuming
process.
In this paper, several automatic determination methods were proposed and evaluated.
They are 1) maximum number method, which assigns the target cluster to the category
which occupies the largest area of that cluster; 2) maximum percentage method, which
assigns the target cluster to the category which shows the maximum percentage within
the category in that cluster; 3) minimum distance method, which assigns the target
cluster to the category at minimum distance from that cluster; 4) element ratio
matching method, which assigns the local region to the category having the most similar
element ratio to that region. From the results of experiments, it was confirmed that
the result of the minimum distance method was almost the same as the result made by a
human operator.
Key Words: unsupervised classification, post processing, categorization, clustering.
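
The first two assignment rules listed in the abstract can be illustrated with a
minimal sketch. It assumes that every pixel carries a cluster label and, when it lies
inside a training category area, a category label (None otherwise). The function
names, the data layout, and the toy pixel counts are hypothetical and are not taken
from the paper; ties are resolved by the first category encountered, which is also an
assumption.

```python
from collections import Counter


def count_table(cluster_labels, tca_labels):
    """Count pixels per (cluster, category) over the training category areas.

    tca_labels[i] is None for pixels that lie outside every training area.
    """
    counts = {}
    for cluster, category in zip(cluster_labels, tca_labels):
        if category is None:
            continue
        counts.setdefault(cluster, Counter())[category] += 1
    return counts


def maximum_number(counts):
    """Assign each cluster the category with the most pixels in that cluster."""
    return {k: max(row, key=row.get) for k, row in counts.items()}


def maximum_percentage(counts):
    """Assign each cluster the category whose training area it occupies most.

    The percentage is (pixels of the category in the cluster) divided by
    (total pixels in that category's training area).
    """
    tca_size = Counter()
    for row in counts.values():
        tca_size.update(row)
    return {
        k: max(row, key=lambda c: row[c] / tca_size[c])
        for k, row in counts.items()
    }


# Toy data (hypothetical): cluster "k" holds 5 A, 3 B and 2 C pixels;
# cluster "m" holds 45 A, 3 B and 10 C pixels; one pixel is outside any area.
cluster_labels = ["k"] * 10 + ["m"] * 58 + ["k"]
tca_labels = (
    ["A"] * 5 + ["B"] * 3 + ["C"] * 2
    + ["A"] * 45 + ["B"] * 3 + ["C"] * 10
    + [None]
)

counts = count_table(cluster_labels, tca_labels)
print(maximum_number(counts))
print(maximum_percentage(counts))
```

In this toy case category B has a small training area (6 pixels), so cluster k
covers half of it even though category A contributes more pixels to the cluster.
The two rules therefore disagree on cluster k, which is the situation the paper
discusses with Fig. 1.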
1. INTRODUCTION

With the launch of second generation high resolution sensors like LANDSAT TM and
SPOT HRV, the clustering method has been re-evaluated recently. However, the main
problem of clustering for practical use is that clustering is an unsupervised
classification. That is, clusters generated by clustering are defined in feature
vector space, not in image data. Therefore, in order to use the classified result
for a meaningful reference map, it is necessary to determine the relation of
clusters and categories, and to label the classified result with the categories.

Conventionally, this relation has been determined mainly by interpretation of an
operator. However, this process is time consuming and is not objective.

The purpose of this research is to try several methods of automatic categorization
and find out the most useful method. In this paper, four methods have been examined.

2. PROBLEMS OF CONVENTIONAL METHOD

In this method, each classified cluster is overlaid with the target image data on
the display, and that cluster is interpreted by an operator to determine the
category. Therefore, the obtained result can be considered natural and reliable.

However, since everything is determined by an operator in this method, there are
many problems as follows.

(1) The result depends on the skill of an operator.
(2) Objective and quantitative evaluation is difficult.
(3) It is time consuming when the number of clusters is large or there are many
small clusters.

3. AUTOMATIC CATEGORIZATION METHOD

To solve the above problems, several automatic categorization methods are
considered as follows. In all methods, training category areas (TCA) are first
extracted from the target image, as in supervised training.

(1) Maximum Number Method

In this method, the number of pixels in each TCA for each cluster is calculated.
Then the category having the maximum number is assigned to that cluster.

(2) Maximum Percentage Method

In this method, for each cluster, the percentage (occupation rate) of that cluster
in each TCA is calculated. Then the category having the maximum percentage is
assigned to that cluster.

Fig. 1 shows a comparison of these two methods in a simple case. Suppose that
cluster k is composed of three categories A, B and C. As shown in Fig. 1(a),
category A occupies the largest area in cluster k and C occupies the minimum
area. In the maximum number method, cluster k is always assigned to category
A. However, this figure does not show the differences in the total area of each
category. Fig. 1(b) shows the case that the total area of each category is the same
and (c) shows the case that the total area of each category is different. As this
figure shows, categories which occupy small areas in the image tend to be
neglected in the maximum number method. On the contrary, small area categories