Proceedings of the Symposium on Global and Environmental Monitoring

OPTIMIZATION OF UNSUPERVISED LEARNING FOR HIGH RESOLUTION 
REMOTE SENSING DATA 
SHIMODA,HARUHISA; MATSUMAE,YOSHIAKI; NATENOPPARATH,AMNARJ; 
FUKUE,KIYONARI; SAKATA,TOSHIBUMI 
Tokai University Research & Information Center, Japan 
ABSTRACT 
Supervised learning methods have been widely used for land cover classification of 
satellite imagery. However, these methods have two major problems. First, training areas 
are selected arbitrarily by operators, although statistical theory requires random 
sampling. Second, for high resolution images such as Landsat TM data, it is very 
difficult to extract a sufficient number of training classes that are composed only of 
spectrally pure objects. 
Unsupervised learning with the aid of clustering can solve the above problems. However, we 
do not have sufficient information to use clustering efficiently for high resolution 
data, i.e. (1) how many sample data are necessary, and (2) how many 
clusters should be generated. In this research, classification experiments on Landsat TM 
data were conducted in order to answer these questions. 
The experiments yielded the following results. (1) There exists an optimal 
number of clusters that depends on the sample size. (2) The method used to assign 
categories to each cluster dominates the classification accuracy. 
KEY WORDS: Clustering, Unsupervised learning, High resolution data 
1 INTRODUCTION 
With the launch of second generation high 
resolution sensors such as the Thematic 
Mapper (TM) and HRV, many studies 
have been carried out to verify 
the capability of these sensors for land 
use classification. Most of these 
studies have shown that 
classification accuracies obtained with these 
sensors are not as high as expected when 
a conventional supervised maximum 
likelihood classifier using only spectral 
information is applied. These results have led many 
researchers to study spatial features 
such as texture, or more sophisticated 
classifiers such as expert systems, the 
Dempster-Shafer rule, or fuzzy 
classifiers. 
In addition, those results have also 
shown the limitations of supervised 
learning. As is well known, 
supervised training area selection 
cannot be guaranteed to yield the random 
samples on which all statistical methods 
are based. This problem was not 
emphasized for low 
resolution images such as MSS, mainly 
because such an image is itself 
composed of rather homogeneous areas 
resulting from the averaging process. 
In the case of high resolution sensors, 
this problem becomes prominent. In many 
land use classification studies, 
classification accuracies estimated 
from the confusion matrix of the 
training data have been very high (usually 
90 to 98%), while accuracies estimated 
from independent samples have been very low 
(typically 60 to 70%). These results 
clearly show that the training 
data did not actually represent the 
statistics of their populations. 
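The gap between the two accuracy estimates can be illustrated with a minimal sketch of how overall accuracy is read from a confusion matrix. The two matrices below are hypothetical counts chosen only to fall in the ranges quoted above; they are not data from this study, and the function name `overall_accuracy` is introduced here for illustration.

```python
def overall_accuracy(confusion):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


# Hypothetical counts (rows: reference category, columns: assigned category).
# Pixels taken from the training areas themselves:
training_matrix = [
    [95, 3, 2],
    [2, 96, 2],
    [1, 4, 95],
]
# Independently sampled pixels classified with the same training statistics:
independent_matrix = [
    [70, 20, 10],
    [18, 62, 20],
    [12, 25, 63],
]

print("training estimate:   ", overall_accuracy(training_matrix))
print("independent estimate:", overall_accuracy(independent_matrix))
```

Only the second figure estimates the accuracy over the whole image, because only those samples are drawn independently of the operator's choice of training areas.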
Furthermore, in order to obtain high 
classification accuracies, an operator 
should select more than 50 classification 
classes for high resolution sensor data. 
From the standpoint of operational image 
processing, this large increase in the 
number of training classes makes the 
process practically impossible. 
For the above reasons, unsupervised 
learning, or more precisely 
clustering, becomes a very important tool 
for land use classification of high 
resolution data. However, most 
studies on clustering have been 
concerned with low resolution data such as 
MSS, and the optimal conditions, or even 
the minimum conditions, for applying clustering to 
high resolution sensor data are not well 
known. The purpose of this study is to 
obtain fundamental knowledge about the 
nature of clustering for high resolution 
data, and to clarify the influence of 
sample size and the number of clusters 
on the classification accuracy. 
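The kind of experiment described here (vary the sample size and the number of clusters, then measure classification accuracy) can be sketched as follows. This is only an illustrative outline under stated assumptions: k-means stands in for the clustering algorithm, which this section does not specify; categories are assigned to clusters by majority vote over labelled reference pixels, which is one possible assignment rule; and the two-band pixel values are synthetic, not Landsat TM data. All function names are hypothetical.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two spectral vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(pixel, centers):
    """Index of the cluster center closest to the pixel."""
    return min(range(len(centers)), key=lambda k: sq_dist(pixel, centers[k]))


def kmeans(samples, n_clusters, n_iter=10, seed=0):
    """Plain k-means (assumed stand-in for the clustering step)."""
    rng = random.Random(seed)
    centers = rng.sample(samples, n_clusters)
    for _ in range(n_iter):
        groups = [[] for _ in range(n_clusters)]
        for p in samples:
            groups[nearest(p, centers)].append(p)
        for k, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous center
                centers[k] = tuple(sum(c) / len(g) for c in zip(*g))
    return centers


def label_clusters(centers, reference):
    """Assign each cluster the majority category of its reference pixels."""
    votes = [Counter() for _ in centers]
    for pixel, category in reference:
        votes[nearest(pixel, centers)][category] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]


def accuracy(centers, labels, test):
    """Fraction of test pixels whose cluster label matches their category."""
    hits = sum(labels[nearest(p, centers)] == c for p, c in test)
    return hits / len(test)


def synthetic_pixels(n, rng):
    """Synthetic two-band pixels for three made-up categories."""
    means = {"water": (20, 10), "vegetation": (40, 90), "urban": (80, 60)}
    pixels = []
    for _ in range(n):
        category = rng.choice(sorted(means))
        value = tuple(rng.gauss(m, 6.0) for m in means[category])
        pixels.append((value, category))
    return pixels


rng = random.Random(1)
population = synthetic_pixels(3000, rng)
test_pixels = synthetic_pixels(500, rng)

results = {}
for sample_size in (100, 1000):
    sample = rng.sample(population, sample_size)  # random sampling
    spectra = [p for p, _ in sample]
    for n_clusters in (3, 10, 30):
        centers = kmeans(spectra, n_clusters)
        labels = label_clusters(centers, sample)
        results[(sample_size, n_clusters)] = accuracy(centers, labels, test_pixels)
        print(sample_size, n_clusters, results[(sample_size, n_clusters)])
```

The sample sizes and cluster counts in the loop are arbitrary placeholders; in practice they would be swept over the ranges of interest, and the rule in `label_clusters` could be exchanged to compare different category assignment methods.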
2 DATA USED IN THE EXPERIMENTS 
2.1 Image Data 
The image data used in the experiments are 
shown in Fig. 1, and their specifications are 
listed below. 
sensor : Landsat TM 
date : 1984/Nov./4 
path-row : 107-35 
area : Hiratuka area, Japan 
pixel size : 25 m 
image size : 512 x 480