Kaichang Di
analysis of GIS data, the other is to support knowledge-driven interpretation and analysis of remote sensing images.
SDMKD provides a new way of acquiring knowledge for remote sensing image classification. Some researchers have
done valuable work in this field. Eklund et al. extracted knowledge from TM images and geographic data for soil salinity
analysis using the inductive learning algorithm C4.5 [Eklund et al., 1998]. Huang et al. also used C4.5 to extract
knowledge from GIS data and a SPOT multispectral image for wetland classification [Huang et al., 1997]. In both studies,
geographic data were converted from vector to raster format, with a sampling size equal to the image pixel size. The
implementation of data mining techniques in spatial databases, especially inductive learning, and the
combination or integration of inductive learning with traditional image classification methods still need
further study.
In this paper, data mining techniques are studied to discover knowledge from GIS databases and remote sensing data in
order to improve land use classification of images. The paper is organized as follows. Section 2 describes the
implementation of inductive learning in spatial databases. Section 3 presents the methods of inductive learning in
remote sensing image classification. Section 4 describes a land use classification experiment on a SPOT multispectral
image. Finally, we draw conclusions.
2 INDUCTIVE LEARNING AND ITS IMPLEMENTATION IN SPATIAL DATABASE
Many methods can be used in spatial data mining [Li, et al., 1997]; among them, inductive learning is one of the most
important. Many inductive learning algorithms come mainly from the field of machine learning,
for example AQ11 and AQ15 by Michalski, AE1 and AE9 by Jiarong Hong, CLS by Hunt, ID3, C4.5 and C5.0 by
Quinlan, and CN2 by Clark [Hong, 1997]. The ID3 series, including ID3, C4.5 and C5.0, is the most famous and influential.
ID3, a decision tree algorithm, adopts a "divide and conquer" strategy. It selects classification
attributes recursively based on information entropy [Quinlan, 1993]. ID3 is fast in both learning and classification, which
makes it effective for large databases. One shortcoming of ID3 is that a decision tree is not as clear as production rules;
especially when the tree is large, it is very difficult to understand what the tree means. The other
shortcoming is that ID3 can only deal with discrete attributes and it is restricted to two-class problems. C4.5, an
extension of ID3, can convert a decision tree to equivalent production rules and can deal with multi-class problems with
continuous attributes. These new features make C4.5 practical and the most popular algorithm in the field of artificial
intelligence and machine learning. C5.0 is a further improved version of C4.5 that runs much faster on very large databases.
Therefore, we study the implementation of inductive learning in spatial databases using the C5.0 algorithm.
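The entropy-based attribute selection mentioned above can be illustrated with a minimal Python sketch. It is not the C5.0 implementation used in this paper; it only computes the information gain of discrete attributes in the style of ID3. The function names, the field names (elevation, soil, class) and the toy records are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, attribute, label_key="class"):
    """Entropy reduction obtained by splitting rows on one discrete attribute."""
    labels = [row[label_key] for row in rows]
    # Group the class labels by the value of the candidate attribute.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[label_key])
    # Weighted entropy that remains after the split.
    remainder = sum(
        len(part) / len(rows) * entropy(part) for part in partitions.values()
    )
    return entropy(labels) - remainder


def best_attribute(rows, attributes, label_key="class"):
    """Attribute with the largest information gain (the ID3 selection step)."""
    return max(attributes, key=lambda a: information_gain(rows, a, label_key))


# Hypothetical toy records, not data from the paper.
SAMPLE_ROWS = [
    {"elevation": "low", "soil": "clay", "class": "paddy"},
    {"elevation": "low", "soil": "sand", "class": "paddy"},
    {"elevation": "high", "soil": "clay", "class": "forest"},
    {"elevation": "high", "soil": "sand", "class": "forest"},
]

if __name__ == "__main__":
    print(best_attribute(SAMPLE_ROWS, ["soil", "elevation"]))
```

In this toy set, elevation separates the two classes completely while soil does not, so a tree built this way would split on elevation first. A full decision tree learner would apply the same selection recursively to each partition.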
C5.0, like many other inductive learning algorithms, requires that the training data be composed of tuples, where
each tuple has several attributes, one of which is the class label. If we treat records as tuples and fields as attributes, these
algorithms are very suitable for learning in relational databases. Spatial data structures are more complex than the tables in
ordinary relational databases. Besides tabular data, there are vector and raster graphic data in a spatial database, and
the features of graphic data are generally not stored explicitly in the database. Therefore, learning in a spatial database is
more difficult than learning in an ordinary relational database with respect to selecting the tuples and attributes of the training data.
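The mapping from relational records to training tuples can be sketched as follows. This is a minimal illustration under stated assumptions, not part of the C5.0 software: the TrainingTuple type, the records_to_tuples helper and the field names are hypothetical.

```python
from typing import NamedTuple


class TrainingTuple(NamedTuple):
    """One training example: attribute values plus a class label."""
    attributes: dict
    label: str


def records_to_tuples(records, label_field):
    """Treat each record as a tuple and each field as an attribute.

    The field named by label_field becomes the class label; every other
    field becomes a learning attribute.
    """
    tuples = []
    for record in records:
        attributes = {k: v for k, v in record.items() if k != label_field}
        tuples.append(TrainingTuple(attributes, record[label_field]))
    return tuples


# Hypothetical attribute table of a polygon layer, not data from the paper.
PARCELS = [
    {"area": 1200.0, "soil": "clay", "land_use": "paddy"},
    {"area": 5400.0, "soil": "sand", "land_use": "forest"},
]

if __name__ == "__main__":
    for example in records_to_tuples(PARCELS, "land_use"):
        print(example.attributes, "->", example.label)
```

For tabular data this conversion is direct. For graphic data, the attribute values would first have to be computed from the geometry, which is the difficulty described above.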
We regard the selection of learning tuples as a problem of determining learning granularity. Two learning granularities are
proposed for inductive learning from spatial data: spatial object granularity and pixel granularity. Spatial
objects are the area, line and point objects in a graphical database, or the area and linear features extracted from remote
sensing images. Pixels are the pixels of remote sensing images or the cells of raster graphic data. Learning at
spatial object granularity can discover knowledge concerning location, shape, spatial relations, etc. The discovered
knowledge is generalized and can be used in intelligent spatial data analysis and also in remote sensing image
classification. When the discovered rules are applied to image classification, the image must be clustered or
pre-classified into area or linear features before the rules are used. Learning at pixel granularity, on the other hand, can
discover knowledge about spectral values, location, elevation, etc. The discovered rules are more specialized and suitable for
image classification, but not suitable for spatial data analysis and decision support. Each granularity has
its own shortcomings as well. Learning at pixel granularity cannot utilize shape information, and it is difficult to
utilize spatial association information. Learning at spatial object granularity cannot utilize the detailed information within
an object; for example, learning at polygon granularity cannot utilize the exact elevation and slope values within a
polygon, and can only use an average or sample value. The granularity should be selected according to the
application, or the two may be used together.
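The difference between the two granularities can be illustrated with a small Python sketch. It assumes, for illustration only, that the spatial objects have already been rasterized into a grid of object identifiers aligned with an elevation grid; the grids, the function names and the class names are hypothetical.

```python
from collections import defaultdict


def pixel_tuples(elevation, land_use):
    """Pixel granularity: one tuple per cell, (row, col, elevation, class)."""
    return [
        (r, c, elevation[r][c], land_use[r][c])
        for r in range(len(elevation))
        for c in range(len(elevation[0]))
    ]


def object_tuples(elevation, object_ids, object_class):
    """Object granularity: one tuple per object,
    (object id, cell count, mean elevation, class)."""
    cells = defaultdict(list)
    for r, row in enumerate(object_ids):
        for c, oid in enumerate(row):
            cells[oid].append(elevation[r][c])
    # Only an average survives; the per-cell elevations are lost.
    return [
        (oid, len(vals), sum(vals) / len(vals), object_class[oid])
        for oid, vals in sorted(cells.items())
    ]


# Hypothetical 2 x 3 grids, not data from the paper.
ELEVATION = [[10, 12, 50], [11, 13, 70]]
OBJECT_IDS = [[1, 1, 2], [1, 1, 2]]
OBJECT_CLASS = {1: "paddy", 2: "forest"}
LAND_USE = [[OBJECT_CLASS[oid] for oid in row] for row in OBJECT_IDS]

if __name__ == "__main__":
    print(pixel_tuples(ELEVATION, LAND_USE))
    print(object_tuples(ELEVATION, OBJECT_IDS, OBJECT_CLASS))
```

The pixel version keeps every elevation value but carries no shape information, while the object version can carry shape-related attributes such as the cell count but reduces elevation to a single mean per object.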
After determining the learning granularity, the learning attributes should be determined. In ordinary relational
databases, the attributes can be fields stored explicitly or fields derived by mathematical or logical operations. In
contrast, geometric features and spatial relations are not stored explicitly in a spatial database, but are hidden in the
multi-layer graphic data. Spatial analysis and spatial operations must be performed to extract the attributes about shape
International Archives of Photogrammetry and Remote Sensing. Vol. XXXIII, Part B3. Amsterdam 2000. 239