AN INDUCTION-BASED MODEL FOR
CLASSIFICATION OF LANDSAT DATA
Richard J. Roiger, PhD
IN%"ROIGER@VAX1.MANKATO.MSUS.EDU"
Lee D. Cornell, MS
IN%"CORNELL@VAX1.MANKATO.MSUS.EDU"
Associate Professors
Department of Computer and
Information Sciences
Mankato State University
Mankato, MN 56002-8400 USA
ISPRS Commission III
The current research presents an induction-based empirical model that uses a heuristic evaluation
function capable of utilizing the most predictive attributes in performing classification of
satellite data. This paper discusses the structure of this model and compares its classification
accuracy and other characteristics to those exhibited by other systems, both heuristic and
statistical.
The model is used to analyze Landsat data and to classify pixels into one of fifteen land-cover
categories, with a demonstrated accuracy rate approaching 100 percent.
Key Words: Artificial Intelligence, Classification, Image Analysis, Landsat
ACKNOWLEDGEMENT
We would like to acknowledge the assistance of
Daniel Civco (Civco 1991, 1992a, 1992b) of the
University of Connecticut in sharing the test
data which he used in the training and testing
of a neural net system designed for remote
sensing data analysis and classification. These
data are sampled image data derived from a May
1988 Landsat Thematic Mapper (TM) scene
consisting of multispectral reflectance values
in six bands of the electromagnetic spectrum
(blue, green, red, near infrared, and two
middle infrared) for 15 different land covers.
These data gave us valid benchmarks for the
classification accuracy of the system under
development, without our having to repeat work
already underway by others.
We would also like to acknowledge the
continuing support of Dr. Maria Gini,
Department of Computer and Information
Sciences, at the University of Minnesota.
1. INTRODUCTION
In this paper we present SX-WEB, an exemplar-
based concept learning model capable of
analyzing digitized satellite images of the
earth's surface. SX-WEB is a modification of
EX-WEB (Roiger, 1991), an incremental
concept-formation model. With EX-WEB,
learning is unsupervised and incremental. An
unsupervised paradigm is, in general,
inappropriate for image classification, since
most images will not contain a representative
sample of every classification category.
Learning with SX-WEB is therefore supervised.
SX-WEB retains EX-WEB's ability to learn
incrementally and to restrict classification to
the attributes deemed most predictive of class
membership. However, for rapid classification,
SX-WEB is best used as a non-incremental
system. SX-WEB can classify in domains
containing nominal, real-valued, and mixed data
(in which both nominal and real-valued
attributes are present). Because digitized
images are real-valued, we concentrate on
SX-WEB's real-valued data structure and
similarity measure.
SX-WEB is written in PC Scheme. Scheme is a
LISP-based language conceived in the 1970s at
MIT by G.L. Steele and G.J. Sussman. PC Scheme
is an adaptation of Scheme developed by Texas
Instruments in the 1980s.
The training and testing data provided by
Daniel Civco consisted of 302 pixels for which
ground truth had been established. These data
had been classified
into fifteen categories: Urban (UR),
Agriculture 1 (A1), Agriculture 2 (A2),
Turf/Grass (TG), Southern Deciduous (SD),
Northern Deciduous (ND), Coniferous (CO),
Shallow Water (SW), Deep Water (DW), Marsh
(MA), Shrub Swamp (SS), Wooded Swamp (WS), Dark
Barren (DB), Barren 1 (B1), and Barren 2 (B2).
Each pixel was represented by six values: the
multispectral reflectance values in six bands
of the electromagnetic spectrum, namely blue
(0.45-0.52 µm), green (0.52-0.60 µm), red
(0.63-0.69 µm), near infrared (0.76-0.90 µm),
and two middle infrared bands (1.55-1.75 µm and
2.08-2.35 µm).
2. THE SX-WEB LEARNING MODEL
In this section, we examine the main features
of SX-WEB in detail, using Landsat image data as
the example domain. We present SX-WEB's
exemplar-based similarity measure and
evaluation function, and we conclude the section
with a complexity analysis.
2.1 Representing real-valued data with SX-WEB
The primary data structure used by SX-WEB is a
three-level tree. Figure 1 shows the general
form of this tree structure. The nodes at the
instance-level of the tree represent the
individual training instances that have been
used to define the concept classes given at the
concept-level. For the domain in question, each
instance-level node contains an attribute-value
list consisting of the spectral band
identifications together with their specific
values. The values found within the attributes
of the instance nodes are used by SX-WEB's
exemplar-based evaluation function to classify
newly presented instances whose classification
is unknown.
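The three-level structure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not SX-WEB's PC Scheme implementation: the class names `Root`, `Concept` and `Instance`, the method `add_training_instance`, and the band identifiers in `BANDS` are all hypothetical. Because learning is supervised, each training pixel's known class label selects the concept-level node under which its instance-level node is stored.

```python
from dataclasses import dataclass, field

# Hypothetical identifiers for the six TM bands listed in Section 1.
BANDS = ("BLUE", "GREEN", "RED", "NIR", "MIR1", "MIR2")


@dataclass
class Instance:
    """Instance-level node: one training pixel as an attribute-value list."""
    values: dict[str, float]


@dataclass
class Concept:
    """Concept-level node: one land-cover class and its instance-level children."""
    label: str
    instances: list[Instance] = field(default_factory=list)


@dataclass
class Root:
    """Root-level node: parent of every concept-level node."""
    concepts: dict[str, Concept] = field(default_factory=dict)

    def add_training_instance(self, label: str, values: dict[str, float]) -> Instance:
        # Reject pixels that lack any of the six spectral bands.
        missing = set(BANDS) - set(values)
        if missing:
            raise ValueError(f"missing bands: {sorted(missing)}")
        # Supervised learning: the known class label picks (or creates) the concept node.
        concept = self.concepts.get(label)
        if concept is None:
            concept = self.concepts[label] = Concept(label)
        instance = Instance(dict(values))
        concept.instances.append(instance)
        return instance
```

Keying concept nodes by class label (for example `"UR"` or `"DW"`) makes lookup constant-time; with only fifteen land-cover classes, a plain list would serve equally well.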
The concept-level nodes of the tree in Figure
1 store the means and standard deviations of
the attributes found within their respective
instance-level children. That is, concept $C_1$
contains the means and standard deviations for
the attributes found within $I_1$, $I_2$, $I_3$ and $I_4$.
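Because SX-WEB retains EX-WEB's ability to learn incrementally, a concept node's means and standard deviations can be updated as each new instance-level child arrives rather than recomputed from scratch. The paper does not state how SX-WEB maintains these statistics; the sketch below assumes Welford's online algorithm, a standard numerically stable method, and assumes the population form of the standard deviation. The names `RunningStats` and `ConceptSummary` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Running mean and standard deviation of one attribute (Welford's method)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def add(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        # Assumption: population SD (divide by n); the paper does not specify.
        return math.sqrt(self.m2 / self.n) if self.n else 0.0


class ConceptSummary:
    """Hypothetical per-attribute mean/SD store for one concept-level node."""

    def __init__(self) -> None:
        self.stats: dict[str, RunningStats] = {}

    def add_instance(self, values: dict[str, float]) -> None:
        # Update each attribute's statistics with the new instance's value.
        for attr, x in values.items():
            self.stats.setdefault(attr, RunningStats()).add(x)

    def summary(self) -> dict[str, tuple[float, float]]:
        # Map each attribute to its (mean, standard deviation) pair.
        return {attr: (s.mean, s.std()) for attr, s in self.stats.items()}
```

Feeding every training instance, regardless of class, into one additional `ConceptSummary` would give root-level statistics of the kind shown in Figure 2, assuming the root-level node summarizes all training instances.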
Figure 2 shows the mean and standard deviation
scores for the root-level node and the fifteen
concept-level classes formed with a training
set containing 155 instances. SX-WEB uses these
mean and standard deviation scores to determine
those attributes most predictive of class
membership.
To illustrate this, consider Figure 2 and the
mean values for the attribute BLUE. The
smallest mean score for the attribute BLUE is
71.4 and is found in the concept class