All the input data were in raster format, 1880 x 1360 pixels with a 25 m spatial resolution. The
classification was divided into two successive phases
(Figure 1):
1. unsupervised classification of the multispectral
satellite data and
2. rule-based reclassification that also used the GIS data.
To save effort, we automated certain procedures using a machine learning algorithm with the CLC database as the reference data set. The CLC database entirely covers the
study area with a minimum mapping unit of 20 ha, giving a
general overview of the land cover. It was assumed that, in spite of considerable mapping generalization, some subjectivity problems, and mistakes in the CLC database (Kobler and Hocevar 1999), there is enough information
inherent in the CLC to use it as a reference both in the
labeling stage of the unsupervised classification and in the
machine learning stage of decision tree generation. The
other, equally important, source of information when
defining the decision trees was domain expert knowledge. Before using the CLC as a reference, we
aggregated the CLC nomenclature into the more general
CLC_G classes relevant to our study. Because of the
mixed land-use / land-cover nature of the CLC
nomenclature, some of its classes were left out (denoted
by 0 in Table 1).
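As an illustration of this aggregation step, the following minimal Python sketch remaps CLC codes to CLC_G classes through a lookup table; the class codes shown are hypothetical placeholders, since the actual mapping is given in Table 1.

```python
import numpy as np

# Hypothetical lookup from CLC level-3 codes to aggregated CLC_G classes;
# the actual mapping is given in Table 1. Code 0 marks classes left out.
CLC_TO_CLCG = {
    311: 1,   # e.g. broad-leaved forest -> forest
    312: 1,   # e.g. coniferous forest -> forest
    231: 2,   # e.g. pastures -> grassland
    242: 0,   # e.g. complex cultivation patterns -> left out (mixed use)
}

def aggregate_clc(clc_raster: np.ndarray) -> np.ndarray:
    """Remap a raster of CLC codes to the generalized CLC_G classes."""
    clcg = np.zeros_like(clc_raster)      # unmapped codes stay 0 (left out)
    for clc_code, clcg_code in CLC_TO_CLCG.items():
        clcg[clc_raster == clc_code] = clcg_code
    return clcg
```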
The Landsat TM satellite image (EC 1997) acquired in July 1995 had already been orthorectified (NLR 1997) for the Slovenian CLC project. It was additionally radiometrically corrected prior to classification to alleviate
the effects of variable illumination. While the correction of atmospheric effects was skipped because of its complexity, topographic normalization was performed
using the Minnaert method (Smith et al. 1980),
implemented in the SILVICS software (McCormick, JRC
1999). A raster DEM (SMAS 1995a) with a 100 m
resolution was used to derive the topographic variables.
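For illustration, a minimal numpy sketch of the Minnaert correction in its common formulation (Smith et al. 1980) is given below; the actual computation was done with SILVICS, whose implementation details are not reproduced here, and the function names are ours.

```python
import numpy as np

def estimate_k(radiance, cos_i, slope_deg):
    """Estimate the band-specific Minnaert constant k as the slope of the
    regression of log(L * cos e) on log(cos i * cos e)."""
    cos_e = np.cos(np.radians(slope_deg))
    ok = (radiance > 0) & (cos_i > 0) & (cos_e > 0)   # keep logs defined
    k, _ = np.polyfit(np.log(cos_i[ok] * cos_e[ok]),
                      np.log(radiance[ok] * cos_e[ok]), 1)
    return k

def minnaert_correct(radiance, cos_i, slope_deg, k):
    """Minnaert normalization: L_n = L * cos e / (cos i * cos e)^k.

    radiance  -- observed band values
    cos_i     -- cosine of the solar incidence angle, derived from the DEM
                 slope/aspect and the sun position at acquisition time
    slope_deg -- terrain slope in degrees, derived from the DEM
    """
    cos_e = np.cos(np.radians(slope_deg))
    return radiance * cos_e / (cos_i * cos_e) ** k
```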
CLASSIFICATION METHODS
The classification process encompassed unsupervised
classification, followed by supervised classification,
aggregation of the non-forest classes, and sieve filtering
(Figure 1). The first phase - unsupervised classification - was used to group pixels of the radiometrically corrected Landsat TM channels 2, 3, 4, 5 and 7 into "natural",
spectrally distinct classes. We decided against supervised
classification at this stage because the spectral classes
were so numerous that it would be difficult to train on all
of them. Each spectral class (i.e. cluster) was labeled
according to the predominant CLC_G class. Visual
examination of this first map approximation showed that
there still remained some confusion among CLC_G
classes in the output image. Some spectral classes
corresponded to more than one information (CLC_G) class,
indicating that some information classes were spectrally
similar and could not be distinguished from the
multispectral data alone.
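The clustering and cluster-labeling step can be sketched as follows; the paper does not name the clustering algorithm, so k-means is used here as a stand-in, and the number of clusters is an arbitrary assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_and_label(bands, clcg, n_clusters=50):
    """Cluster TM bands into spectral classes, then label each cluster
    with the predominant CLC_G class among its pixels.

    bands -- array of shape (n_pixels, n_bands), e.g. TM channels 2,3,4,5,7
    clcg  -- array of shape (n_pixels,) with reference CLC_G codes
    """
    clusters = KMeans(n_clusters=n_clusters, n_init=10,
                      random_state=0).fit_predict(bands)
    labels = np.empty(n_clusters, dtype=clcg.dtype)
    for c in range(n_clusters):
        values, counts = np.unique(clcg[clusters == c], return_counts=True)
        labels[c] = values[np.argmax(counts)]   # majority CLC_G class
    return labels[clusters]
```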
During the second phase, additional information was used to derive two decision trees that successively improved the output of the unsupervised classification. The additional information consisted of the per-pixel values of different GIS layers, i.e. attributes (Table 2). The decision trees were
generated by interactively combining domain expert
knowledge with the results of automated induction of
decision trees (i.e. machine learning). The machine learning of decision trees was based on the values of the CLC_G attribute.
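A minimal sketch of this step is given below, using scikit-learn's CART-style trees in place of See5 (which was actually used) purely for illustration; the attribute names follow Table 2, while the training mask and the pre-pruning value are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_reclassification_tree(slope, forest81, spectral_class, clcg,
                                training_mask):
    """Learn a decision tree that predicts CLC_G from per-pixel attributes.

    slope, forest81 -- flattened GIS layers (continuous and discrete
                       attributes, cf. Table 2)
    spectral_class  -- per-pixel output of the unsupervised classification
    clcg            -- reference CLC_G class per pixel
    training_mask   -- boolean selector for the training subset
    """
    X = np.column_stack([slope, forest81, spectral_class])
    tree = DecisionTreeClassifier(min_samples_leaf=50)  # pre-pruning knob
    tree.fit(X[training_mask], clcg[training_mask])
    return tree
```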
Decision trees (Quinlan 1986) predict the value (called
class) of a discrete dependent variable from the values of
a set of independent variables (called attributes), which
may be either continuous (e.g. SLOPE) or discrete (e.g.
FOREST81). Data describing a real system can be used
to learn or automatically construct a decision tree. The
common way to induce decision trees is the so-called
Top-Down Induction of Decision Trees (TDIDT, Quinlan
1986). Tree construction proceeds recursively starting
with the entire set of training examples. At each step, the
most informative attribute is selected as the root of the
(sub)tree and the current training set is split into subsets
according to the values of the selected attribute. For
discrete attributes, a branch of the tree is typically created
for each possible value of the attribute. For continuous
attributes, a threshold is selected and two branches are
created based on that threshold. For the subsets of
training examples in each branch, the tree construction
algorithm is called recursively. Tree construction stops
when all examples in a node are of the same class (or if
some other stopping criterion is satisfied). Such nodes are
called leaves and are labeled with the corresponding
values of the class.
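To make the TDIDT procedure concrete, a toy Python implementation for continuous attributes is sketched below; it mirrors the recursive scheme just described but is not the algorithm used by See5.

```python
import numpy as np

def entropy(y):
    """Shannon entropy of the class distribution in y."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def tdidt(X, y, min_examples=5):
    """Toy TDIDT: choose the attribute/threshold with the highest
    information gain, split the training set, and recurse."""
    # Stopping criteria: pure node or too few examples -> majority leaf
    if len(np.unique(y)) == 1 or len(y) < min_examples:
        values, counts = np.unique(y, return_counts=True)
        return ('leaf', values[np.argmax(counts)])
    best = None
    for a in range(X.shape[1]):
        for t in np.unique(X[:, a])[:-1]:   # candidate thresholds
            left = X[:, a] <= t
            gain = entropy(y) - (left.mean() * entropy(y[left])
                                 + (~left).mean() * entropy(y[~left]))
            if best is None or gain > best[0]:
                best = (gain, a, t)
    if best is None or best[0] <= 0:        # no useful split -> leaf
        values, counts = np.unique(y, return_counts=True)
        return ('leaf', values[np.argmax(counts)])
    _, a, t = best
    left = X[:, a] <= t
    return ('node', a, t,
            tdidt(X[left], y, min_examples),
            tdidt(X[~left], y, min_examples))
```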
An important mechanism used to prevent trees from overfitting the data is tree pruning. Pruning can be employed
during tree construction (pre-pruning) or after the tree has
been constructed (post-pruning). Typically, a minimum
number of examples in branches can be prescribed for
pre-pruning and confidence level in accuracy estimates
for leaves for post-pruning.
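In scikit-learn terms (used here only for illustration; See5 itself exposes a confidence level for its error-based post-pruning), the two pruning styles look roughly like this:

```python
from sklearn.tree import DecisionTreeClassifier

# Pre-pruning: require a minimum number of examples per leaf
# (the value 25 is illustrative).
pre_pruned = DecisionTreeClassifier(min_samples_leaf=25)

# Post-pruning: scikit-learn offers cost-complexity pruning (ccp_alpha)
# rather than C4.5's confidence-based pruning; larger alpha prunes more.
post_pruned = DecisionTreeClassifier(ccp_alpha=0.001)
```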
A number of systems exist for inducing classification trees from examples, e.g. CART (Breiman et al. 1984), ASSISTANT (Cestnik et al. 1987), and C4.5 (Quinlan 1993). Of these, C4.5 is among the best-known and most widely used decision tree systems. Its successor C5 (Quinlan 1998) represented the state of the art in decision tree induction at the time of writing. The Windows
implementation of C5, named See5, was used in our
study.
Out of the 2,558,160 pixels in the study area, a training subset of 127,537 pixels was selected for learning decision trees.
To avoid bias, the training subset was selected in a
stratified random fashion with an equal number of pixels per
stratum. The size of the subset was limited by the size of
the smallest stratum. Two criteria were considered for
stratification: the original CLC class, generalized into eight classes, and global yearly insolation (Gabrovec 1996), split into two classes at the median value. Sixteen strata were thus identified. To avoid possible problems at the
landscape unit edges (mixed pixels and imprecise edge
delineation in CLC), only pixels more than 100 m from the
stratum edge were candidates for the training subset.
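A minimal sketch of this sampling scheme, with hypothetical array names, might look as follows:

```python
import numpy as np

def stratified_sample(strata, candidate_mask, n_per_stratum, seed=0):
    """Draw an equal number of training pixels from each stratum.

    strata         -- per-pixel stratum id (CLC class x insolation class)
    candidate_mask -- True for pixels more than 100 m from a stratum edge
    n_per_stratum  -- limited by the size of the smallest stratum
    """
    rng = np.random.default_rng(seed)
    selected = []
    for s in np.unique(strata):
        idx = np.flatnonzero((strata == s) & candidate_mask)
        selected.append(rng.choice(idx, size=n_per_stratum, replace=False))
    return np.concatenate(selected)
```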