Figure 1: The classification work-flow (inputs: radiometrically and geometrically corrected Landsat TM imagery, thematic GIS data layers and domain expert knowledge; 2 successive decision trees produce the classified map, which is generalized by aggregation of non-forest classes and sieve filtering).
The results of decision tree induction were interactively
edited by the analyst. In a way, the analyst was just taking
advice from See5: it was his expert decision whether and how
this advice would be used in the final decision tree. The
manual editing of the trees was an iterative process of
adding new criteria based on expert knowledge, pruning
parts of the See5 trees, and combining parts of different
See5 trees. The role of machine learning was to identify
thresholds for continuous attributes (e.g. the NDVI
threshold to distinguish true "Farmland" within the
farmland class of the first-stage classification result) and
to help find the complex combinations of criteria
within individual tree branches. To focus the search for
criteria within a particular branch, machine learning was
directed (1) by being applied only to the relevant subset of
the training data set (e.g. pixels with
UNSUPERVISED_RESULT = "Marsh" AND SLOPE = 0
to distinguish "Marsh" from "Forest" in flatland) and (2) by
adjusting the See5 pre-pruning and post-pruning
parameters. The decision tree editing was also
continuously accompanied by the following:
1. checking the accuracy of each
decision tree branch by a 10-fold cross-
validation on the corresponding training
data subset,
2. visually inspecting the classified
image, based on knowledge of the
landscape and comparison with the
topographic map.
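The role given to machine learning above (finding a threshold for a continuous attribute, such as an NDVI cut-off, on the training subset of one branch, then checking that branch by 10-fold cross-validation) can be illustrated with a minimal sketch. It is not See5: candidate cuts are scored by plain information gain (C4.5, the predecessor of See5, ranks splits by the gain ratio instead), the "tree" is a single split (a decision stump), and the training records are made-up toy values. The field names `UNSUPERVISED_RESULT` and `SLOPE` follow the text; `NDVI` as a record field, `CLASS`, and all numbers are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) of the split `value <= threshold` with the
    highest information gain; cuts are midpoints between distinct values."""
    pairs = sorted(zip(values, labels))
    n, base = len(pairs), entropy(labels)
    best_t, best_gain = None, -1.0
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # cannot cut between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best_gain:
            best_t, best_gain = (lo + hi) / 2, gain
    return best_t, best_gain


def fit_stump(values, labels):
    """One-split tree: majority class on each side of the best threshold."""
    t, _ = best_threshold(values, labels)
    if t is None:  # all values identical: no split possible
        majority = Counter(labels).most_common(1)[0][0]
        return math.inf, majority, majority
    left = [lab for v, lab in zip(values, labels) if v <= t]
    right = [lab for v, lab in zip(values, labels) if v > t]
    return t, Counter(left).most_common(1)[0][0], Counter(right).most_common(1)[0][0]


def cross_validate(values, labels, k=10, seed=0):
    """k-fold cross-validated accuracy of the stump."""
    idx = list(range(len(values)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        t, below, above = fit_stump([values[i] for i in train], [labels[i] for i in train])
        correct += sum((below if values[i] <= t else above) == labels[i] for i in fold)
    return correct / len(values)


# Hypothetical toy training records (not data from the study).
records = (
    [{"UNSUPERVISED_RESULT": "Marsh", "SLOPE": 0, "NDVI": 0.20 + 0.005 * i, "CLASS": "Marsh"}
     for i in range(20)]
    + [{"UNSUPERVISED_RESULT": "Marsh", "SLOPE": 0, "NDVI": 0.60 + 0.005 * i, "CLASS": "Forest"}
       for i in range(20)]
    + [{"UNSUPERVISED_RESULT": "Marsh", "SLOPE": 8, "NDVI": 0.40, "CLASS": "Forest"}]
)

# Focus the search on one branch: unsupervised "Marsh" pixels in flatland.
subset = [r for r in records if r["UNSUPERVISED_RESULT"] == "Marsh" and r["SLOPE"] == 0]
ndvi = [r["NDVI"] for r in subset]
cls = [r["CLASS"] for r in subset]

threshold, below, above = fit_stump(ndvi, cls)
accuracy = cross_validate(ndvi, cls, k=10)
print(f"NDVI <= {threshold:.3f} -> {below}, else {above}; 10-fold CV accuracy {accuracy:.2f}")
```

Restricting the records before the search mirrors point (1) of the text; the pruning parameters of point (2) have no counterpart in a one-split tree.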
Two decision trees were finally set up for
successive reclassification of the
unsupervised classification results. The
second tree corrected minor errors left
by the first one. For legibility, we
decided against merging the two trees
into one.
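Successive reclassification by two separate trees can be pictured as two functions applied one after the other to each pixel's attribute record, the second seeing the output of the first. The sketch below is illustrative only: the rules, thresholds and the `NDVI` attribute are hypothetical stand-ins, not the trees actually built in the study; only the Marsh/flatland branch echoes an example from the text.

```python
def first_tree(px):
    """Hypothetical first-stage tree refining the unsupervised class."""
    unsup = px["UNSUPERVISED_RESULT"]
    if unsup == "Marsh":
        if px["SLOPE"] == 0:  # flatland branch, as in the text's example
            return "Marsh" if px["NDVI"] <= 0.45 else "Forest"
        return "Forest"
    if unsup == "Farmland":
        return "Farmland" if px["NDVI"] <= 0.50 else "Forest"
    return unsup


def second_tree(px, cls):
    """Hypothetical second tree correcting residual errors of the first."""
    if cls == "Water" and px["SLOPE"] > 15:
        return "Forest"  # hypothetical: steep shaded slopes confused with water
    return cls


def reclassify(pixels):
    """Apply the two trees in succession to each pixel's attribute record."""
    return [second_tree(px, first_tree(px)) for px in pixels]


pixels = [
    {"UNSUPERVISED_RESULT": "Marsh", "SLOPE": 0, "NDVI": 0.30},
    {"UNSUPERVISED_RESULT": "Marsh", "SLOPE": 0, "NDVI": 0.70},
    {"UNSUPERVISED_RESULT": "Water", "SLOPE": 20, "NDVI": 0.10},
    {"UNSUPERVISED_RESULT": "Water", "SLOPE": 0, "NDVI": 0.05},
]
print(reclassify(pixels))  # ['Marsh', 'Forest', 'Forest', 'Water']
```

Keeping the trees as separate functions preserves the legibility argument made above: each can be read and edited on its own, at the cost of a second pass over the data.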
After the map was reclassified by the two
decision trees, the "Unvegetated", "Water",
"Marsh" and "Farmland" subclasses in the
reclassified map were aggregated back
into the "Non-forest" class. Isolated pixels
still remained within otherwise homogeneous
areas, so a sieve filter was applied
to generalize the map to the desired
minimum mapping unit of 0.25 ha. The
sieve filter merged polygons equal to or
smaller than 0.25 ha (4 pixels) with the
largest neighboring polygon.
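The aggregation step and the sieve rule described above (merge every polygon of at most 4 pixels into its largest neighboring polygon) can be sketched in plain Python. Assumptions not stated in the text: the map is a list of rows of class names, polygons are 4-connected regions of one class, and small polygons are merged one at a time, smallest first, with polygons recomputed after each merge. In practice a raster routine such as GDAL's `gdal_sieve.py` utility would normally be used instead.

```python
from collections import deque

NON_FOREST_SUBCLASSES = {"Unvegetated", "Water", "Marsh", "Farmland"}
NEIGHBORS_4 = ((-1, 0), (1, 0), (0, -1), (0, 1))


def aggregate(grid):
    """Collapse the non-forest subclasses back into 'Non-forest'."""
    return [["Non-forest" if c in NON_FOREST_SUBCLASSES else c for c in row] for row in grid]


def label_polygons(grid):
    """Label 4-connected regions of equal class; return (labels, sizes, classes)."""
    h, w = len(grid), len(grid[0])
    labels = [[-1] * w for _ in range(h)]
    sizes, classes = [], []
    for r in range(h):
        for c in range(w):
            if labels[r][c] != -1:
                continue
            pid = len(sizes)
            labels[r][c] = pid
            queue, size = deque([(r, c)]), 0
            while queue:  # breadth-first flood fill of one polygon
                y, x = queue.popleft()
                size += 1
                for dy, dx in NEIGHBORS_4:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w and labels[ny][nx] == -1
                            and grid[ny][nx] == grid[r][c]):
                        labels[ny][nx] = pid
                        queue.append((ny, nx))
            sizes.append(size)
            classes.append(grid[r][c])
    return labels, sizes, classes


def sieve(grid, max_pixels=4):
    """Merge polygons of <= max_pixels pixels into their largest neighboring polygon."""
    grid = [row[:] for row in grid]
    h, w = len(grid), len(grid[0])
    while True:
        labels, sizes, classes = label_polygons(grid)
        adjacent = [set() for _ in sizes]
        for y in range(h):
            for x in range(w):
                for ny, nx in ((y + 1, x), (y, x + 1)):
                    if ny < h and nx < w and labels[ny][nx] != labels[y][x]:
                        adjacent[labels[y][x]].add(labels[ny][nx])
                        adjacent[labels[ny][nx]].add(labels[y][x])
        small = [p for p, s in enumerate(sizes) if s <= max_pixels and adjacent[p]]
        if not small:
            return grid
        victim = min(small, key=lambda p: sizes[p])
        target = max(adjacent[victim], key=lambda p: sizes[p])
        for y in range(h):
            for x in range(w):
                if labels[y][x] == victim:
                    grid[y][x] = classes[target]


# Hypothetical 6 x 6 classified map.
classified = [["Forest"] * 6 for _ in range(6)]
classified[1][1] = "Water"              # isolated pixel, should be sieved out
for y in range(3, 6):
    for x in range(3, 6):
        classified[y][x] = "Farmland"   # 9-pixel patch, above the threshold

generalized = sieve(aggregate(classified))
for row in generalized:
    print(" ".join("F" if c == "Forest" else "N" for c in row))
```

Each merge removes at least one polygon, so the loop always terminates; recomputing polygons after every merge is slow but keeps "largest neighboring polygon" well defined.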
CHECKING ACCURACY AND SPATIAL PRECISION
Using an independent reference sample,
the filtered map was checked for accuracy
and spatial precision. For comparison, the
CLC database was also checked in the
same fashion. The independent sample,
covering 3,130 ha, was obtained by
photointerpretation of 10 randomly located
aerial stereo images acquired within 1 year
of the satellite image acquisition date. The total forest
edge in the independent reference sample area was
delineated for all forest patches exceeding 0.25 ha.
Delineation of the other 3 classes was done by automatic
segmentation of the radiometrically corrected Landsat TM
image, followed by identification of each segment on the
aerial stereo image. The segmentation, which refers to
automatic delineation of natural spatial units of the
landscape based on extracted edges (McCormick 1997),
was performed using the SILVICS software. Because of
landscape fragmentation, the segmentation sensitivity
was set so as to obtain the smallest possible
segments. The average segment area was thus
1.20 ha (19 pixels).
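The areas and pixel counts quoted in this section can be cross-checked with a little arithmetic. The text equates 0.25 ha with 4 pixels, which implies 625 m² per pixel, i.e. a 25 m × 25 m pixel. This is an inference: Landsat TM reflective bands have a nominal 30 m resolution, so the image was presumably resampled during geometric correction, which the text does not state.

```python
import math

M2_PER_HA = 10_000

# Inferred, not stated: the text equates 0.25 ha with 4 pixels.
PIXEL_AREA_M2 = 0.25 * M2_PER_HA / 4      # 625 m^2 per pixel
PIXEL_SIDE_M = math.sqrt(PIXEL_AREA_M2)   # 25 m pixel side


def pixels_to_ha(n_pixels):
    """Area in hectares covered by n_pixels pixels."""
    return n_pixels * PIXEL_AREA_M2 / M2_PER_HA


def ha_to_pixels(area_ha):
    """Number of pixels (possibly fractional) covering area_ha hectares."""
    return area_ha * M2_PER_HA / PIXEL_AREA_M2


print(PIXEL_SIDE_M)        # 25.0
print(pixels_to_ha(19))    # 1.1875, consistent with the quoted mean of 1.20 ha
print(ha_to_pixels(3130))  # 50080.0 pixels in the reference sample
```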
First, the percentage of each class within the area of the
reference aerial images was determined and compared to
the true value. Next, the thematic accuracy was estimated
by per-pixel cross-tabulation for all reference pixels. The
accuracy of the forest border delineation was checked by
computing the interquartile range epsilon band (IREB) value
(Dunn et al. 1990), defined as the distance on either side
of the true forest border that encompasses 50% of the
classified forest border. The precision of the
"Forest" class polygon delineation was checked by