computing the ratio between the true and classified forest
border length in the independent reference sample area.
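The border-length ratio can be illustrated with a minimal sketch. It is not the procedure used in the study. It assumes, purely for illustration, that both maps are rasters of class codes on the same grid, and that border length is the number of cell edges separating a forest cell from a non-forest cell multiplied by the cell size. The helper names `border_length` and `border_length_ratio`, the 3 x 3 grids and the 25 m cell size are all hypothetical.

```python
# Sketch (assumptions, not the study's procedure): both maps are rasters of class
# codes on the same grid; border length = number of cell edges separating a
# forest cell from a non-forest cell, times the cell size. Map edges are ignored.

def border_length(grid, forest_value, cell_size_m):
    """Forest/non-forest border length in metres (4-neighbour cell edges)."""
    rows, cols = len(grid), len(grid[0])
    edges = 0
    for r in range(rows):
        for c in range(cols):
            is_forest = grid[r][c] == forest_value
            # Check only the right and lower neighbours so each edge is counted once.
            if c + 1 < cols and (grid[r][c + 1] == forest_value) != is_forest:
                edges += 1
            if r + 1 < rows and (grid[r + 1][c] == forest_value) != is_forest:
                edges += 1
    return edges * cell_size_m


def border_length_ratio(classified, reference, forest_value="F"):
    """Ratio of classified to true forest border length (cell size cancels out)."""
    return border_length(classified, forest_value, 1.0) / border_length(
        reference, forest_value, 1.0
    )


# Hypothetical 3 x 3 example: "F" = forest, "N" = non-forest; 25 m cells assumed.
reference = ["FFN",
             "FNN",
             "FFN"]
classified = ["FNN",
              "FNN",
              "FNN"]
print(border_length(reference, "F", 25.0))         # → 125.0
print(border_length(classified, "F", 25.0))        # → 75.0
print(border_length_ratio(classified, reference))  # → 0.6
```

A ratio well below 1 means that the classified map represents only part of the true border, for example because a coarse minimum mapping unit smooths it.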
RESULTS AND DISCUSSION
For unsupervised classification, it was decided to limit the
number of clusters to 119, because that is where a break
occurs in the histogram of the initial clustered image.
Table 5 shows the two decision trees that were
successively applied to reclassify the result of the
unsupervised classification of the Landsat TM data.
Figure 2 shows the final forest border map, which was
obtained after the result of the second reclassification was
thematically aggregated back into the 4 main classes and
generalized with the sieve filter.
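The aggregation and sieve steps can be sketched as follows, under stated assumptions. The detailed class names in `AGGREGATE` are hypothetical. The threshold of 4 cells corresponds to the 0,25 ha minimum mapping unit only under an assumed 25 m cell size. Each undersized 4-connected patch is reassigned to the neighbouring class that shares the most cell edges with it. This is a simplified stand-in, not necessarily the filter used here: GDAL's sieve filter, for example, instead merges a small polygon into its largest neighbouring polygon.

```python
from collections import Counter, deque

# Hypothetical mapping from detailed (reclassified) classes to the 4 main classes.
AGGREGATE = {
    "broadleaf": "Forest",
    "conifer": "Forest",
    "shrub": "Shrub",
    "abandoned_pasture": "Abandoned pasture",
    "meadow": "Non-forest",
    "urban": "Non-forest",
}


def aggregate(grid, mapping):
    """Thematically aggregate a raster of detailed classes into main classes."""
    return [[mapping[v] for v in row] for row in grid]


def sieve(grid, min_cells):
    """Reassign 4-connected patches smaller than min_cells to the neighbouring
    class that shares the most cell edges with the patch. Patches and neighbours
    are taken from the input grid; results are written to a copy."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    seen = [[False] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0]:
                continue
            value = grid[r0][c0]
            seen[r0][c0] = True
            queue, patch, neighbours = deque([(r0, c0)]), [], Counter()
            while queue:  # breadth-first flood fill of one patch
                r, c = queue.popleft()
                patch.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        continue
                    if grid[nr][nc] != value:
                        neighbours[grid[nr][nc]] += 1  # one vote per shared edge
                    elif not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if len(patch) < min_cells and neighbours:
                new_value = neighbours.most_common(1)[0][0]
                for r, c in patch:
                    out[r][c] = new_value
    return out


# Assumed 25 m cells: a 0,25 ha minimum mapping unit is then 4 cells.
CELL_SIZE_M = 25.0
min_cells = round(0.25 * 10_000 / CELL_SIZE_M**2)

detailed = [
    ["conifer", "conifer", "conifer", "meadow"],
    ["conifer", "shrub", "broadleaf", "meadow"],
    ["broadleaf", "broadleaf", "broadleaf", "meadow"],
    ["meadow", "meadow", "meadow", "meadow"],
]
result = sieve(aggregate(detailed, AGGREGATE), min_cells)
for row in result:
    print(row)
```

In this toy grid the single "Shrub" cell is smaller than the 4-cell threshold and is absorbed into the surrounding "Forest" patch, while the larger patches are left unchanged.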
A comparison of the land cover structure within the area
of the reference aerial images (Table 3) shows the
classified map to be closer to the true values than the
CLC, especially for the "Forest" and "Shrub" classes. A
site-specific comparison confirms the improvement in
classification accuracy over the CLC database (Table 4).
The thematic accuracy for the 4 main classes is estimated
at 81,8% (Kappa 66,6%) for the rule-based classification
and 75,3% (Kappa 57,3%) for the CLC database, while at
the "Forest" / "Everything else" thematic level the
accuracy increases to 91,2% (Kappa 81,3%) and 87,2%
(Kappa 73,5%), respectively. The error matrices (Table 4)
show that accuracy is lowest for the "Shrub" and
"Abandoned pasture" classes. We attribute this to (1) their
transitional character, which makes them difficult to
capture in the decision trees, and (2) the lack of
relevant information in the GIS layers. The forest border
delineation in the classified map is slightly more accurate
than the one in the CLC: the accuracy of the forest border
delineation as estimated by the IREB value is ± 14 m for
the classification and ± 15 m for the CLC. The minimum
mapping unit is 0,25 ha for the classified map and 20 ha
for the CLC. The classified map is therefore spatially more
precise by definition. The improvement in precision is
confirmed by the ratio of the classified to the true forest
border length, which is 92,6% for the rule-based
classification and 33,4% for the CLC database.
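The accuracy figures above can be recomputed from the four-class error matrices in Table 4. The sketch assumes the standard definitions: overall accuracy is the diagonal sum divided by the total, and kappa is Cohen's kappa with chance agreement estimated from the row and column totals. With the Table 4 counts it gives 81,8% / 66,6% for the rule-based classification and 75,3% / 57,3% for the CLC database.

```python
# Four-class error matrices from Table 4: rows = reference data, columns = map,
# class order: Forest, Shrub, Abandoned pasture, Non-forest.
CLASSIFICATION = [
    [29072, 987, 536, 495],
    [986, 862, 303, 334],
    [707, 963, 1843, 1394],
    [691, 497, 1241, 9169],
]
CLC = [
    [26779, 2326, 504, 1481],
    [783, 1027, 318, 357],
    [530, 1625, 2110, 642],
    [775, 1002, 2048, 7773],
]


def overall_accuracy(m):
    """Share of samples on the diagonal (correctly classified)."""
    n = sum(map(sum, m))
    return sum(m[i][i] for i in range(len(m))) / n


def cohen_kappa(m):
    """Cohen's kappa: agreement corrected for chance agreement from the marginals."""
    n = sum(map(sum, m))
    p_o = overall_accuracy(m)
    row_totals = [sum(row) for row in m]
    col_totals = [sum(col) for col in zip(*m)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
    return (p_o - p_e) / (1 - p_e)


for name, m in (("Classification", CLASSIFICATION), ("CLC database", CLC)):
    print(f"{name}: OA {100 * overall_accuracy(m):.1f}%, kappa {100 * cohen_kappa(m):.1f}%")
```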
It may come as a surprise that the rule-based
classification performs so much better than the CLC,
given that the rules (decision trees) were learned using
the CLC as the target class. However, there is a logical
explanation for this phenomenon, which has also been
observed in other applications of machine learning and is
known as the "clean-up effect" (Michie and Camacho
1994). The generalization performed by the learning
process (which in our case also takes domain knowledge
into account) has an averaging effect that abstracts away
the individual errors made by the human interpreters,
yielding performance similar to that of trained humans
but more dependable.
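The clean-up effect can be illustrated with a toy example that uses hypothetical data, not the study's data. The true class depends only on a single feature value. 20% of the training labels are deliberately wrong. A simple majority-vote rule per feature value is fitted to the noisy labels; it stands in for a decision tree learner and is not the learner used in this work. Because the label errors are scattered rather than systematic, the learned rule outvotes them and agrees with the truth more often than the labels it was trained on.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: one feature value (0-9) per sample; the true class
# depends only on the feature.
samples = [i % 10 for i in range(500)]
truth = ["Forest" if f >= 5 else "Other" for f in samples]


def flip(label):
    return "Other" if label == "Forest" else "Forest"


# Noisy "interpreter" labels: every fifth block of 10 samples is mislabelled,
# i.e. 20% of the labels, spread evenly over all feature values.
noisy = [flip(t) if (i // 10) % 5 == 0 else t for i, t in enumerate(truth)]

# "Learn" a rule: the majority noisy label for each feature value.
votes = defaultdict(Counter)
for f, label in zip(samples, noisy):
    votes[f][label] += 1
rule = {f: c.most_common(1)[0][0] for f, c in votes.items()}
predicted = [rule[f] for f in samples]


def agreement(a, b):
    return sum(x == y for x, y in zip(a, b)) / len(a)


print(agreement(noisy, truth))      # accuracy of the training labels → 0.8
print(agreement(predicted, truth))  # accuracy of the learned rule → 1.0
```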
Compared to the photointerpretation work on the
Slovenian CLC database project (Kobler et al. 1998), 70%
fewer man-days were needed to complete the classification
of the study area. Proportionally less time should be
needed for larger areas.
Land cover class      True value   Classification   CLC database
Forest                62,1%        62,8%            57,6%
Shrub                 5,0%         6,6%             11,9%
Abandoned pasture     9,8%         7,8%             9,9%
Non-forest            23,2%        22,7%            20,5%
Table 3: Structure of the land cover within the area of the reference aerial images
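The percentages in Table 3 coincide with the row totals (reference data) and column totals (the two maps) of the four-class error matrices in Table 4, divided by the total of 50.080. The short check below assumes that correspondence and recomputes the shares from those totals.

```python
# Row totals (reference data) and column totals (the two maps) of the
# four-class error matrices in Table 4, class order:
# Forest, Shrub, Abandoned pasture, Non-forest.
N = 50080
TOTALS = {
    "True value": [31090, 2485, 4907, 11598],
    "Classification": [31456, 3309, 3923, 11392],
    "CLC database": [28867, 5980, 4980, 10253],
}

# Share of each class in the sample, in percent, rounded as in Table 3.
shares = {name: [round(100 * t / N, 1) for t in row] for name, row in TOTALS.items()}
for name, row in shares.items():
    print(name, row)
```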
Four main classes (rows: reference data; columns: mapped class)
Rule-based classification
Reference data       Forest    Shrub   Aband. pasture   Non-forest    TOTAL
Forest               29.072      987              536          495   31.090
Shrub                   986      862              303          334    2.485
Abandoned pasture       707      963            1.843        1.394    4.907
Non-forest              691      497            1.241        9.169   11.598
TOTAL                31.456    3.309            3.923       11.392   50.080
Overall accuracy: 81,8%; Kappa index of agreement: 66,6%
CLC database
Reference data       Forest    Shrub   Aband. pasture   Non-forest    TOTAL
Forest               26.779    2.326              504        1.481   31.090
Shrub                   783    1.027              318          357    2.485
Abandoned pasture       530    1.625            2.110          642    4.907
Non-forest              775    1.002            2.048        7.773   11.598
TOTAL                28.867    5.980            4.980       10.253   50.080
Overall accuracy: 75,3%; Kappa index of agreement: 57,3%
"Forest" / "Everything else" (rows: reference data; columns: mapped class)
Rule-based classification
Reference data       Forest   Everything else    TOTAL
Forest               29.072             2.018   31.090
Everything else       2.384            16.606   18.990
TOTAL                31.456            18.624   50.080
Overall accuracy: 91,2%; Kappa index of agreement: 81,3%
CLC database
Reference data       Forest   Everything else    TOTAL
Forest               26.779             4.311   31.090
Everything else       2.088            16.902   18.990
TOTAL                28.867            21.213   50.080
Overall accuracy: 87,2%; Kappa index of agreement: 73,5%
Table 4: Comparison of the rule-based classification vs. the CLC database: error matrices and thematic accuracy assessment
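The two-class matrices in Table 4 follow from the four-class ones by merging "Shrub", "Abandoned pasture" and "Non-forest" into "Everything else". The minimal sketch below does this and recomputes the overall accuracy as the diagonal sum over the total. The helper names `collapse` and `accuracy` are hypothetical.

```python
# Four-class error matrices from Table 4 (rows: reference data; columns: map),
# class order: Forest, Shrub, Abandoned pasture, Non-forest.
CLASSIFICATION = [
    [29072, 987, 536, 495],
    [986, 862, 303, 334],
    [707, 963, 1843, 1394],
    [691, 497, 1241, 9169],
]
CLC = [
    [26779, 2326, 504, 1481],
    [783, 1027, 318, 357],
    [530, 1625, 2110, 642],
    [775, 1002, 2048, 7773],
]


def collapse(m, keep=0):
    """Merge every class except index `keep` into one 'Everything else' class."""
    others = [i for i in range(len(m)) if i != keep]

    def block(rows, cols):
        return sum(m[r][c] for r in rows for c in cols)

    return [
        [block([keep], [keep]), block([keep], others)],
        [block(others, [keep]), block(others, others)],
    ]


def accuracy(m):
    """Overall accuracy: diagonal sum divided by the total."""
    return sum(m[i][i] for i in range(len(m))) / sum(map(sum, m))


for name, m in (("Classification", CLASSIFICATION), ("CLC database", CLC)):
    two = collapse(m)
    print(name, two, f"OA {100 * accuracy(two):.1f}%")
```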