"he SVM classifier
classification and
s, each marked as
ng builds a model
0 one class or the
erplane in feature
ning sample from
ong to the class of
e most classes are
ire space mapping
pped into another
ansformed feature
Both training and
utation of inner
and x; are feature
e space and ®(x;)
inner products can
ich means that the
explicitly applied
' Gaussian Kernel
of SVM has been
lata to avoid over-
prresponds to the
to be outliers.
e two classes, and
blem. A common
ersus-one-strategy
is of classes C are
combinations are
vith the most wins
eds to learn the
ot only the classes
ribe a typical site,
e training is done
labels. The image
ie training objects
e is normalised so
eature vectors are
he one-versus-one
cess is a labeled
or each pixel.
| results to GIS-
l, the result for all
1 GIS-object. This
2004). Pixels that
dered as correct,
considered to be
ssment of the GIS
in relation to all
y
2
(4)
‘,, the GIS object
q@
*shold t, depends
ear in a cropland
less equally in the
xtured regions, a
f an error region
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B4, 2012
XXII ISPRS Congress, 25 August — 01 September 2012, Melbourne, Australia
is necessary. A compact error is an area of connected pixels
belonging to the same class (which differs from the class label
in the GIS data set) with a width larger than a width threshold
and an area larger than an area threshold. The width of an assumed
compact error is determined by applying a morphological filter
(erosion) repeatedly and counting the steps until the assumed
compact error disappears. A GIS object with a compact error is
labelled as rejected/incorrect and has to be reviewed by a human
operator.
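The erosion-based width test described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the 3x3 square structuring element, and the choice to express the width threshold as a number of erosion steps are all assumptions, and the input is assumed to be a boolean mask of one connected candidate region.

```python
import numpy as np

def erode(mask):
    """One binary erosion step with a 3x3 square structuring element (assumed)."""
    padded = np.pad(mask, 1, constant_values=False)
    out = np.ones_like(mask, dtype=bool)
    h, w = mask.shape
    # A pixel survives only if all nine pixels in its 3x3 window are set.
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out &= padded[dy:dy + h, dx:dx + w]
    return out

def erosion_steps(mask):
    """Count the erosion steps needed until the region disappears."""
    steps = 0
    current = mask.astype(bool)
    while current.any():
        current = erode(current)
        steps += 1
    return steps

def is_compact_error(mask, min_steps, min_area):
    """Hypothetical decision rule: both thresholds must be exceeded."""
    return erosion_steps(mask) > min_steps and int(mask.sum()) > min_area
```

With this structuring element, a 5 x 5 pixel block survives two erosion steps and vanishes on the third, whereas a line one pixel wide vanishes after a single step regardless of its length, which is why elongated misclassifications are not flagged as compact errors.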
4. EVALUATION
4.1 Data
To evaluate our approach we have used the European CORINE
Land Cover GIS database (CLC) and three multi-temporal
images taken in one year covering a test site of 329 km² in
Halberstadt, Germany. For the evaluation, a reference dataset
was available. The reference dataset was produced using visual
interpretation of the images.
4.1.1 Image Data: Images are available from two different
sensors, namely RapidEye and DMC (Disaster Monitoring
Constellation, operated by DMC International Imaging
(DMCii)). The images were acquired within a four-month period
during the summer months. The RapidEye image was acquired
on August 20, 2009 and has a resolution of 5 m. The five bands
of this sensor are blue, green, red, red edge and near infrared. In
addition, two DMC images are used, acquired on April 24, 2009
and on August 24, 2009. The DMC sensor has a resolution of
32 m and captures three bands (green, red and near infrared).
The total number of bands is therefore 11 (5 + 2 × 3). For
textural information, the resolution was subsampled by a factor
of two to cover relevant information. Hence, the resulting
dimension of the feature vector for one pixel position is 440. The
neighbourhood size is 11 pixels for all scenes to cover a relevant
area. All images are orthorectified.
Before all images are processed in one workflow, the 32 m DMC
images are clipped to the same extent and resampled to the same
resolution as the RapidEye image. For the resampling we use
nearest neighbour interpolation, because it leaves the radiometric
information unaltered (Albertz, 2001).
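The resampling step can be illustrated with a short sketch. It assumes a single band held as a 2-D NumPy array and square pixels; the function name and the pixel-centre convention are assumptions, not details taken from the paper. Because every output pixel copies exactly one input pixel, no new radiometric values are created.

```python
import numpy as np

def resample_nearest(image, src_res, dst_res):
    """Resample a 2-D band from src_res to dst_res (metres per pixel)
    with nearest neighbour interpolation."""
    rows, cols = image.shape
    out_rows = int(round(rows * src_res / dst_res))
    out_cols = int(round(cols * src_res / dst_res))
    # Centre of each output pixel, expressed in input pixel coordinates.
    r = ((np.arange(out_rows) + 0.5) * dst_res / src_res).astype(int)
    c = ((np.arange(out_cols) + 0.5) * dst_res / src_res).astype(int)
    r = np.clip(r, 0, rows - 1)
    c = np.clip(c, 0, cols - 1)
    # Every output value is copied from exactly one input pixel.
    return image[np.ix_(r, c)]
```

For the case described here, a call such as `resample_nearest(dmc_band, 32.0, 5.0)` would bring a DMC band to the RapidEye grid spacing, where `dmc_band` is a placeholder name for an already clipped band.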
4.1.2 GIS database: The European CLC data set is managed
and coordinated by the European Environment Agency (EEA,
2011), assisted by the European Topic Center for Land Use and
Spatial Information (ETC-LUSI). In Germany the UBA
(Umweltbundesamt — Federal Environmental Agency) is the
national reference center. It acts as the contact point for the
EEA and is responsible for the management and coordination of
CLC. The data model was defined to be compliant with a scale
of 1:100,000; the minimum mapping unit is 25 ha for new
polygons and 5 ha for changes of existing polygons. The CLC
data set has been produced with respect to reference years 1990,
2000 and 2006 using mainly images of Landsat, SPOT and IRS
satellites. Even though the minimum mapping unit is 25 ha,
GIS-objects with an area smaller than 25 ha appear in the data
set of our test site. GIS-objects smaller than 1 ha were not
processed with our approach, because a reliable classification of
small GIS-objects using DMC images with a resolution of 32 m
is not possible.
The main land cover class in our test site is cropland. The 3072
cropland and grassland GIS-objects cover 425 km² in total: 1316
cropland GIS-objects covering 367 km² with an average size of
27.9 ha, and 1756 grassland GIS-objects covering 58 km² with
an average size of 3.3 ha.
4.2 Evaluation assessment
Confusion matrices are a common tool for quality assessment.
For the verification a special confusion matrix is used which
compares the verification result (accepted/rejected GIS-objects)
with a reference (correct/false GIS-objects). Such a confusion
matrix is visualised in Figure 1.
                          System
Reference    Accepted                 Rejected
Correct      True Positive (TP)       False Negative (FN)
False        False Positive (FP)      True Negative (TN)
             (undetected errors)      (detected errors)
Figure 1: Confusion matrix of diagnostics.
Based on this confusion matrix, measures for the evaluation can
be derived, e.g. the thematic accuracy. The goal is to increase
the thematic accuracy. The thematic accuracy before the
verification process is TA a priori with
TA a priori = (TP + FN)/(TP + FN + FP + TN) × 100% (4)
The aim is to achieve a thematic accuracy after the verification
process TA a posteriori with
TA a posteriori = TA a priori + TN/(TP + FN + FP + TN) × 100% (5)
where TA a posteriori has to be at least 95%. At the same
time the human operator should save time compared to a
completely manual quality assessment of the GIS data set. A
measure which represents this goal is the time efficiency with
time efficiency = (TP + FP)/(TP + FN + FP + TN) × 100% (6)
which is equal to the percentage of GIS-objects which do not
have to be reviewed by a human operator. The time efficiency
should be at least 50%. The defined requirements are based
on experiences gained from the practical application of
quality assessment of GIS data sets (BKG, 2009).
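As a worked illustration of equations (4) to (6), the following sketch computes the three measures from the four cells of the confusion matrix. The function name is an assumption, and the counts in the usage note are invented purely for illustration; they are not results from this study.

```python
def verification_measures(tp, fn, fp, tn):
    """Return (TA a priori, TA a posteriori, time efficiency) in percent."""
    total = tp + fn + fp + tn
    # Eq. (4): share of GIS-objects that are correct before verification.
    ta_a_priori = 100.0 * (tp + fn) / total
    # Eq. (5): detected errors (TN) are corrected by the operator.
    ta_a_posteriori = ta_a_priori + 100.0 * tn / total
    # Eq. (6): share of accepted GIS-objects, which need no manual review.
    time_efficiency = 100.0 * (tp + fp) / total
    return ta_a_priori, ta_a_posteriori, time_efficiency
```

For hypothetical counts TP = 60, FN = 30, FP = 2 and TN = 8, this gives a TA a priori of 90%, a TA a posteriori of 98% and a time efficiency of 62%, so both requirements stated above (at least 95% and at least 50%) would be met.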
4.3 Parameter settings
Only a small number of parameters have to be set to run our
approach. Most of them can be trained automatically, others are
defined by the characteristics of the GIS used, and only a few of
the parameters have to be set to empirical values.
The fact that the goal of our approach is the verification of a
GIS data set influences the strategy of the classification process.
The parameters of our method have to be optimised in order to
achieve a good verification, but not necessarily a good
classification result. For instance, a classification error which
leads to an undetected error remaining in the GIS data set is
penalised more heavily than a classification error which leads
"only" to a false negative.
There are no parameters to be set for the calculation of the
spectral features. Parameters for the feature extraction of the
textural features are the distance Δ and the direction α for the
determination of the GLCM (Haralick et al., 1973). The
parameters were set to the standard values Δ = 1 and α = 0°,
45°, 90°, 135°. With fixed parameters for Δ and α, the
textural features are only representative for these chosen
parameters. By using four different directions for α, the textural
features are rotation invariant. Therefore, the dependency on
the parameter α could be eliminated as far as possible. In
contrast, the dependency on the parameter Δ could not be
resolved, so patterns which are not in the range of Δ are not
taken into account.
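The role of Δ and α can be made concrete with a small sketch of a grey-level co-occurrence matrix for Δ = 1 and the four directions. This is an illustration under stated assumptions rather than the feature extraction used in the paper: the names `OFFSETS`, `glcm` and `mean_contrast` are hypothetical, the image is assumed to be quantised to integer grey levels in the range 0 to levels - 1, the matrix is made symmetric and normalised, and contrast is used only as one example of a Haralick feature.

```python
import numpy as np

# Pixel offsets (row, column) for distance 1 in the four directions (degrees).
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def glcm(image, offset, levels):
    """Symmetric, normalised co-occurrence matrix for one offset."""
    dr, dc = offset
    rows, cols = image.shape
    # Restrict to pixels whose neighbour at the given offset lies inside the image.
    r0, r1 = max(0, -dr), min(rows, rows - dr)
    c0, c1 = max(0, -dc), min(cols, cols - dc)
    ref = image[r0:r1, c0:c1]
    nbr = image[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
    counts = np.zeros((levels, levels), dtype=float)
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1.0)
    counts += counts.T  # count each pair in both orders
    return counts / counts.sum()

def contrast(p):
    """Haralick contrast: sum over (i - j)^2 * p(i, j)."""
    i, j = np.indices(p.shape)
    return float(((i - j) ** 2 * p).sum())

def mean_contrast(image, levels):
    """Average the contrast over the four directions."""
    return float(np.mean([contrast(glcm(image, off, levels))
                          for off in OFFSETS.values()]))
```

Averaging a feature over the four offsets is one common way to reduce the dependency on direction; the distance remains fixed at one pixel, so structures at larger spacings do not contribute, which matches the limitation noted above.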