The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Vol. XXXVII. Part B4. Beijing 2008
$$SI_i = \frac{P_i}{4\sqrt{A_i}}$$

where $SI_i$ is the shape index, $P_i$ the perimeter, and $A_i$ the area of image object $i$.
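For illustration, the shape index can be computed directly from an object's perimeter and area (a minimal sketch; the numeric values are hypothetical, not from the study's data):

```python
import math

def shape_index(perimeter: float, area: float) -> float:
    """Shape index SI = P / (4 * sqrt(A)); equals 1 for a square
    and grows as the object boundary becomes more irregular."""
    return perimeter / (4.0 * math.sqrt(area))

# A 10 x 10 square object: P = 40, A = 100 -> SI = 1.0
si = shape_index(40.0, 100.0)
```

A compact square thus yields the minimum value of 1, while elongated or ragged objects score higher.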
The number of features used in this study is summarized in
Table 1:
| Features | Number of features |
|---|---|
| Mean value | 3 |
| Mean difference to scene | 3 |
| Mean difference to neighbour | 3 |
| Standard deviation | 3 |
| Area | 1 |
| Shape index | 1 |
Table 1. Features and the number of features used in one-class
SVMs
The values of the features vary significantly: for example, object area ranges from 1 to 10650, while the shape index ranges from 1 to 8.73. Therefore, before applying support vector machines to classify the objects, we first normalized the features with a min-max method, so that the final values of each feature range from 0 to 1.
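The min-max normalization step can be sketched as follows (a NumPy sketch; the sample feature values are illustrative, not the study's data):

```python
import numpy as np

def min_max_normalize(features: np.ndarray) -> np.ndarray:
    """Rescale each feature column to the [0, 1] range,
    as done before training the one-class SVMs."""
    col_min = features.min(axis=0)
    col_max = features.max(axis=0)
    return (features - col_min) / (col_max - col_min)

# Hypothetical objects: columns are (area, shape index)
X = np.array([[1.0, 1.0],
              [10650.0, 8.73],
              [500.0, 3.2]])
X_norm = min_max_normalize(X)
```

After this rescaling, features with very different ranges (such as area and shape index) contribute on a comparable scale to the SVM.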
2.2.3 Feature selection: Feature selection aims to select a subset of the features that performs well in the classification algorithm. It reduces the dimensionality of the feature space while maintaining high classification accuracy and avoiding over-fitting. In this study, we used the brute-force search algorithm (also referred to as exhaustive search), which evaluates all possible combinations of features and finds the optimal subset for the classifier.
Assuming the number of features is $n$, the number of possible feature combinations is:

$$C_n^1 + C_n^2 + C_n^3 + \cdots + C_n^n = 2^n - 1$$
Since there are 14 features in this study, there are 16383 possible combinations. This computational cost is acceptable on a current PC (the computer used in this study has a 2.4 GHz Intel CPU and takes approximately 2.6 hours to search all possible combinations and run the one-class SVMs). Although the brute-force algorithm is very time-consuming, it solves the optimization problem directly and always finds the solution if one exists. However, it should be noted that as the number of features increases, the brute-force method can become prohibitively time-consuming, since its cost grows exponentially with the number of candidate features. In such cases, other feature selection methods (e.g. the stepwise method) may be more appropriate. Kappa values and cross validation are used to evaluate model performance with different feature combinations.
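The exhaustive search over feature subsets can be sketched as below (`evaluate` is a hypothetical stand-in for training a one-class SVM on a subset and returning its cross-validated Kappa; the toy scorer at the end is purely illustrative):

```python
from itertools import combinations

def brute_force_search(feature_names, evaluate):
    """Evaluate every non-empty feature subset (2^n - 1 candidates)
    and return the best subset and its score."""
    best_subset, best_score = None, float("-inf")
    for k in range(1, len(feature_names) + 1):
        for subset in combinations(feature_names, k):
            score = evaluate(subset)
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score

# Toy scorer: rewards subsets containing "area", penalizes size
names = ["mean_r", "mean_g", "mean_b", "area"]
best, score = brute_force_search(
    names, lambda s: ("area" in s) - 0.01 * len(s))
```

With 14 features this loop visits 16383 subsets, which matches the count reported above.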
2.2.4 Accuracy assessment: Ground truth data were acquired by manual photo interpretation. In total, 100 house objects and 20 non-house objects were selected. For the one-class support vector machines, we used only the house data for training, and both house and non-house data to evaluate the classification results. We used the Kappa value to measure classification accuracy; Kappa takes into account the agreement that can occur by chance (expected agreement). In general, Kappa values of 0.75 and higher are considered good classification results (Eric et al., 2004). A five-fold cross-validation method is used to evaluate the model performance based on the training data. The cross validation is implemented as follows (Guo et al., 2007):
(1) Select the 100 houses as set $H$ and the 20 non-houses as set $H_n$ as the training set for cross validation.
(2) Randomly split $H$ into five subsets, each containing 20 houses. Each subset of $H$, combined with $H_n$, in turn forms the test set $T_e$; the other four subsets of $H$ form the training set $T_r$.
(3) The set $T_r$ is used as training samples; the set $T_e$ is used as testing samples to evaluate the model performance of the one-class SVMs.
(4) The Kappa value is calculated for each iteration, and the average Kappa value over the five runs is reported.
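Steps (1)-(4) can be sketched as follows (an illustration, not the authors' code; `train_and_predict` is a hypothetical stand-in for training the one-class SVM on $T_r$ and labelling $T_e$, and the 1-D demo data are fabricated):

```python
import numpy as np

def kappa(y_true, y_pred):
    """Cohen's Kappa for a binary house / non-house labelling."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    observed = np.mean(y_true == y_pred)
    expected = sum(
        np.mean(y_true == c) * np.mean(y_pred == c)
        for c in np.unique(np.concatenate([y_true, y_pred])))
    return (observed - expected) / (1.0 - expected)

def five_fold_kappa(houses, non_houses, train_and_predict, seed=0):
    """Split the 100 houses into five subsets of 20; each subset
    plus the non-houses forms T_e, the remaining four form T_r.
    `train_and_predict(T_r, T_e)` returns labels for T_e
    (1 = house, 0 = non-house)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(houses))
    kappas = []
    for fold in np.array_split(order, 5):
        test_idx = set(fold.tolist())
        T_r = houses[[i for i in order if i not in test_idx]]
        T_e = np.vstack([houses[fold], non_houses])
        y_true = np.concatenate(
            [np.ones(len(fold)), np.zeros(len(non_houses))])
        kappas.append(kappa(y_true, train_and_predict(T_r, T_e)))
    return float(np.mean(kappas))

# Demo: "houses" near 0, "non-houses" at 10, separated by a
# trivial threshold standing in for the trained one-class SVM
houses = np.linspace(0.0, 1.0, 100).reshape(100, 1)
non_houses = np.full((20, 1), 10.0)
mean_kappa = five_fold_kappa(
    houses, non_houses,
    lambda T_r, T_e: (T_e[:, 0] < 5.0).astype(float))
```

A perfectly separable demo like this yields an average Kappa of 1; real imagery will of course score lower.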
2.2.5 Implementing one-class SVMs for object-based classification
Assume we have $l$ training points $x_i$ ($i = 1, 2, \ldots, l$). We want to find a hypersphere as small as possible that contains the training points in the multidimensional feature space, while allowing a small portion of outliers through slack variables $\xi_i$:

$$\min_{R, c, \xi} \; R^2 + \frac{1}{\nu l}\sum_{i=1}^{l}\xi_i \qquad (1)$$

subject to:

$$(x_i - c)^T (x_i - c) \le R^2 + \xi_i, \quad \xi_i \ge 0 \quad \text{for all } i \in [l] \qquad (2)$$
where $c$ and $R$ are the center and radius of the sphere, $T$ denotes the transpose, and $\nu \in (0, 1]$ controls the trade-off between the volume of the sphere and the number of training points rejected. When $\nu$ is large, the volume of the sphere is small and more training points are rejected; when $\nu$ is small, more training points are contained within the sphere. $\nu$ can be roughly interpreted as the fraction of outliers in the training dataset (Schölkopf et al., 2001). This optimization problem can be solved via the Lagrangian:
$$L(R, \xi, c, \alpha, \beta) = R^2 + \frac{1}{\nu l}\sum_{i=1}^{l}\xi_i - \sum_{i=1}^{l}\alpha_i \left\{ R^2 + \xi_i - (x_i \cdot x_i - 2\, c \cdot x_i + c \cdot c) \right\} - \sum_{i=1}^{l}\beta_i \xi_i \qquad (3)$$

where $\alpha_i \ge 0$ and $\beta_i \ge 0$. Setting the partial derivatives of $L$ with respect to $R$, $\xi_i$, and $c$ equal to zero, we get:

$$\sum_{i=1}^{l}\alpha_i = 1 \qquad (4)$$

$$0 \le \alpha_i \le \frac{1}{\nu l} \qquad (5)$$

$$c = \sum_{i=1}^{l}\alpha_i x_i \qquad (6)$$
Substituting equations (4)-(6) into equation (3), we obtain the dual problem:

$$\min_{\alpha} \; \sum_{i,j} \alpha_i \alpha_j (x_i \cdot x_j) - \sum_{i} \alpha_i (x_i \cdot x_i) \qquad (7)$$

subject to: $0 \le \alpha_i \le \dfrac{1}{\nu l}$, $\;\sum_{i=1}^{l} \alpha_i = 1$
To determine whether or not a test point $x$ is within the sphere, we can calculate the distance between the test point and the center $c$; the point is accepted when:

$$(x \cdot x) - 2\sum_{i}\alpha_i (x \cdot x_i) + \sum_{i,j}\alpha_i \alpha_j (x_i \cdot x_j) \le R^2 \qquad (8)$$
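The left-hand side of equation (8) can be evaluated numerically from inner products alone; the sketch below does so for hypothetical training points with uniform weights (so the center is their mean):

```python
import numpy as np

def squared_distance_to_center(x, X_sv, alpha):
    """Left-hand side of equation (8): squared distance from test
    point x to the center c = sum_i alpha_i x_i, written purely
    through inner products so a kernel can later replace them."""
    k_xx = x @ x               # (x . x)
    k_xs = X_sv @ x            # inner products (x . x_i)
    K = X_sv @ X_sv.T          # Gram matrix (x_i . x_j)
    return k_xx - 2.0 * alpha @ k_xs + alpha @ K @ alpha

# Hypothetical 2-D training points; uniform alpha satisfies (4)
X_sv = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
alpha = np.full(4, 0.25)
d2_center = squared_distance_to_center(np.array([1.0, 1.0]), X_sv, alpha)
```

Here the center is (1, 1), so a test point at (1, 1) gives a squared distance of 0 and would always fall inside the sphere.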
So far, we have assumed that the data are spherically distributed, which is often not the case in reality. To make the method more flexible and to capture non-linearity such as multi-modal distributions, a kernel function $K(x_i, x_j)$ can be introduced. We can express the inner products in equation (8) as the kernel function: