ISPRS Commission III, Vol.34, Part 3A „Photogrammetric Computer Vision“, Graz, 2002
Figure 2. (a) Building image patch and (b) corresponding
wavelet decomposition to level 4 (not over-sampled)
3. SUPPORT VECTOR MACHINES
The Support Vector Machine (SVM) is based on the principles
of structural risk minimization (Vapnik, 1995). It has the
attractive property that it minimizes a bound on the
generalisation error and is therefore not subject to problems of
local minima that may occur with other classifiers such as
multilayer perceptrons (MLP).
Another property of the SVM is that its decision surface
depends only on the inner product of the feature vectors. As a
result, the inner product can be replaced by any symmetric
positive-definite kernel (Cristianini & Shawe-Taylor, 2000).
The use of a kernel function means that the mapping of the data
into a higher dimensional feature space does not need to be
determined as part of the solution, enabling the use of high
dimensional space without addressing the mathematical
complexity of such spaces. SVMs have been used successfully
in applications for face detection (Osuna et al., 1997), character
recognition (Schölkopf, 1997; Boser et al., 1992) and
pedestrian detection (Papageorgiou et al., 1998).
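The kernel substitution described above can be illustrated with a small numerical check. The sketch below is not taken from the paper: it assumes a degree-2 homogeneous polynomial kernel on two-dimensional vectors, and the names `poly2_kernel` and `phi` are hypothetical. It shows that evaluating the kernel in the input space gives the same value as an inner product in an explicitly mapped three-dimensional feature space, so the mapping itself never needs to be computed.

```python
import math


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, z):
    """Degree-2 homogeneous polynomial kernel: k(x, z) = (x . z)^2."""
    return dot(x, z) ** 2


def phi(x):
    """Explicit feature map for poly2_kernel on 2-D inputs (illustrative only)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


x = (1.0, 2.0)
z = (3.0, -1.0)

k_implicit = poly2_kernel(x, z)      # computed in the 2-D input space
k_explicit = dot(phi(x), phi(z))     # computed in the 3-D feature space

print("kernel value:", k_implicit)
print("explicit feature-space inner product:", k_explicit)
```

For higher polynomial degrees or larger inputs the explicit feature space grows quickly, while the cost of the kernel evaluation stays that of a single inner product in the input space.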
4. TEST DATA
A set of classification test data, in the form of square image
patches, was extracted from three colour digitized aerial
photographs. The photographs were originally acquired for a
project over the city of Ballarat in Victoria. The photographs
were recorded at a scale of 1:4000 and had been scanned on a
photogrammetric scanner at a resolution of 15 microns. Each
image patch was 256 by 256 pixels and contained either a single
building or a non-building area of the image. Although some
care was taken to centre the building within the image patch, the
exact location of the building in the image patch varied. The
orientation of the building within the image patches also varied.
This led to a broader representation of the building class than if
the buildings were carefully aligned in each image patch but
created a more difficult classification problem.
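The ground footprint implied by these figures can be worked out directly. The following is a rough sketch only: it assumes a nominal photo scale over flat terrain and ignores tilt and relief displacement, and the variable names are illustrative rather than taken from the original processing.

```python
# Values stated in the text.
SCALE_DENOMINATOR = 4000      # photo scale 1:4000
SCAN_PIXEL_M = 15e-6          # 15 micron scanning resolution, in metres
PATCH_PIXELS = 256            # patch side length in pixels

# Nominal ground size of one scanned pixel (assumes flat terrain, vertical photo).
ground_pixel_m = SCAN_PIXEL_M * SCALE_DENOMINATOR

# Nominal ground size of one side of a 256 x 256 image patch.
patch_side_m = ground_pixel_m * PATCH_PIXELS

print(f"ground pixel size: {ground_pixel_m * 100:.1f} cm")
print(f"patch side length: {patch_side_m:.2f} m")
```

Under these assumptions each pixel covers about 6 cm on the ground and each patch spans roughly 15 m, which is on the order of a single residential building.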
The classification test was based on a balanced test set of 100
building images and 100 non-building images. Image
coefficients were extracted using the wavelet process described
in Section 2.1 above. A public domain Support Vector Machine
(Joachims, 1998) was used to classify the image patches into
building or non-building categories.
5. RESULTS
The image patches were used to train the SVM classifier using
several different kernels including polynomial and sigmoidal
kernels. However, the best results were obtained with a simple
linear kernel with no bias. Of the two hundred image patches,
only one patch was classified incorrectly. This was an image of
a large swimming pool that was classified as a building.
Although this result appeared to be very good, the confidence
measures produced by the SVM training suggested a reliability
of only 55%. This could be due to overfitting of the decision
surface to the data. However, the reliability measures produced
by the SVM are also known to be pessimistic (Joachims, 1998),
due to the unbounded nature of the problem. To see if that was
the case here, an extensive leave-one-out test was undertaken.
This produced a revised reliability measure of 73%. Although
the reliability estimate improved, this indicates there may still
be some overfitting of the data. To test this further, 20 building
image patches were withheld from the training data and the
SVM was re-trained. The withheld patches were then classified
by the new SVM. Of the 20 building patches, only 8 (40%) were
classified as buildings. This result is similar to the original
reliability estimate. In this case, the decision surface of the
SVM is unlikely to generalize well to a broader set of data. This
could be due to the small size of the training set. Further work is
required to expand the size and scope of the training set to
determine if a more generalized decision surface can be
established.
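The leave-one-out procedure mentioned above can be sketched generically as follows. This is a minimal illustration, not the authors' implementation: a nearest-mean classifier is used as a stand-in for the SVM so that the example needs no external library, the toy feature vectors are invented, and all function names are hypothetical.

```python
def leave_one_out_accuracy(samples, labels, train, predict):
    """Hold out each sample in turn, train on the rest, score the held-out one."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = train(train_x, train_y)
        if predict(model, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


def train_nearest_mean(xs, ys):
    """Stand-in classifier (NOT an SVM): store the mean vector of each class."""
    means = {}
    for label in set(ys):
        members = [x for x, y in zip(xs, ys) if y == label]
        means[label] = [sum(col) / len(members) for col in zip(*members)]
    return means


def predict_nearest_mean(means, x):
    """Assign x to the class whose mean vector is closest (squared distance)."""
    def sq_dist(m):
        return sum((a - b) ** 2 for a, b in zip(x, m))
    return min(means, key=lambda label: sq_dist(means[label]))


# Invented toy data: -1 = non-building, +1 = building.
samples = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
           [1.0, 0.9], [0.8, 1.1], [1.2, 1.0]]
labels = [-1, -1, -1, +1, +1, +1]

loo = leave_one_out_accuracy(samples, labels,
                             train_nearest_mean, predict_nearest_mean)
print("leave-one-out accuracy:", loo)
```

Because every sample is classified by a model that never saw it, the leave-one-out score is a less optimistic estimate than the training-set accuracy, at the cost of retraining once per sample.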
5.1 Other Considerations
In applying multi-resolution analysis, an appropriate set of
resolutions must be chosen for the task at hand. In this case, a
choice must be made between minimizing the amount of data
that is fed to the classifier and retaining enough information
about the original image that a sensible classification is
possible. In this example, wavelets with supports of 16 and 32
pixels were used, resulting in image coefficients of 16 x 16 and
8 x 8 for each of the horizontal, vertical and diagonal wavelet
functions.
Other resolutions were tried, but at higher resolutions the
number of coefficients expanded rapidly with no
significant gain in the accuracy of the classification. At lower
resolutions, too much information about the image was lost to
enable a definite classification.
One limitation of this implementation is the size of the
coefficient data set produced from the wavelet transform. As the
wavelet transform is over-sampled, each image patch generates
960 coefficients. The current algorithm makes no attempt to
optimize the storage of these coefficients.
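The figure of 960 coefficients follows from the grid sizes given above. The short check below simply tallies them; the grid sizes and the three orientations are taken from the text, while the variable names are illustrative.

```python
# Coefficient grid side length for each wavelet support, as stated in the text:
# a 16-pixel support gives a 16 x 16 grid, a 32-pixel support gives an 8 x 8 grid.
GRID_SIDE_BY_SUPPORT = {16: 16, 32: 8}
ORIENTATIONS = ("horizontal", "vertical", "diagonal")

per_support = {}
for support, side in GRID_SIDE_BY_SUPPORT.items():
    # One side x side grid per wavelet orientation.
    per_support[support] = len(ORIENTATIONS) * side * side

total_coefficients = sum(per_support.values())

for support, count in per_support.items():
    print(f"support {support:2d} px: {count} coefficients")
print("total per image patch:", total_coefficients)
```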