International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol XXXV, Part B3. Istanbul 2004
SVMs work by finding a separating hyperplane between two
classes. In a binary classification problem, there could be many
hyperplanes that separate the data. As shown in figure 3, the
optimal hyperplane occurs when the margin between the
classes is maximised. In addition, only a subset of the data
points will be critical in defining the hyperplane. These points
are the support vectors.
Figure 3. (a) Possible separating hyperplanes; (b)
Optimal separating hyperplane, with the support vectors and
margin labelled
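The margin idea can be illustrated with a minimal sketch. The toy points, the two candidate hyperplanes and the helper names below are all made up for illustration and are not taken from the paper. The sketch does not train an SVM; it only measures the margin of a given hyperplane and lists the points closest to it, which play the role of the support vectors.

```python
import math

def signed_distance(w, b, x, y):
    """Distance of point x (label y in {-1, +1}) from the hyperplane
    w.x + b = 0, positive when x lies on the side matching its label."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def margin_and_support(w, b, points, labels, tol=1e-9):
    """Return (margin, closest points) for a hyperplane that separates
    the data. The margin is taken as twice the distance to the nearest
    point; the nearest points are the ones that constrain the plane."""
    dists = [signed_distance(w, b, x, y) for x, y in zip(points, labels)]
    d_min = min(dists)
    support = [x for x, d in zip(points, dists) if abs(d - d_min) < tol]
    return 2.0 * d_min, support

# Hypothetical toy data: two classes either side of the vertical axis.
points = [(-3.0, 1.0), (-1.0, 0.0), (1.0, 0.0), (2.0, 2.0)]
labels = [-1, -1, 1, 1]

# Two hyperplanes that both separate the data, with different margins.
wide_margin, wide_support = margin_and_support((1.0, 0.0), 0.0, points, labels)
narrow_margin, narrow_support = margin_and_support((1.0, 0.0), -0.5, points, labels)
```

Both candidates classify every toy point correctly, but the first leaves twice as much room between the classes and is held in place by one point from each class. An SVM solver searches for the hyperplane with the largest such margin rather than testing candidates by hand.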
Another attractive property of the SVM is that its decision
surface depends only on the inner product of the feature
vectors. As a result, the inner product can be replaced by any
symmetric positive-definite kernel (Cristianini & Shawe-
Taylor, 2000). The use of a kernel function means that the
mapping of the data into a higher dimensional feature space
does not need to be determined as part of the solution, enabling
the use of high dimensional space for the learning task without
needing to address the mathematical complexity of such spaces.
This offers the prospect of being able to separate data in high
dimensional feature space and find classifications that were not
possible in simple, lower dimensional spaces (Figure 4).
Figure 4. Mapping data into a higher dimensional feature space
can make the data easier to separate
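The replacement of the inner product by a kernel can be checked numerically on a small case. The sketch below uses a homogeneous degree-2 polynomial kernel on 2-D inputs, chosen only because its explicit feature map is short enough to write out; the paper does not state which kernel was used, and the function names are illustrative.

```python
import math

def dot(u, v):
    """Ordinary inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def poly2_kernel(x, z):
    """Degree-2 polynomial kernel k(x, z) = (x . z)^2, evaluated
    directly in the original input space."""
    return dot(x, z) ** 2

def poly2_feature_map(x):
    """Explicit map of a 2-D input into the 3-D feature space whose
    inner product equals poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

# Arbitrary example vectors (not data from the paper).
x, z = (1.0, 2.0), (3.0, -1.0)

k_implicit = poly2_kernel(x, z)                                # no mapping computed
k_explicit = dot(poly2_feature_map(x), poly2_feature_map(z))   # mapping computed
```

The two values agree to rounding error, so a learner that only ever needs inner products can work in the higher dimensional space while evaluating nothing more than the kernel on the original vectors.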
3. TEST DATA
Several datasets exist in the public domain for use in building
extraction. Two of the most commonly used are the Avenches
dataset (Henricsson, 1996) and the Fort Hood dataset (1999).
Another dataset from the town of Zurich Hoengg was added to
the public domain for the Ascona 2001 workshop (Hoengg
dataset, 2001). The small number of buildings in these images
makes these datasets unsuitable as the basis for research using a
learning machine such as an SVM. As the learning machine is trained
by example, a large number of examples of each object class
must be presented to the learning machine to ensure valid
learning. These public domain datasets simply do not contain
enough data for this purpose.
To generate sufficient training data, a new database of images
was created for the purposes of this research. Several large-
scale aerial photographs of the city of Ballarat, in central
Victoria, were available for this task. The images were acquired
at a scale of 1:4000, originally for the purpose of asset
mapping within the Ballarat city centre. As such, they were
taken in a standard stereo-mapping configuration, with a near
vertical orientation and a 60% forward overlap.
Three images from this set were scanned from colour
diapositives on a Zeiss Photoscan™ at a resolution of 15
microns. The resultant ground sample distance for the images
was 6 cm. This compares well to a ground sample distance of
7.5 cm for the Avenches dataset and 7 cm for the Zurich
Hoengg data. The Zeiss scanner produces a separate file for
each colour band of the image (red, green and blue (RGB)). These
files are produced in a proprietary format and were converted
into uncompressed three-colour 24-bit Tagged Image File Format
(TIFF) files for ease of use with other systems.
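The ground sample distance quoted above follows directly from the photo scale and the scanning resolution. A minimal sketch of that arithmetic, with an illustrative function name, is:

```python
MICRONS_PER_CM = 10_000

def ground_sample_distance_cm(scale_denominator, pixel_size_microns):
    """Ground footprint of one scanned pixel, in centimetres.

    One pixel on the film covers pixel_size_microns; at a photo scale of
    1:scale_denominator that length is scale_denominator times larger on
    the ground.
    """
    return scale_denominator * pixel_size_microns / MICRONS_PER_CM

# Values stated in the text: 1:4000 photography scanned at 15 microns.
gsd_cm = ground_sample_distance_cm(4000, 15)
```

With the stated values this gives the 6 cm figure reported for the Ballarat images.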
3.1 Image patches
In order to train the classifier and test whether effective class
discrimination was possible, the classification problem was
simplified by producing discrete image patches of a regular
size. Each patch was 256 pixels by 256 pixels and contained
either a single building or non-building piece of the image. The
recognition problem was simplified further by limiting the
building patches to those containing single, detached residential
houses, where the extent of the house fitted completely within
the 256 x 256 pixel area. This may seem extremely restrictive,
but the problem of building extraction has proven to be very
difficult and a generalised solution appears unlikely at this
stage. In a classification approach, it is likely that there will be
a class for each category or type of building, i.e. residential
detached, residential semi-detached, commercial, industrial and
so on. As this area appears largely unexplored, the scope of the
classification was limited to a very specific case to increase the
chances of success.
The aerial image TIFF files were used to create a collection of
image patches, where each patch was stored in a separate TIFF
file. As the area is predominantly urban residential in character,
many of the non-building image patches contained residential
street detail, usually kerb and channel bitumen roadways
(Figure 5).
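The paper does not describe how the patches were cut from the aerial images. The sketch below is one possible approach under stated assumptions: a patch is centred on a chosen pixel, shifted if necessary to stay inside the image, and a building is accepted only if its bounding box fits entirely within the patch. The function names, the bounding-box test and the synthetic image are all hypothetical.

```python
PATCH_SIZE = 256  # patch side length in pixels, as stated in the text

def patch_window(centre_row, centre_col, image_rows, image_cols, size=PATCH_SIZE):
    """Return (row0, col0, row1, col1), a half-open size x size window
    centred as closely as possible on the given pixel while remaining
    inside the image."""
    if image_rows < size or image_cols < size:
        raise ValueError("image is smaller than the patch size")
    row0 = min(max(centre_row - size // 2, 0), image_rows - size)
    col0 = min(max(centre_col - size // 2, 0), image_cols - size)
    return row0, col0, row0 + size, col0 + size

def footprint_fits(bbox, window):
    """True if a building bounding box (row0, col0, row1, col1) lies
    completely inside the patch window (assumed acceptance rule)."""
    return (bbox[0] >= window[0] and bbox[1] >= window[1]
            and bbox[2] <= window[2] and bbox[3] <= window[3])

def crop(image, window):
    """Cut the window out of a row-major nested-list image."""
    row0, col0, row1, col1 = window
    return [row[col0:col1] for row in image[row0:row1]]

# Synthetic single-band image standing in for a scanned photograph.
image = [[(r + c) % 256 for c in range(400)] for r in range(300)]

window = patch_window(150, 200, image_rows=300, image_cols=400)
patch = crop(image, window)
edge_window = patch_window(10, 390, image_rows=300, image_cols=400)
house_ok = footprint_fits((60, 100, 220, 300), window)
house_too_big = footprint_fits((10, 100, 220, 300), window)
```

At the 6 cm ground sample distance reported for these images, a 256-pixel patch spans roughly 15.4 m on the ground, which is consistent with restricting the building class to single detached houses.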
4. INITIAL TESTS
Initial classification tests were based on a balanced test set of
100 building images and 100 non-building images. Image
coefficients were extracted using the quadruple sampled
wavelet process described earlier. A public domain support
vector machine, SVMlight (Joachims, 1998), was used to classify
the image patches into building or non-building categories.
Results of these tests have been reported previously (Bellman
& Shortis, 2002) and showed that although the classification
had a predicted success rate of 73%, the actual success on a
small independent test set was only 40%. As the success rate is
strongly dependent on the sample size, the low rate of detection
is most probably due to the small size of the training set.
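Assuming the package cited is Joachims' SVMlight, each training example is supplied as one text line of the form "target index:value index:value ...", with feature indices starting at 1 and increasing along the line. The sketch below shows how a labelled coefficient vector might be written in that form. The helper name and the coefficient values are placeholders; the paper does not describe how its input files were prepared.

```python
def to_svmlight_line(label, coefficients):
    """Format one example in SVMlight's sparse text format.

    label        -- +1 (building) or -1 (non-building); this mapping of
                    classes to targets is an assumption
    coefficients -- sequence of feature values; zeros are omitted, which
                    the sparse format permits
    """
    if label not in (1, -1):
        raise ValueError("label must be +1 or -1")
    fields = [f"{label:+d}"]
    for index, value in enumerate(coefficients, start=1):
        if value != 0:
            fields.append(f"{index}:{value:.6g}")
    return " ".join(fields)

# Placeholder feature values, not wavelet coefficients from the paper.
building_line = to_svmlight_line(1, [0.5, 0.0, -1.25])
background_line = to_svmlight_line(-1, [0.0, 2.0, 0.75])
```

A balanced training file such as the one described above would then contain 100 lines with a positive target and 100 lines with a negative target.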
Although preprocessing is done using the wavelet transform,
there are many variables that can influence the preprocessing
stage. In studies by others, some attempt had been made to
identify the optimum set of parameters but it was found that