ISPRS Commission III, Vol. 34, Part 3A "Photogrammetric Computer Vision", Graz, 2002
A MACHINE LEARNING APPROACH TO BUILDING RECOGNITION
IN AERIAL PHOTOGRAPHS
C.J. Bellman a, *, M.R. Shortis b
a Department of Geospatial Science, RMIT University, Melbourne 3000, Australia - Chris.Bellman@rmit.edu.au
b Department of Geomatics, University of Melbourne, Parkville 3052, Australia - M.Shortis@unimelb.edu.au
Commission III, Working Group III/4
KEY WORDS: Building detection, Learning, Classification, Multiresolution
ABSTRACT:
Object recognition and extraction have been of considerable research interest in digital photogrammetry for many years. As a result,
many conventional tasks have been successfully automated but, despite some advances, the automatic extraction of buildings remains
an open research question. Machine learning techniques have received little attention from the photogrammetric community in their
search for methods of object extraction. While these techniques cannot provide all the answers, they do offer some potential benefits
in the early stages of visual processing. This paper presents the results of an investigation into the use of machine learning in the form
of a support vector machine. The images are characterized using wavelet analysis to provide multi-resolution data for the machine
learning phase.
1. INTRODUCTION
The advent of digital imagery has resulted in the automation of
many traditional photogrammetric tasks.
However, the automatic extraction of man-made features such
as buildings and roads is far from solved. These objects are
attractive for automatic extraction, as they have distinct
characteristics such as parallelism and orthogonality that can be
used in the processing of symbolic image descriptions. Despite
an extensive research effort, the problem remains poorly
understood (Schenk, 2000).
Object extraction from digital images consists of two main
tasks:
- identification of a feature, which involves image
  interpretation and feature classification, and
- tracking the feature precisely by determining its
  outline or centreline
(Agouris et al., 1998).
Although many algorithms have been developed, none can
claim to be fully automated. Most rely on some form of operator
guidance to determine areas of interest or to provide seed points
on features.
This paper addresses the issue of determining areas of interest
(candidate patches) using machine learning techniques.
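As a minimal sketch of how a candidate patch might be characterised at several resolutions before being passed to a classifier, the following applies a Haar wavelet decomposition to a small grey-level patch. The choice of the Haar wavelet, the averaging normalisation, the number of levels and the function names are all assumptions made for illustration; they are not taken from the method described in this paper, and the support vector machine stage is omitted.

```python
def haar_step_2d(patch):
    """One level of a 2-D Haar decomposition (averaging normalisation).

    `patch` is a list of rows with even height and width. Returns four
    half-size sub-bands: local averages, differences between rows,
    differences between columns, and diagonal differences.
    """
    rows, cols = len(patch), len(patch[0])
    if rows % 2 or cols % 2:
        raise ValueError("patch height and width must be even")
    approx, row_diff, col_diff, diag_diff = [], [], [], []
    for r in range(0, rows, 2):
        a_row, r_row, c_row, d_row = [], [], [], []
        for c in range(0, cols, 2):
            p, q = patch[r][c], patch[r][c + 1]
            s, t = patch[r + 1][c], patch[r + 1][c + 1]
            a_row.append((p + q + s + t) / 4.0)  # mean of the 2x2 block
            r_row.append((p + q - s - t) / 4.0)  # upper row minus lower row
            c_row.append((p - q + s - t) / 4.0)  # left column minus right column
            d_row.append((p - q - s + t) / 4.0)  # diagonal contrast
        approx.append(a_row)
        row_diff.append(r_row)
        col_diff.append(c_row)
        diag_diff.append(d_row)
    return approx, row_diff, col_diff, diag_diff


def multiresolution_features(patch, levels=2):
    """Concatenate absolute detail coefficients from successive levels."""
    features = []
    current = patch
    for _ in range(levels):
        current, row_diff, col_diff, diag_diff = haar_step_2d(current)
        for band in (row_diff, col_diff, diag_diff):
            features.extend(abs(v) for row in band for v in row)
    return features


# Hypothetical 4x4 patch with a vertical step edge between columns 1 and 2.
step_patch = [[0, 0, 8, 8] for _ in range(4)]
step_features = multiresolution_features(step_patch, levels=2)
```

In this example the step edge falls on a 2x2 block boundary, so the first level produces no detail response and the edge appears only in the column-difference coefficient at the second, coarser level. A feature vector of this kind is what a learning stage would receive for each patch.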
Most photogrammetric applications for building recognition
have followed the principle, established by Marr (1982),
that there are three levels of visual information processing. The
first, low-level processing, involves the extraction of features in
the image such as edges, points, and blobs that appear as some
form of discontinuity in the image.
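To illustrate what is meant by a discontinuity at this level, the sketch below marks pixels whose grey-level gradient magnitude exceeds a threshold. It is a deliberately simple central-difference operator chosen for illustration, not an edge detector used in this paper; the function names and the threshold value are hypothetical.

```python
import math


def gradient_magnitude(image):
    """Central-difference gradient magnitude; border pixels are left at 0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = (image[r][c + 1] - image[r][c - 1]) / 2.0  # change along the row
            gy = (image[r + 1][c] - image[r - 1][c]) / 2.0  # change along the column
            out[r][c] = math.hypot(gx, gy)
    return out


def edge_pixels(image, threshold):
    """Return (row, col) positions whose gradient magnitude reaches the threshold."""
    magnitude = gradient_magnitude(image)
    return [(r, c)
            for r, row in enumerate(magnitude)
            for c, m in enumerate(row)
            if m >= threshold]


# Hypothetical 5x5 image: dark on the left, bright from column 2 onwards.
step_image = [[0, 0, 10, 10, 10] for _ in range(5)]
edges = edge_pixels(step_image, threshold=4.0)
```

Here the detected pixels line up along the two columns on either side of the step. On a real aerial image the same operator would also respond to shadows, vegetation texture and noise, which is why the later grouping stages have to sort through so many primitives.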
Intermediate-level processing involves the grouping and
connection of these image primitives based on some measure of
similarity or geometry. This forms the primal sketch (Marr,
1982) and is the basis for testing object hypotheses against rules
that describe object characteristics. Many approaches are
possible for establishing these rules such as semantic modelling
(Stilla & Michaelsen, 1997), similarity measures (Henricsson,
1996), perceptual organisation (Sarkar & Boyer, 1993) or
topology (Gruen & Dan, 1997).
* Corresponding author
High-level processing usually involves extracting information
associated with an object that is not directly apparent in the
image (Ullman, 1996, p. 4). This could be determining what the
object is (recognition), or establishing its exact shape and size
(reconstruction). In computer vision, recognition is the most
common problem pursued. In photogrammetry, reconstruction
of the geometry of features is more typically required.
1.1 Candidate regions
Despite the advances that have occurred in automated object
extraction, most photogrammetric applications require some
form of operator assistance to establish candidate image regions
for potential object extraction. This is usually necessary to
reduce the search space and make the problem tractable. Low-
level processing strategies such as edge detection create a large
number of artefacts that the mid-level grouping strategies find
difficult to resolve.
This problem cannot be solved simply by segmentation, as this
is difficult for an aerial image (Nevatia et al., 1998). An image
contains many objects, only some of which should be modelled.
The objects of interest may be partially occluded, poorly
illuminated or have significant variations in texture.
In the case of building extraction, Henricsson (1996) solves the
candidate problem in a pragmatic way. Rather than finding
candidate regions using a computational process, the operator
identifies candidate regions of the same building in multiple
images. The computer system then extracts the edge features,
groups these based on several measures of similarity and
computes a 3-dimensional reconstruction of the building.