ISPRS Commission III, Vol. 34, Part 3A, "Photogrammetric Computer Vision", Graz, 2002
Gulch et al. (1998) describe a Semi-automatic Building
Extraction System that has undergone extensive development
over a number of years. In this system, an operator interprets the
image contents and automated tools assist the operator in the
acquisition of 3-D shape data describing a building. In another
system (Michel et al., 1998), the operator need only provide a
seed point within the building roof-line. The building is then
extracted automatically using a pair of epipolar images.
In some situations, spatial information systems can be used to
provide existing semantic and positional data about objects in
an image (Agouris et al., 1998). A set of fuzzy operators is
used to select the relevant data and control the flow of
information from image to spatial database. The system offers
the potential for fully automatic updating of a spatial database but
relies on the existence of that database in the first place. It
does not use image data to determine regions of interest.
The use of auxiliary data, such as digital surface models
(Zimmermann, 2000) or multi-sensor and multi-spectral data
(Schenk, 2000), provides another means of determining regions
of interest in an image, but issues of data fusion add complexity
to the task.
There is much evidence from cognitive science that human
processes for shape recognition are both rapid and approximate
in many cases. Intuitively, this suggests that complicated and
lengthy visual processing strategies are not complete models of
our biological vision, particularly in the early stages of visual
processing.
2. A MACHINE LEARNING APPROACH
Machine learning approaches, such as those based on neural
networks and support vector machines, are popular strategies
for image analysis and object recognition in many imaging
applications (Osuna et al., 1997; Li et al., 1998). In
photogrammetry, machine learning techniques have been
applied to road extraction (Sing and Sowmya, 1998), knowledge
acquisition for building extraction (Englert, 1998) and
land-use classification (Sester, 1992). Neural techniques have
been used in feature extraction (Li et al., 1998; Zhang, 1996),
stereo matching (Loung and Tan, 1992) and image classification
(Israel and Kasabov, 1997).
The recognition task is generally treated as a problem of
classification, with the correct classifications being learnt on the
basis of a number of training examples. Where the images are
small (i.e. have few pixels), a direct connection approach is
employed, where each image pixel is directly connected to a
node in the connectionist architecture. For typical aerial digital
imagery, such an approach is not feasible due to the
combinatorial explosion that would result. Some preprocessing
stage is required to extract key characteristics from the image
domain. Many preprocessing strategies are available,
such as edge detection (Canny, 1986), log-polar forms
(Grossberg, 1988) and texture segmentation (Lee & Schenk,
1998).
Wavelet analysis is often associated with image compression
(Rabbani & Joshi, 2002) but also has useful properties for the
characterization of images. Of particular interest are the
multi-resolution representations that can be generated (Mallat, 1989).
Such an approach has been used successfully in a system to
recognize the presence of a pedestrian in a video image
(Papageorgiou et al., 1998; Poggio & Shelton, 1999) and for
face recognition (Osuna et al., 1997). There are strong
suggestions from psycho-physical experiments that mammalian
vision systems incorporate many of the characteristics of
wavelet transforms (Field, 1994).
2.1 Wavelet Processing
Wavelet processing allows a signal to be described by its overall
shape plus a range of details from coarse to fine (Stollnitz et
al., 1995). In the case of image data, wavelets provide an
elegant means of describing the image content at varying levels
of resolution.
The Haar wavelet is the simplest of the wavelet functions. It is a
step function on the interval from 0 to 1, where the wavelet function
ψ(x) is expressed as:

\[
\psi(x) := \begin{cases}
 1 & \text{for } 0 \le x < 1/2 \\
-1 & \text{for } 1/2 \le x < 1 \\
 0 & \text{otherwise}
\end{cases}
\tag{1}
\]
The wavelet transform is computed by recursively averaging
and differencing the wavelet coefficients at each resolution. An
excellent practical illustration of the use of wavelets is provided
by Stollnitz et al. (1995).
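The recursive averaging and differencing can be sketched in a few lines of Python. This is only an illustrative, unnormalised 1-D version; the function names (`haar_step`, `haar_decompose`, `haar_reconstruct`) and the choice to divide by 2 rather than by the square root of 2 are assumptions made for this example, not details taken from the system described here.

```python
def haar_step(values):
    """One level of unnormalised Haar averaging and differencing."""
    pairs = [(values[i], values[i + 1]) for i in range(0, len(values), 2)]
    averages = [(a + b) / 2 for a, b in pairs]
    details = [(a - b) / 2 for a, b in pairs]
    return averages, details


def haar_decompose(values):
    """Full 1-D decomposition; the input length must be a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = []
    current = list(values)
    while len(current) > 1:
        current, details = haar_step(current)
        coeffs = details + coeffs  # coarser details are placed in front
    return current + coeffs  # [overall average, coarse ... fine details]


def haar_reconstruct(coeffs):
    """Invert haar_decompose by undoing each averaging/differencing level."""
    current = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(current)]
        finer = []
        for average, detail in zip(current, details):
            finer.extend([average + detail, average - detail])
        pos += len(current)
        current = finer
    return current


signal = [9, 7, 3, 5]
coefficients = haar_decompose(signal)
print(coefficients)                    # [6.0, 2.0, 1.0, -1.0]
print(haar_reconstruct(coefficients))  # [9.0, 7.0, 3.0, 5.0]
```

The first coefficient is the overall average of the signal and the remaining coefficients are the details, ordered from coarse to fine, so truncating the list gives progressively coarser descriptions of the same signal.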
As a discrete wavelet transform (DWT), the Haar basis does not
produce a dense representation of the image and is not
sufficiently sensitive to translations of the image content. An
extension of the Haar wavelet can be applied that introduces a
quadruple density transform (Papageorgiou et al., 1998; Poggio
& Shelton, 1999). In a conventional application of the discrete
wavelet transform, the width of the support for the wavelet at
level n is $2^n$ and adjacent wavelets are separated by this
distance. In the quadruple density transform, this separation is
reduced to $\tfrac{1}{4}\,2^n$ (Figure 1(c)). This oversamples the image to
create a rich set of basis functions that can be used to define
object patterns. An efficient method of computing the transform
is given in Oren et al. (1999).
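The effect of the reduced separation can be illustrated with a small brute-force sketch. It is not the efficient method of Oren et al.; the function names, the list-of-lists image layout and the normalisation by the window area are all assumptions made for this example. Only the vertical-edge wavelet is computed (positive on the left half of the window, negative on the right).

```python
def window_positions(length, support, stride):
    """Top-left offsets at which a window of width `support` fits."""
    return list(range(0, length - support + 1, stride))


def vertical_response(image, row, col, support):
    """Haar response to a vertical edge: left half minus right half."""
    half = support // 2
    rows = range(row, row + support)
    left = sum(image[r][c] for r in rows for c in range(col, col + half))
    right = sum(image[r][c] for r in rows
                for c in range(col + half, col + support))
    return (left - right) / (support * support)


def haar_grid(image, level, quadruple=False):
    """Responses at level n: stride 2**n (standard) or 2**n / 4 (oversampled)."""
    support = 2 ** level
    stride = support // 4 if quadruple else support
    if stride < 1:
        raise ValueError("level must be at least 2 for quadruple density")
    row_offsets = window_positions(len(image), support, stride)
    col_offsets = window_positions(len(image[0]), support, stride)
    return [[vertical_response(image, r, c, support) for c in col_offsets]
            for r in row_offsets]


# Hypothetical 8x8 test image: dark columns 0-2, bright columns 3-7.
image = [[0, 0, 0, 1, 1, 1, 1, 1] for _ in range(8)]

standard = haar_grid(image, level=2)                 # 2 x 2 responses
dense = haar_grid(image, level=2, quadruple=True)    # 5 x 5 responses
print(standard[0])  # [-0.25, 0.0]
print(dense[0])     # [-0.25, -0.5, -0.25, 0.0, 0.0]
```

In this toy case the edge does not line up with the standard sampling grid, so the standard transform only sees a weakened response, while one of the oversampled windows is centred on the edge and returns the full-strength value. This is the translation sensitivity that the denser sampling is meant to address.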
[Figure 1 panels: (a) Haar wavelet from equation 1; (b) 2D wavelet functions for horizontal, vertical and diagonal features; (c) Sampling methods: standard and over-sampled.]
Figure 1: The Haar wavelet characteristics
(after Papageorgiou et al., 1998).
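As a rough illustration of the three 2D wavelet functions named in Figure 1(b), the masks can be built as outer products of a 1-D box function and the 1-D Haar wavelet. The helper names and the labelling convention (a "vertical" mask changes sign from left to right, so it responds to vertical edges) are assumptions for this sketch; other texts label the orientations the other way round.

```python
def outer(column, row):
    """Outer product of two 1-D lists, returned as a list of rows."""
    return [[c * r for r in row] for c in column]


def stretch(pair, size):
    """Repeat each of the two entries size // 2 times."""
    half = size // 2
    return [pair[0]] * half + [pair[1]] * half


def haar_masks(size=2):
    """Vertical, horizontal and diagonal Haar masks of even width `size`."""
    if size < 2 or size % 2:
        raise ValueError("size must be an even number >= 2")
    box = stretch([1, 1], size)       # 1-D averaging (scaling) function
    wavelet = stretch([1, -1], size)  # 1-D Haar wavelet from equation 1
    return {
        "vertical": outer(box, wavelet),      # sign changes left to right
        "horizontal": outer(wavelet, box),    # sign changes top to bottom
        "diagonal": outer(wavelet, wavelet),  # checkerboard of quadrants
    }


masks = haar_masks(2)
print(masks["vertical"])    # [[1, -1], [1, -1]]
print(masks["horizontal"])  # [[1, 1], [-1, -1]]
print(masks["diagonal"])    # [[1, -1], [-1, 1]]
```

Each mask sums to zero, so a region of uniform intensity produces no response whichever orientation is used.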