2-D and 3-D. Because low-level feature extraction is error
prone, we try to combine as many cues as possible to achieve
redundancy. Our general assumption is that the complete
roof consists of a set of planar patches that mutually adjoin
along their boundary. Our planar primitive can have an ar-
bitrarily complex polygonal boundary, i.e. we do not require
rectilinear roof shapes. Once disturbances along the roof
boundary, such as chimneys and shadows, have been detected and
discounted, the remaining edges should have perceptually
uniform color properties. By modeling not only the
geometry of the roof, but also the spectral properties along
its boundary, we can handle complex roof shapes.
In the current approach, the user is only asked to provide a
rough location of the houses in one image; the subsequent
3-D reconstruction is fully automatic. The combination of
color and DSMs can provide the positions of the houses, as
well as a rough 3-D description. This strategy focuses on
building reconstruction; however, the concept is general and
can be augmented to also include other man-made objects
such as roads and bridges.
3 DATA SETS USED IN AMOBE
A data set¹ from Avenches (Switzerland) was acquired for
use in the AMOBE project [Mason et al. 1994]. The data
set consists of a residential and an industrial scene with the
following characteristics: 1:5,000 image scale, near-vertical
aerial photography, four-way image overlap, color imagery,
geometrically accurate film scanning with 15 microns pixel
size, precise sensor orientation, and accurate ground truth in-
cluding a Digital Terrain Model (DTM) and buildings. The
manually measured CAD models of the buildings are impor-
tant to evaluate our results. In Fig. 5A-C we show the resi-
dential data set, including the digital surface model and the
manually measured CAD models of the houses. The houses
shown in the residential scene are representative of Europe
and in particular of Switzerland. Since false color infrared
images (CIR) were not available for the Avenches data set,
we used an additional data set of an urban area with mostly
detached buildings for these experiments.
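The ground footprint of one scanned pixel follows directly from the image scale and scan pixel size listed above. The short sketch below works through that arithmetic; the function name `ground_pixel_size_m` is hypothetical, and the calculation assumes vertical photography over flat terrain, which only approximates the near-vertical imagery described here.

```python
def ground_pixel_size_m(scale_denominator, scan_pixel_um):
    """Ground footprint of one scanned pixel in metres.

    Assumes a vertical image over flat terrain (an idealisation).
    scale_denominator: 5000 for a 1:5,000 image scale.
    scan_pixel_um: scan pixel size in microns.
    """
    return scale_denominator * scan_pixel_um / 1_000_000


# Values taken from the data set description: 1:5,000 scale, 15 micron pixels.
gsd = ground_pixel_size_m(5000, 15)
print(f"ground pixel size: {gsd:.3f} m")
```

Under these assumptions one pixel covers roughly 7.5 cm on the ground.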
4 USE OF DSMS AND COLOR SEGMENTATION
4.1 Use of Digital Surface Models
Digital surface models are a rich source of information for
building detection [Baltsavias et al. 1995]:
Building position and separation: The approximate position
of buildings can be used to guide 2-D feature extraction
and grouping, spectral classification and image texture
analysis, thereby reducing processing time. Given
the approximate position, the DSMs provide means to
separate buildings from other objects that have similar
low level cues but different DSM characteristics, e.g.
separation of buildings from roads and driveways.
Support in matching: DSMs support 3-D feature matching,
e.g. they provide approximations and they can be used
to reduce the number of candidate matches.
Model selection: DSMs provide information which allows the
inference of 3-D object hypotheses in model-based build-
ing reconstruction. Depending on the accuracy and res-
olution of the DSM, the following information can be
¹ The data set can be acquired by ftp from the authors.
International Archives of Photogrammetry and Remote Sensing. Vol. XXXI, Part B3. Vienna 1996
provided: approximate 3-D size and shape, distinction
between flat and non-flat roofs, distinction between one-
peak, ridge, and horizontal roofs, number of major roof
planes, and the distinction between I-, T-, L-, U-, and
X-type buildings.
Ortho-images and ortho-rectified stereo pairs: DSMs can
be used in the generation of ortho-images and ortho-
rectified stereo pairs, whereby the latter can be used to
detect DSM errors [Baltsavias et al. 1995].
When buildings adjoin each other, which is often the case in
dense urban areas, some of the above DSM usages become
more difficult or almost impossible.
The extracted DSM must have high accuracy and sufficient
density. We have used commercial packages, which employ
area correlation for DSM generation at digital photogrammetric
workstations in grid mode, either in image or object
space. Several blunders occur close to the building boundaries;
however, the results are still usable. To avoid
loss of buildings with these packages, the DSM should have
a grid spacing of 0.25 - 0.5 m. Such dense grid spacing
is also necessary to distinguish buildings that are close to
each other and to avoid strong smoothing of discontinu-
ities. For the same reasons a small patch size should be
used in area-based matching. Better DSMs can be derived
by the use of feature-based matching or its combination with
area-based matching [Berthod et al. 1995], by the use of
multi-photo matching with geometric constraints [Grün 1985,
Baltsavias 1991], or from airborne laser scanners.
4.2 3-D Blob Detection
Different methods of extracting 3-D blobs, i.e. possible build-
ings, from a DSM have been investigated. Morphological op-
erators are sensitive to the choice of the structuring element
size, particularly in dense urban areas, and have problems
when other DSM blobs are situated close to the buildings, or
when the terrain is steep and irregular. Subtracting an existing
DTM from the DSM is simple, but DTMs, if they are
available, do not usually have sufficient density and accuracy.
Sufficient accuracy is essential in order to detect low
buildings. Edge detectors extract most building outlines, but
they do not deliver closed contours. Other structures with a
much smaller height than buildings, such as road borders, are
also detected.
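The DSM and DTM differencing mentioned above can be illustrated with a minimal sketch. It assumes both models are sampled on the same grid and stored as lists of rows; the function names and the toy heights are hypothetical, and the 3 m threshold is only an illustrative minimum building height.

```python
def normalized_heights(dsm, dtm):
    """Height above terrain per cell: DSM minus DTM (same grid assumed)."""
    return [[s - t for s, t in zip(s_row, t_row)]
            for s_row, t_row in zip(dsm, dtm)]


def above_ground_mask(dsm, dtm, min_height):
    """True where the surface rises at least min_height above the terrain."""
    return [[h >= min_height for h in row]
            for row in normalized_heights(dsm, dtm)]


# Toy example: gently sloping terrain with one 2 x 2 cell building.
dtm = [[450.0, 450.1, 450.2, 450.3] for _ in range(4)]
dsm = [
    [450.2, 450.3, 450.1, 450.4],
    [450.2, 456.0, 456.4, 450.5],
    [450.3, 456.1, 456.5, 450.6],
    [450.4, 450.5, 450.6, 450.7],
]
mask = above_ground_mask(dsm, dtm, min_height=3.0)
for row in mask:
    print("".join("#" if cell else "." for cell in row))
```

The sketch also shows why DTM accuracy matters: any terrain error of the same order as `min_height` would either hide low buildings or produce false blobs.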
The most promising method consists of grouping the DSM
heights into consecutive bins (height ranges) of a certain size.
It corresponds to cutting equidistant slices through the DSM.
Thus, the DSM is segmented into relatively few regions that are
always closed and easy to extract. The method can be applied
hierarchically using different bin sizes; it is simple and fast,
and can be applied globally or locally. The maximum and
minimum bin sizes are determined from the known height
accuracy of the DSM (e.g. 0.5 - 1 m) and the estimated
minimum building height in the image (e.g. 3 - 4 m). The
hierarchical approach makes use of coarse bins that detect
possible buildings, while the fine bins verify the coarse detec-
tion, provide information for an approximate building model
and separate buildings close to each other. Results of this
method are shown in Fig. 2. For details we refer to [Balt-
savias et al. 1995].
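The height-bin slicing described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the DSM is a small list of rows, regions are formed by 4-connectivity (an assumption), the lowest bin is treated as ground (reasonable only for flat terrain or a local window), and all function names and toy heights are hypothetical.

```python
from collections import deque


def slice_heights(dsm, bin_size):
    """Assign each cell the index of its height bin, counted from the lowest height."""
    base = min(min(row) for row in dsm)
    return [[int((h - base) // bin_size) for h in row] for row in dsm]


def label_regions(bins):
    """Label 4-connected groups of cells that share a bin index."""
    rows, cols = len(bins), len(bins[0])
    labels = [[-1] * cols for _ in range(rows)]
    count = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0] != -1:
                continue
            labels[r0][c0] = count
            queue = deque([(r0, c0)])
            while queue:  # breadth-first flood fill of one region
                r, c = queue.popleft()
                for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= rr < rows and 0 <= cc < cols
                            and labels[rr][cc] == -1
                            and bins[rr][cc] == bins[r0][c0]):
                        labels[rr][cc] = count
                        queue.append((rr, cc))
            count += 1
    return labels, count


def elevated_regions(dsm, bin_size):
    """Return (bin index, cell count) for every region above the lowest bin."""
    bins = slice_heights(dsm, bin_size)
    labels, count = label_regions(bins)
    level, size = [0] * count, [0] * count
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            level[lab] = bins[r][c]
            size[lab] += 1
    return [(level[i], size[i]) for i in range(count) if level[i] > 0]


# Toy DSM: flat ground near 100 m, two adjoining buildings at 107 m and 108 m.
dsm = [
    [100.0, 100.25, 100.0, 100.5, 100.25, 100.0],
    [100.25, 107.0, 107.0, 108.0, 108.0, 100.0],
    [100.0, 107.0, 107.0, 108.0, 108.0, 100.25],
    [100.5, 100.0, 100.25, 100.0, 100.5, 100.0],
]
coarse = elevated_regions(dsm, 3.0)  # coarse bins: one merged blob
fine = elevated_regions(dsm, 1.0)    # fine bins: the two buildings separate
print(coarse, fine)
```

In this toy case the 3 m bins yield a single elevated blob, while the 1 m bins split it into two regions of different height, mirroring the coarse-detect and fine-verify roles described in the text.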
[Figure: image and caption truncated in the source.]
4.3 [Section heading and text truncated in the source.]
4.4 [Section heading and text truncated in the source.]