Figure 5: Result of ISODATA clustering of the multispectral data set (Bands 3-11). Classes: Open water and shadow (black); Trees and shrubs (dark gray); Lawn and grass (medium gray); Man-made objects 1: roads, driveways and roofs, also bare soil (light gray); Man-made objects 2: driveways and roofs, also bare soil (white)
cluded. If we compare classification with feature extraction, we realize that the perceptual grouping, so necessary in object recognition for reaching a more explicit and symbolic scene description, is missing. A closer examination from the object recognition perspective also reveals that classification essentially ends with labeling pixels. There is no reasoning process that would attempt to formulate and evaluate hypotheses.
There is another reason for considering non-pixel-based methods. Pixel-level fusion is only recommended for images taken with similar exterior orientation, possessing similar spatial, spectral and temporal resolution, and capturing the same or similar physical phenomena (Abidi and Gonzalez, 1992). These requirements are often not satisfied. The images may have been captured in very different regions of the EM spectrum (e.g., visible and thermal), collected on different platforms, or affected by significantly different error models. In these cases, preference should be given to segmenting the images individually, with feature extraction and combination at higher levels.
A promising method for automation is spectral unmixing. The effects of mixed pixels, shade and shadow produce a large number of distinct spectral shapes in natural scenes, even though only a few materials are present. This is a serious challenge for the various classification algorithms. Some studies indicate that for simple land cover types only a few end-members are required to fully characterize a scene; that is, each spectrum in the image can be interpreted as a mixture of the spectra of the end-members (Cloutis, 1996). If this assumption is valid, the abundance of pure land cover types within each pixel can be determined with spectral unmixing methods. One particular advantage of the mixture model is that shade, shadow and secondary illumination can be treated as end-members, so the effects of topography and illumination on all scales can be isolated (Adams et al., 1986). The question here is how valid these approximations are for urban scenes and how the ill-posed end-member selection can be automated, optimized and made faster.
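To make the mixture model concrete, the following minimal sketch (Python, assuming NumPy and SciPy are available) estimates per-pixel abundance fractions by non-negative least squares from a small library of known end-member spectra. The function name, the four-band example and the three end-members (vegetation, bare soil, shade) are illustrative assumptions, not data from this study.

    import numpy as np
    from scipy.optimize import nnls

    def unmix_pixel(pixel_spectrum, endmembers):
        # Solve endmembers @ a ~= pixel_spectrum subject to a >= 0.
        abundances, residual = nnls(endmembers, pixel_spectrum)
        # Approximate the sum-to-one constraint of the mixture model
        # by normalizing the non-negative solution.
        total = abundances.sum()
        if total > 0.0:
            abundances = abundances / total
        return abundances, residual

    # Hypothetical 4-band spectra of three end-members (columns):
    # vegetation, bare soil, shade.
    E = np.array([[0.05, 0.30, 0.02],
                  [0.08, 0.35, 0.02],
                  [0.45, 0.40, 0.03],
                  [0.30, 0.45, 0.02]])
    pixel = 0.6 * E[:, 0] + 0.3 * E[:, 1] + 0.1 * E[:, 2]  # synthetic mixed pixel
    fractions, err = unmix_pixel(pixel, E)
    print(fractions)  # approximately [0.6, 0.3, 0.1]

A fully constrained solver would enforce the sum-to-one condition exactly; the normalization above is only a convenient approximation.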
6 Conclusions
Object recognition in urban scenes is an utterly ill-posed problem. Researchers have come to realize that utilizing multisensor and multispectral data sources greatly increases the chances of making the recognition process more stable. As the availability and performance of airborne sensors rapidly increase while the cost of such systems decreases, multisensor data acquisition becomes commercially feasible.
The experiments described in this paper clearly demonstrate that a much richer set of features can be extracted from multisensor and multispectral data, which eventually leads to a more distinctive data model. At the same time, the object model can include properties that are encoded in the sensory input. Consequently, matching, that is, the comparison between data and object model, becomes more stable.
In addition to the recommendations put forward in the pre-
vious section, a number of issues are identified that deserve
further attention. The prevailing question is how to combine
information extracted from the different sensors. In other
words, how do we take best advantage of the input data and
how do we optimize the synergistic effect that the combi-
nation (fusion) offers? General guidelines, such as when to
fuse on what level, are not detailed enough. Additional con-
siderations may help to clarify the issue. Let us take laser
altimeter and visual imagery, for example. Both data sources
carry important information about the surface. In fact, we
can reconstruct the surface from stereo and from range data
and compare the results. Since surface reconstruction is an ill-posed problem, the individual processes, such as stereo, shape from shading, or ranging, are unstable. A combination of the different input data and simultaneous processing, however, makes surface reconstruction more stable. As a consequence, we ought to revise existing approaches to include different data sources. Figure 6 shows the DEM obtained
from laser altimeter data. It is remarkably detailed but still
needs to be refined with information from stereo.
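As a simple illustration of such a combination, the sketch below (a minimal Python example) merges a co-registered laser-altimeter DEM and a stereo-derived DEM by weighting each grid cell with the precision of its source, and flags cells where the two surfaces disagree strongly. This per-cell weighting is only a stand-in for the simultaneous processing argued for above; the array names and accuracy values are assumptions.

    import numpy as np

    def merge_dems(dem_laser, dem_stereo, var_laser, var_stereo):
        # Inverse-variance (precision) weighting of two co-registered grids.
        w_laser, w_stereo = 1.0 / var_laser, 1.0 / var_stereo
        merged = (w_laser * dem_laser + w_stereo * dem_stereo) / (w_laser + w_stereo)
        # Cells where the surfaces differ by more than three combined standard
        # deviations are candidates for buildings, trees or blunders.
        suspect = np.abs(dem_laser - dem_stereo) > 3.0 * np.sqrt(var_laser + var_stereo)
        return merged, suspect

    # Example call with assumed accuracies of 0.15 m (laser) and 0.5 m (stereo):
    # merged, suspect = merge_dems(dem_laser, dem_stereo, 0.15**2, 0.5**2)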
Figure 6: DEM of the central area in Figures 1-5. Higher elevations are rendered with darker tones; the dark patches indicate objects that are higher than their environment (e.g., buildings (B), trees and bushes (3), cars (1))
Another consideration for fusion is related to the physical phe-
References

Abidi, M.A. and Gonzalez, R.C. (Eds.), 1992. Data Fusion in Robotics and Machine Intelligence. Academic Press, San Diego.
Adams, J.B., Smith, M.O. and Johnson, P.E., 1986. Spectral mixture modeling: A new analysis of rock and soil types at the Viking Lander 1 site. Journal of Geophysical Research, Vol. 91, No. B8, pp. 8098-8112.
Benediktsson, J.A., Sveinsson, J.R. and Swain, P.H., 1997. Hybrid consensus theoretic classification. IEEE Transactions on Geoscience and Remote Sensing, Vol. 35, No. 4, pp. 833-843.
Chavez, P.S., Sides, S.C. and Anderson, J.A., 1991. Comparison of three different methods to merge multiresolution and multispectral data: Landsat TM and SPOT panchromatic. Photogrammetric Engineering and Remote Sensing, Vol. 57, No. 3, pp. 295-303.
Cloutis, E.A., 1996. Hyperspectral geological remote sensing: Evaluation of analytical techniques. International Journal of Remote Sensing, Vol. 17, No. 12, pp. 2215-2242.
Csathó, B. and Schenk, T., 1998. A multisensor data set of an urban and coastal scene. Proceedings of the ISPRS Commission III Symposium, Columbus, Ohio.
Harsanyi, J.C. and Chang, C.-I., 1994. Hyperspectral image classification and dimensionality reduction: An orthogonal subspace projection approach. IEEE Transactions on Geoscience and Remote Sensing, Vol. 32, No. 4, pp. 779-785.
Le Hégarat-Mascle, S., Bloch, I. and Vidal-Madjar, D., 1997. Application of Dempster-Shafer evidence theory to unsupervised classification in multisource remote sensing. IEEE Transactions on Geoscience and Remote Sensing, Vol. 35, No. 4, pp. 1018-1031.