Low-level image processing extracts features that are somehow related to the objects to be recognized. Edges are a typical example.
Except for noise or systematic sensor errors, edges are caused
by events in the object space. Examples of such events include physical boundaries of objects, shadows, and variations in material reflectance. It follows that edges are useful features, as they often convey information about objects in one way or another.
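As an illustration, the following minimal Python sketch extracts edge candidates with a Sobel gradient operator followed by a magnitude threshold; the function name and the threshold value are assumptions made for this example only.

    # Sketch: Sobel edge detection on a single-band image (numpy only).
    import numpy as np

    def sobel_edges(image, threshold=50.0):
        img = image.astype(float)
        kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # x gradient
        ky = kx.T                                                          # y gradient
        gx = np.zeros_like(img)
        gy = np.zeros_like(img)
        # Convolve the image interior; the one-pixel border is left at zero.
        for i in range(1, img.shape[0] - 1):
            for j in range(1, img.shape[1] - 1):
                window = img[i - 1:i + 2, j - 1:j + 2]
                gx[i, j] = np.sum(window * kx)
                gy[i, j] = np.sum(window * ky)
        magnitude = np.hypot(gx, gy)
        # Pixels with a large gradient magnitude are edge candidates, i.e.
        # locations where an event in the object space is likely.
        return magnitude > threshold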
Segmentation is another useful step in extracting informa-
tion about objects. Segmentation entails grouping pixels to-
gether that share similar characteristics. Unfortunately, this is quite a vague definition and, not surprisingly, the similarity criterion is often defined by the application, which contradicts the paradigm's requirement that the first stages of object recognition be application independent and guided only by general principles.
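A minimal sketch of such a grouping, assuming a single-band image and a global gray-value threshold followed by connected-component labeling, is given below; the similarity criterion is deliberately simplistic and would normally be chosen by the application.

    # Sketch: segmentation as grouping of similar, connected pixels.
    import numpy as np
    from scipy import ndimage

    def segment(image, threshold):
        mask = image > threshold                  # pixels with similar gray values
        labels, n_regions = ndimage.label(mask)   # connect neighboring pixels into regions
        return labels, n_regions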
The output of the first stage is already a bit more abstract
than the sensory input data. The results are tokens or symbols. We see a transition from signals to symbols, however primitive they may still be. These primitive symbols are now the subject of a grouping process that attempts to perceptually
organize them. Organization is one of the first steps in per-
ception. The goal of grouping is to find and combine those
symbols that relate to the same object. The governing group-
ing principles may be application dependent.
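The sketch below illustrates one possible grouping rule for primitive symbols, here straight edge segments represented as endpoint pairs: segments are combined when they are nearly parallel and their endpoints are close. The representation, the criteria, and the thresholds are assumptions made for the example.

    # Sketch: perceptual grouping of edge segments (x1, y1, x2, y2).
    import math

    def angle(seg):
        x1, y1, x2, y2 = seg
        return math.atan2(y2 - y1, x2 - x1)

    def endpoint_gap(a, b):
        ax1, ay1, ax2, ay2 = a
        bx1, by1, bx2, by2 = b
        return min(math.hypot(px - qx, py - qy)
                   for (px, py) in [(ax1, ay1), (ax2, ay2)]
                   for (qx, qy) in [(bx1, by1), (bx2, by2)])

    def group_segments(segments, max_angle=0.1, max_gap=5.0):
        groups = []
        for seg in segments:
            for g in groups:
                # Join an existing group if the segment is nearly parallel and
                # close to any of its members (angle wrap-around is ignored).
                if any(abs(angle(seg) - angle(s)) < max_angle and
                       endpoint_gap(seg, s) < max_gap for s in g):
                    g.append(seg)
                    break
            else:
                groups.append([seg])
        return groups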
The next step in model-based object recognition consists of
comparing the extracted and grouped features (data model)
with a model of the real object (object model), a process
called matching. If there is sufficient agreement then the data
model is labeled with the object and undergoes a validation
procedure. Crucial in the matching step is the object model
and the representational compatibility between the data and
object model. It is fruitless to describe an object by prop-
erties that cannot be extracted from the sensor data. Take
color, for example, and the case of a roof. If only monochro-
matic imagery is available then we cannot use ‘red’ in the
roof description.
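A minimal sketch of such a matching step is given below; the attribute names, value ranges, and the roof example are purely illustrative, and the test for missing attributes reflects exactly the representational compatibility discussed above.

    # Sketch: matching a data model against an object model by attributes.
    roof_model = {"area": (50.0, 500.0),       # m^2, assumed plausible range
                  "elongation": (1.0, 4.0),    # length/width ratio
                  "mean_height": (3.0, 30.0)}  # m above ground, e.g. from a DEM

    def matches(data_model, object_model):
        # Accept only if every attribute required by the object model can be
        # extracted from the data and falls within the modeled range.
        for attribute, (low, high) in object_model.items():
            value = data_model.get(attribute)
            if value is None or not (low <= value <= high):
                return False
        return True

    candidate = {"area": 120.0, "elongation": 1.8, "mean_height": 7.5}
    print(matches(candidate, roof_model))  # True: label the region as a roof candidate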
The sequential way in which the paradigm is presented is often called bottom-up or data-driven. A model-driven or top-down approach follows the opposite direction. Here, domain-specific knowledge triggers expectations about where objects may occur in the data. In reality, both approaches are combined.
3 Background on data fusion in remote sensing
Remote sensing usually deals with multispectral and often
multisensor data sets with different spatial, spectral and
temporal resolution. In addition to the sensory data (e.g., reflectance, brightness temperature), auxiliary information, such as surface topography or site-specific information from a GIS, is often used during image analysis. Many problems
in remote sensing are solved by using early vision processes
including image enhancement, dimensionality reduction, and
pattern recognition. Late vision processes and the analysis of
more complex image elements, such as size, shape, pattern, and shadow, are rarely used.
Multisensor integration means the synergistic use of the in-
formation provided by multiple sensory devices to assist the
solution of a visual task. The literature on multisensor inte-
gration in computer vision and machine intelligence is sub-
stantial. For an extensive review we refer the interested reader
to Abidi and Gonzalez (1992). An important step of the mul-
tisensor integration process is multisensor fusion. It refers to
any stage of the multisensor integration process where there
is an actual combination (or fusion) of different sources of
sensory information into one representational format. Multi-
sensor fusion can take place at either the signal, pixel, feature,
or symbol level of representation. Most of the sensors typi-
cally used in practice provide data that can be fused at one
or more of these levels. Signal-level fusion refers to the com-
bination of signals from different sensors with the objective
of providing a new signal that is usually of the same form as
the original signals but of better quality. In pixel-level fusion
a new image is formed through the combination of multiple
images to increase the information content associated with
each pixel. Feature-level fusion can be used to make the fea-
ture extraction more robust and to create composite features
from different signals and images. Symbol-level fusion allows
the information from multiple sensors to be used together at
the highest level of abstraction.
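Two of these levels can be illustrated with a short Python sketch, assuming co-registered input arrays; the weighted average used for pixel-level fusion and the simple concatenation used for feature-level fusion stand in for the many combination rules found in the literature.

    # Sketch: pixel-level and feature-level fusion of co-registered data.
    import numpy as np

    def pixel_level_fusion(images, weights):
        # Combine several co-registered images into one image whose pixels
        # carry information from all inputs (here a weighted average).
        stack = np.stack([img.astype(float) for img in images])
        w = np.asarray(weights, dtype=float).reshape(-1, 1, 1)
        return np.sum(w * stack, axis=0) / np.sum(w)

    def feature_level_fusion(feature_maps):
        # Concatenate per-pixel features from different sensors into one
        # composite feature vector per pixel.
        return np.concatenate([np.atleast_3d(f) for f in feature_maps], axis=2)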
In remote sensing the data are mostly fused on the pixel level.
If additional data, such as DEM or GIS layers, are included in
the fusion process, they are first converted into raster images.
Then fused images are created either through pixel-by-pixel
fusion (e.g., pixel based classification) or through the fusion
of associated local neighborhoods of pixels in each of the
component images (e.g., contextual classification).
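A sketch of this kind of pixel-based fusion is given below, assuming all bands and the rasterized auxiliary layers have already been resampled to a common grid; the 3x3 neighborhood mean is only one simple way to bring in local context.

    # Sketch: stacking image bands with a rasterized DEM layer and adding
    # a simple contextual feature per pixel.
    import numpy as np

    def stack_layers(band_list, dem):
        layers = [b.astype(float) for b in band_list] + [dem.astype(float)]
        return np.stack(layers, axis=-1)              # (rows, cols, n_layers)

    def add_context(stacked):
        rows, cols, n = stacked.shape
        context = np.zeros_like(stacked)
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                # Mean of the local 3x3 neighborhood in every layer.
                context[i, j] = stacked[i - 1:i + 2, j - 1:j + 2].mean(axis=(0, 1))
        return np.concatenate([stacked, context], axis=-1)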
Most of the data fusion methods used in remote sensing belong to the category of multidata segmentation. Many methods are based on
statistical approaches (Schistad Solberg et al., 1996; Lee et
al., 1987), neural networks (Hepner et al., 1990), Dempster-
Shafer theory (Lee et al., 1987; Le Hegarat-Mascle et al., 1997),
fuzzy logic, or a combination of methods, such as hybrid
statistical/neural method (Benediktsson et al., 1997). The
spatial contextual behaviour of the gray values or the class
labels is usually characterized by Markov random field models
(e.g., Schistad Solberg et al., 1996).
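The flavor of such contextual models can be conveyed with a small sketch of iterated conditional modes (ICM), a common way to optimize Markov random field labelings; the per-pixel class costs, the smoothness weight, and the number of iterations are assumptions of the example.

    # Sketch: MRF-style contextual relabeling with iterated conditional modes.
    # 'unary' holds per-pixel class costs (rows x cols x n_classes), e.g.
    # negative log-likelihoods from a per-pixel classifier.
    import numpy as np

    def icm(unary, beta=1.0, n_iter=5):
        labels = np.argmin(unary, axis=2)        # start from the per-pixel optimum
        rows, cols, n_classes = unary.shape
        for _ in range(n_iter):
            for i in range(1, rows - 1):
                for j in range(1, cols - 1):
                    neighbors = [labels[i - 1, j], labels[i + 1, j],
                                 labels[i, j - 1], labels[i, j + 1]]
                    cost = unary[i, j].copy()
                    for c in range(n_classes):
                        # Penalize labels that disagree with the 4-neighborhood.
                        cost[c] += beta * sum(1 for nb in neighbors if nb != c)
                    labels[i, j] = np.argmin(cost)
        return labels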
Most researchers favor supervised classification, probably be-
cause unsupervised classification usually requires a greater
number of classes. The reduced degree of automation and the tedious and time-consuming training process are clear disadvantages of supervised classification. Another drawback is
the inability to separate a priori unknown classes and to es-
timate their characteristics. These problems can be avoided
by developing unsupervised classification schemes as shown
by Le Hegarat-Mascle and others (1997).
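As a sketch of the unsupervised alternative, a basic k-means clustering of fused pixel vectors is shown below; the number of clusters and the initialization are choices the analyst still has to make, and the cited work uses a considerably more elaborate scheme.

    # Sketch: unsupervised classification of fused pixel vectors with k-means.
    # 'pixels' is an (n_pixels, n_features) array of fused measurements.
    import numpy as np

    def kmeans(pixels, k, n_iter=20, seed=0):
        pixels = np.asarray(pixels, dtype=float)
        rng = np.random.default_rng(seed)
        centers = pixels[rng.choice(len(pixels), size=k, replace=False)].copy()
        for _ in range(n_iter):
            # Assign each pixel vector to its nearest cluster center.
            d = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
            labels = np.argmin(d, axis=1)
            # Recompute the centers; keep the old one if a cluster became empty.
            for c in range(k):
                if np.any(labels == c):
                    centers[c] = pixels[labels == c].mean(axis=0)
        return labels, centers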
Many other techniques are used for pixel-based fusion of
multi- and hyperspectral data sets, including simple ones,
such as the well-known principal component analysis or com-
puting ratios, and more complex ones, such as end-member
mixing, matching to library spectra, etc. The review article
of Cloutis (1996) provides what we believe is the most complete list of these non-segmentation based methods. A widely
used approach to hyperspectral classification is to model the
mixed-pixel vector as a linear superposition of substances res-
ident in a pixel with additive Gaussian noise. Based on this
linear mixture model, different materials within the image
pixels can be determined by using other techniques, such as
linear unmixing (Adams et al., 1986) or orthogonal subspace
projection (Harsanyi and Chang, 1994).
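A minimal sketch of linear unmixing under this model is given below; the end-member matrix is assumed to come from a spectral library or from the image itself, and the clipping and renormalization are only an approximate way of enforcing the physical abundance constraints.

    # Sketch: linear spectral unmixing by least squares.
    # E is the end-member matrix (n_bands x n_endmembers); each pixel spectrum
    # is modeled as E @ a plus noise, with a the abundance vector.
    import numpy as np

    def unmix(pixel_spectrum, E):
        a, *_ = np.linalg.lstsq(E, pixel_spectrum, rcond=None)  # unconstrained fit
        a = np.clip(a, 0.0, None)                               # no negative abundances
        return a / a.sum() if a.sum() > 0 else a                # abundances sum to one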
4 Data set
To illustrate the different levels and techniques for multisen-
sor fusion, a multisensor data set collected over the coastal
areas of Maryland on April 25 and 30, 1997, is used. The data
set includes panchromatic aerial photography, multispectral