Edges are discontinuities in the gray levels of an image.
Except for noise or systematic sensor errors, edges are caused
by events in the object space. Examples of such events
include physical boundaries of objects, shadows, and
variations in material reflectance. It follows that edges
are useful features, as they often convey information about
objects.
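To make the notion concrete, the following sketch (in Python with NumPy; the array names and the threshold value are illustrative assumptions, not part of this paper) marks as edge pixels those locations where the gray-level discontinuity, measured by the gradient magnitude, exceeds a threshold:

import numpy as np

def edge_map(gray, threshold=30.0):
    # Gray-level discontinuities are approximated by the local gradient:
    # a large gradient magnitude indicates a candidate edge pixel.
    gy, gx = np.gradient(gray.astype(float))  # row- and column-direction derivatives
    magnitude = np.hypot(gx, gy)
    return magnitude > threshold              # boolean edge map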
Segmentation is another useful step in extracting information
about objects. Segmentation entails grouping pixels that share
similar characteristics. Unfortunately, this is a rather vague
definition; not surprisingly, the similarity criterion is often
dictated by the application.
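As an illustration of one possible similarity criterion (the threshold and the 8-connected neighbourhood below are illustrative choices, not prescribed by the text), pixels brighter than a threshold can be grouped into connected regions:

import numpy as np
from scipy import ndimage

def segment_bright_regions(gray, threshold=128.0):
    # "Similar characteristic" here simply means a gray value above a threshold.
    mask = gray > threshold
    # Group the selected pixels into connected components (8-connectivity).
    structure = np.ones((3, 3), dtype=bool)
    labels, num_segments = ndimage.label(mask, structure=structure)
    return labels, num_segments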
The output of the first stage is already a bit more abstract than
the sensory input data. We see a transition from signals to
symbols, however primitive they may still be. These primitive
symbols now become the subject of a grouping process that
attempts to organize them perceptually. Organization is one of the first
steps in perception. The goal of grouping is to find and
combine those symbols that relate to the same object. Again,
the governing grouping principles may be application
dependent.
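A minimal sketch of one such grouping principle, proximity combined with similar orientation applied to extracted line segments, is given below; the segment representation and both tolerances are illustrative assumptions:

import math

def _orientation(seg):
    # Direction of a segment ((x1, y1), (x2, y2)), ignoring its sense.
    (x1, y1), (x2, y2) = seg
    return math.atan2(y2 - y1, x2 - x1) % math.pi

def group_segments(segments, max_gap=5.0, max_angle=math.radians(10)):
    # Greedily collect segments that are roughly collinear with, and close to,
    # the last member of an existing group; each group is a hypothesis that
    # its members relate to the same object.
    groups = []
    for seg in segments:
        for group in groups:
            ref = group[-1]
            diff = abs(_orientation(seg) - _orientation(ref))
            diff = min(diff, math.pi - diff)
            if diff < max_angle and math.dist(seg[0], ref[1]) < max_gap:
                group.append(seg)
                break
        else:
            groups.append([seg])
    return groups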
The next step in model-based object recognition consists of
comparing the extracted and grouped features (data model)
with a model of the real object (object model), a process
called matching. If there is sufficient agreement, then the data
model is labeled with the object and undergoes a validation
procedure. Crucial to the matching step are the object model
and the representational compatibility between the data model
and the object model. It is fruitless to describe an object by properties
that cannot be extracted from the sensor data. Take color, for
example, and the case of a roof. If only monochromatic
imagery is available then we cannot use ‘red’ in the roof
description.
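The roof example can be made concrete with the following sketch, in which a data model is compared with an object model over extractable properties only; the dictionaries, property names, and agreement threshold are illustrative assumptions:

def match(data_model, object_model, extractable, min_agreement=0.7):
    # Compare only those object-model properties the sensor data can deliver.
    usable = [p for p in object_model if p in extractable]
    if not usable:
        return False
    hits = sum(1 for p in usable
               if data_model.get(p) == object_model[p])
    return hits / len(usable) >= min_agreement

# With monochromatic imagery, 'color' is not extractable, so 'red' is simply
# excluded from the comparison of the roof hypothesis:
roof_model = {"shape": "rectangular", "color": "red", "planar": True}
data_model = {"shape": "rectangular", "planar": True}
print(match(data_model, roof_model, extractable={"shape", "planar"}))  # True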
The sequential way in which the paradigm has been presented is
often called bottom-up or data-driven. A model-driven or top-down
approach proceeds in the opposite direction: here, domain-specific
knowledge triggers expectations about where objects may occur
in the data. In practice, both approaches are combined.
2.2. Multisensor fusion
Multisensor integration means the synergistic use of the
information provided by multiple sensory devices to assist the
accomplishment of a task by a system. The literature on
multisensor integration in computer vision and machine
intelligence is substantial. For an extensive review, we refer
the interested reader to Abidi and Gonzalez (1992), or Hall
(1992).
At the heart of multisensor integration lies multisensor fusion.
Multisensor fusion refers to any stage of the integration
process where information from different sensors is combined
(fused) into one representation form. Hence, multisensor
fusion can take place at the signal, pixel, feature, or symbol
level of representation. Most sensors typically used in practice
provide data that can be fused at one or more of these levels.
Signal-level fusion refers to the combination of signals from
different sensors with the objective of providing a new signal
that is usually of the same form but of better quality. In pixel-
level fusion, a new image is formed through the combination
of multiple images to increase the information content
associated with each pixel. Feature-level fusion helps make
feature extraction more robust and allows composite features
to be built from different signals and images. Symbol-level
fusion allows the information from multiple sensors to be
used together at the highest level of abstraction.
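As an elementary illustration of pixel-level fusion (the weighted average and the weight value are illustrative; operational schemes such as IHS or wavelet fusion are more elaborate), two co-registered images of identical size can be combined as follows:

import numpy as np

def fuse_pixel_level(img_a, img_b, weight=0.5):
    # Pixel-level fusion presupposes co-registered images of identical size.
    if img_a.shape != img_b.shape:
        raise ValueError("images must be co-registered and of identical size")
    # The fused image carries information from both inputs at every pixel.
    return weight * img_a.astype(float) + (1.0 - weight) * img_b.astype(float)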
As in object recognition, identity fusion begins with the
preprocessing of the raw sensory data, followed by feature
extraction. Having extracted the features or feature vectors,
identity declaration is performed using statistical pattern
recognition techniques or geometric models. The identity
declarations must be partitioned into groups that represent
observations belonging to the same observed entity. This
partitioning - known as association - is analogous to the
process of matching data models with object models in model
based object recognition. Finally, identity fusion algorithms,
such as feature-based inference techniques, cognitive-based
models, or physical modeling, are used to obtain a joint
declaration of identity. Alternatively, fusion can occur at the
raw data level or at the feature level. Examples of the
different fusion types include pixel labeling from raw data
vectors (fusion at data or pixel level), segmenting surfaces
from fused edges extracted from aerial imagery and combined
with laser measurements (feature level fusion), and
recognizing buildings by using ‘building candidate’ objects
from different sensory data (decision level fusion).
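A minimal sketch of decision-level fusion along these lines is given below: after association, the identity declarations that different sensors produced for one entity are combined into a joint declaration by a simple confidence-weighted vote (the combination rule and the example labels are illustrative assumptions):

from collections import defaultdict

def fuse_identity(declarations):
    # declarations: (label, confidence) pairs from different sensors, all
    # associated with the same observed entity.
    score = defaultdict(float)
    for label, confidence in declarations:
        score[label] += confidence            # confidence-weighted vote
    joint = max(score, key=score.get)
    return joint, dict(score)

# e.g. 'building candidate' objects derived from aerial imagery and from
# laser measurements, plus one dissenting declaration:
print(fuse_identity([("building", 0.8), ("building", 0.6), ("vegetation", 0.3)]))
# -> ('building', {'building': 1.4, 'vegetation': 0.3})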
Pixel-level fusion is recommended only for images that have
similar exterior orientation and similar spatial, spectral, and
temporal resolution, and that capture the same or similar
physical phenomena. Often, these requirements are not
satisfied. Such is the case when images record information
from very different regions of the EM spectrum (e.g., visible
and thermal), or if they were collected from different
platforms, or else have significantly different sensor geometry
and associated error models. In these instances, preference
should be given to the individual segmentation of images,
with feature or decision level fusion. Yet another
consideration for fusion is related to the physical phenomena
in object space. Depending on the level of grouping, extracted
features convey information that can be related to physical
phenomena in the object space. Obviously, features extracted
from different sensors should be fused when they have been
caused by the same physical property. Generally, the farther
apart the spectral bands are, the less likely it is that the features
extracted from them are caused by the same physical phenomena. On
the other hand, as the level of abstraction increases, more and