International Archives of Photogrammetry and Remote Sensing, Vol. 32, Part 7-4-3 W6, Valladolid, Spain, 3-4 June, 1999
INCLUSION OF MULTISPECTRAL DATA INTO OBJECT RECOGNITION
Bea Csathó (1), Toni Schenk, Dong-Cheon Lee and Sagi Filin (2)
1 Byrd Polar Research Center, OSU, 1090 Carmack Rd., Columbus, OH 43210, email: csatho.1@osu.edu, phone: 1-614-292-6641
2 Department of Civil Engineering, OSU, 2070 Neil Ave., Columbus, OH 43210, email: schenk.2@osu.edu, phone: 1-614-292-7126
KEYWORDS: Data fusion, multisensor, classification, urban mapping, surface reconstruction.
ABSTRACT
In this paper, we describe how object recognition benefits from exploiting multispectral and multisensor datasets. After a brief
introduction we summarize the most important principles of object recognition and multisensor fusion. This serves as the basis for
the proposed architecture of a multisensor object recognition system. It is characterized by multistage fusion, where the different
sensory input data are processed individually and only merged at appropriate levels. The remaining sections describe the major
fusion processes. Rather than providing detailed descriptions, a few examples, obtained from the Ocean City test-data site, have been
chosen to illustrate the processing of the major data streams. The test dataset comprises multispectral and aerial imagery and laser
scanning data.
1. INTRODUCTION
The ultimate goal of digital photogrammetry is the automation
of map making. This entails understanding aerial imagery and
recognizing objects - both hard problems. Despite the
increased research activities and the remarkable progress that
has been achieved, systems are still far from being operational
and the far-reaching goal of an automatic map making system
remains a dream.
Before an object, e.g. a building, can be measured, it must
first be identified as such. Fully automated systems have been
developed for recognizing certain objects, such as buildings
and roads in monocular aerial imagery, but their
performance largely depends on the complexity of the scene
and other factors (Shufelt, 1999). However, the utilization of
multiple sensory input data, or other ancillary data, such as
DEMs or GIS layers, opens new avenues to approach the
problem. By combining sensors that use different physical
principles and record different properties of the object space,
complementary and redundant information becomes available.
If merged properly, multisensor data may lead to a more
stable and consistent scene description. Active research topics
in object recognition include multi-image techniques using
3D feature extraction, DEM analysis or range images from
laser scanning, map- or GIS-based extraction, color or
multispectral analysis, and/or a combination of these
techniques.
Now the cardinal question is how to exploit the potential
these different data sources offer to tackle object recognition
more effectively. Ideally, proven concepts and methods in
remote sensing, digital photogrammetry and computer vision
should be combined in a synergistic fashion. The combination
may be possible through the use of multisensor data fusion, or
distributed sensing. Data fusion is concerned with the
problem of how to combine data from multiple sensors to
perform inferences that may not be possible from a single
sensor alone (Hall, 1992). In this paper, we propose a unified
framework for object recognition and multisensor data fusion.
We start out with a brief description of the object recognition
paradigm, followed by the discussion of different
architectures for data fusion. We then propose a multisensor
object recognition system. The remaining sections describe
the major fusion processes. Rather than providing detailed
descriptions, a few examples, obtained from the Ocean City
test-data site, have been chosen to illustrate the processing of
the major data streams. Csatho and Schenk (1998) reported
on earlier tests using the same dataset. The paper ends with
conclusions and an outline of future research.
2. BACKGROUND
2.1. Object recognition paradigm
At the heart of the paradigm is the recognition that it is
impossible to bridge the gap between sensory input data and
the desired output in a single step. Consider a gray level image as input and a
GIS as the result of object recognition. The computer does not
see an object, e.g., a building. All it has available at the outset
is an array of numbers. On the output side, however, we have
an abstract description of the object, for example, the
coordinates of its boundary. There is no direct mapping
between the two sets of numbers.
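The gap between the two sets of numbers can be made concrete with a small sketch. The toy image, the threshold, and the helper name bright_region_bbox are assumptions made for illustration only; they are not taken from the Ocean City data or from any system described in this paper. The point is that even this crude step from pixel values to boundary coordinates needs an outside assumption (here, "bright means object"), and the array itself never says "building".

```python
# Input side: a gray level image is nothing more than an array of numbers.
image = [
    [12,  14,  13,  15, 12, 11],
    [13, 200, 205, 198, 14, 12],
    [12, 203, 210, 201, 13, 14],
    [14, 199, 204, 202, 12, 13],
    [13,  12,  14,  13, 15, 12],
]


def bright_region_bbox(image, threshold):
    """Return the corner coordinates (row, col) of the box around all
    pixels brighter than `threshold`, or None if there are none.

    Hypothetical helper: a real system would not rely on a single
    global threshold.
    """
    cells = [
        (r, c)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
        if value > threshold
    ]
    if not cells:
        return None
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    top, bottom = min(rows), max(rows)
    left, right = min(cols), max(cols)
    return [(top, left), (top, right), (bottom, right), (bottom, left)]


# Output side: an abstract description, e.g. boundary coordinates.
boundary = bright_region_bbox(image, threshold=100)
print(boundary)
```

The label "building" still has to come from somewhere else: a model of the object against which the extracted description is compared.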
A commonly used paradigm begins with preprocessing the
raw sensory input data, followed by feature extraction and
segmentation. Features and regions are perceptually organized
until an object, or parts of an object, emerge from the data.
This data model is then compared with a model of the
physical object. If there is sufficient agreement, the data
model is labeled accordingly. In a first step, the sensor data
usually require some preprocessing. For example, images
may be radiometrically adjusted, oriented and perhaps
normalized. Similarly, raw laser altimeter data are processed
to 3-D points in object space.
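The stages of this paradigm can be sketched as a chain of small functions. This is a minimal illustration under stated assumptions, not the system proposed in this paper: preprocessing is reduced to a linear gray value stretch, feature extraction and perceptual organization are replaced by simple thresholding and 4-connected region grouping, and the object model (BUILDING_MODEL with its two thresholds) is hypothetical.

```python
from collections import deque


def preprocess(image):
    """Radiometric adjustment: stretch gray values linearly to [0, 1]."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    span = (hi - lo) or 1  # avoid division by zero on a flat image
    return [[(v - lo) / span for v in row] for row in image]


def segment(image, threshold=0.5):
    """Segmentation: group bright, 4-connected pixels into regions."""
    rows, cols = len(image), len(image[0])
    seen, regions = set(), []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] < threshold or (r, c) in seen:
                continue
            region, queue = [], deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and (ny, nx) not in seen and image[ny][nx] >= threshold:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def describe(region):
    """Data model: area and how completely the region fills its bounding box."""
    ys = [y for y, _ in region]
    xs = [x for _, x in region]
    box_area = (max(ys) - min(ys) + 1) * (max(xs) - min(xs) + 1)
    return {"area": len(region), "rectangularity": len(region) / box_area}


def label(data_model, object_model):
    """Model comparison: assign the label only if agreement is sufficient."""
    agrees = (
        data_model["area"] >= object_model["min_area"]
        and data_model["rectangularity"] >= object_model["min_rectangularity"]
    )
    return object_model["name"] if agrees else "unknown"


# Hypothetical object model; the thresholds are illustrative only.
BUILDING_MODEL = {"name": "building", "min_area": 6, "min_rectangularity": 0.9}


def recognize(image, object_model=BUILDING_MODEL):
    """Run the chain: preprocess, segment, describe, compare with the model."""
    regions = segment(preprocess(image))
    return [label(describe(region), object_model) for region in regions]


scene = [
    [10,  10,  10,  10, 10,  10],
    [10, 200, 200, 200, 10,  10],
    [10, 200, 200, 200, 10, 180],
    [10, 200, 200, 200, 10,  10],
    [10,  10,  10,  10, 10,  10],
]
print(recognize(scene))
```

In this toy scene the 3 x 3 bright block satisfies the hypothetical model, while the isolated bright pixel is too small and stays unlabeled. Keeping each stage as a separate function mirrors the paradigm: any stage can be replaced, for example by a laser altimetry preprocessing step, without touching the others.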
The motivation for feature extraction is to capture information
from the processed sensory data that is somehow related to
the objects to be recognized. Edges are a typical example.