Figure 1. MSS image: object-based scene representation
2. MODELLING AND DEFINITIONS
The scene (in this work assumed to be part of the Earth's surface) is the target of the remote sensing system; it is under investigation, and the interest is in extracting information about its structure and content (Tso and Mather, 2001). The desired information is assumed to be contained in the spectral, spatial, and temporal variation of the electromagnetic energy coming from the scene, which is gathered by the sensors (Hapke, 1993). Typically, a complex scene is composed of relatively simple objects of different sizes and shapes, each of which contains only one class of surface cover type. The scene is often described by classifying the objects and recording their relative positions and orientations in the scene, in the form of tabulated results and/or a thematic map.
In a remote sensing system, the primary features of a scene are formed by multispectral observations, which are accomplished by spatially and spectrally sampling the scene. A multispectral sensor samples several spectral dimensions and one spatial dimension of the scene at a given instant of time. The second spatial dimension can be provided by the motion of the platform that carries the scanner over the region of interest, generating a raster scan; alternatively, the raster can be provided by an area-array detector. Thus, through the data acquisition system, the scene may be viewed in an image form taken at each of a number of electromagnetic wavelengths. This image can be thought of as a multi-layer matrix whose elements are called pixels (Tso and Mather, 2001). One of the important characteristics of such data is the special nature of the dependence of the feature at a lattice point on those of its neighbours. The unconditional correlation between two pixels in spatial proximity to one another is often high, and this correlation usually decreases as the distance between the pixels increases.
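As a minimal illustration of this data model (a sketch, assuming a Python/NumPy environment; the array sizes are arbitrary and not from the paper), the multi-layer matrix can be held as a three-dimensional array in which each pixel is a d-dimensional spectral vector:

import numpy as np

rows, cols, bands = 128, 128, 6        # illustrative sizes only
image = np.zeros((rows, cols, bands))  # one layer per spectral band

pixel = image[40, 25]                  # a single pixel: a d-dimensional vector
assert pixel.shape == (bands,)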
One of the distinctive characteristics of the spatial dependence in multispectral data is that the spectral separation between two adjacent pixels is less than that between two non-adjacent pixels, because the sampling interval generally tends to be smaller than the size of an object; i.e., two pixels in spatial proximity to one another are unconditionally correlated, with the degree of correlation decreasing as the distance between them increases. Studies measuring different-order statistical spatial dependency in image data, in particular measurements of first-, second-, and third-order amplitude statistics along an image scan line, show considerable correlation between adjacent pixels. Seyler concluded, from the measured distribution of the difference between adjacent pixels, that the probability that two adjacent pixels have the same grey level is about 10 times the probability that they differ by the maximum possible amplitude difference. Kettig (Kettig and Landgrebe, 2001), by measuring the spatial correlation of multispectral data, showed that the correlation between adjacent pixels is much smaller when conditioned on the pixels lying within the same object than the unconditional correlation. High correlation among adjacent pixels in the observation space represents redundancy in the scene data. When such redundancy occurs, it should be possible to reduce the size of the observation space without loss of information.
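The kind of measurement discussed above can be sketched as follows (illustrative code, not the cited authors' procedure; the synthetic smoothed band merely stands in for real imagery, which behaves similarly): the unconditional correlation between pixel pairs a given lag apart along scan lines typically falls as the lag grows.

import numpy as np
from scipy.ndimage import uniform_filter

def scanline_correlation(band, lag):
    # correlation between pixel pairs `lag` samples apart on each row
    left = band[:, :-lag].ravel()
    right = band[:, lag:].ravel()
    return np.corrcoef(left, right)[0, 1]

rng = np.random.default_rng(0)
band = rng.normal(size=(256, 256))
band = uniform_filter(band, size=9)  # smoothing mimics object-sized structures

for lag in (1, 2, 4, 8, 16):
    print(lag, scanline_correlation(band, lag))  # correlation decreases with lag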
As previously stated, the scene is assumed to consist of relatively simple objects of different sizes and shapes (see Figure 1). The resolution of the spatial representation depends on both the pixel size and the interval between samples, which are usually equal. Under-sampling loses information, while over-sampling increases redundancy. Typically, the size and shape of the objects in the scene vary randomly (Figure 1), while the sampling rate, and therefore the pixel size, is fixed; it is thus inherent in image data that the data-dimensionality (the number of spatial-spectral observations used for scene representation) increases faster than the intrinsic-dimensionality (the size of the smallest set that can represent the same scene, numerically, with no loss of information). Because the spatial sampling interval is usually small compared to the object size, each object is represented by an array of similar pixels. Therefore, scene segmentation into pixels is not an efficient approach to scene representation; a scene can instead be segmented into objects, and since the shape and size of the objects match the scene variation, scene representation by simple objects is more efficient.
Object detection refers to finding the natural groups among contiguous pixels. In other words, the data is sorted into objects such that the "Unity Relation" holds among members of the same object and not between members of different adjacent objects. Object extraction and clustering are similar in the sense that both are methods of grouping data; however, spatial considerations make clustering and object extraction different. Because an object can be textured, the pixels within an object might not form a compact cluster in the measurement (observation) space. Also, because there can be several instances of a particular class of entities in a single image, non-adjacent objects might be nearly identical in observation space. Another difference is that in object extraction the existence of a partition that completely separates the objects is guaranteed, whereas in clustering, if the underlying classes are allowed to have overlapping density functions, the classes can never be completely separated in the observation space. Object extraction can thus be thought of as transforming the original image, which is a pixel-description of the scene, into an object-description.
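A minimal region-growing sketch of this idea follows, assuming a simple "Unity Relation": 4-adjacent pixels are placed in the same object when their spectral vectors differ by less than a threshold. This is an illustration of object extraction in general, not the specific algorithm of this paper.

import numpy as np
from collections import deque

def extract_objects(image, threshold):
    # image: rows x cols x bands array; returns one integer label per pixel,
    # with equal labels marking pixels of the same extracted object
    rows, cols, _ = image.shape
    labels = np.full((rows, cols), -1, dtype=int)
    current = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r, c] != -1:
                continue
            queue = deque([(r, c)])  # grow a new object from this seed pixel
            labels[r, c] = current
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols and labels[ny, nx] == -1:
                        # unity relation: spectral proximity of adjacent pixels
                        if np.linalg.norm(image[ny, nx] - image[y, x]) < threshold:
                            labels[ny, nx] = current
                            queue.append((ny, nx))
            current += 1
    return labels

Note that, in line with the discussion above, this grouping is driven by adjacency as well as spectral similarity, which is precisely what distinguishes it from clustering in the observation space alone.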
An object-description is often better than a pixel-description,
for two basic reasons:
1- More information about the scene entity is available from a
collection of pixels associated with the object than from an
individual pixel associated with the scene. This fact has
been exploited by “object” classification algorithms that
make a classification decision for each group of image
points, for example by sequential classification (Tso and
Mather, 2001). The potential advantages of object
classification are especially great when class probability