FACTORS CAUSING UNCERTAINTIES IN SPATIAL DATA MINING
Hanning YUAN (a), Shuliang WANG (b)
(a) School of Remote Sensing Information Engineering, Wuhan University, Wuhan 430079, China
(b) International School of Software, Wuhan University, Wuhan 430079, China
E-mail: hnyuanslwang@yahoo.com,
Commission IV, WG IV/3
KEY WORDS: Factors, Uncertainties, Spatial data mining
ABSTRACT:
Spatial data mining extracts previously unknown knowledge from large existing spatial data repositories (Ester et al.,
2000). Spatial data represent the spatial existence of objects in an infinitely complex world. They may be incomplete,
noisy, fuzzy, random, and practical, because the computerized entities differ from the entities in the real spatiotemporal
space; that is, the observed data differ from the true data. Because spatial data mining works with the spatial database as a
surrogate for the real entities of the spatial world, it cannot avoid uncertainties. If the uncertainties are used appropriately,
mining can avoid discovering mistaken knowledge from mistaken spatial data. Uncertainty parameters such as the support level,
confidence level, and interest level may further reduce the complexity of spatial data mining. Conversely, if both certainties
and uncertainties are treated as certainties, suitable knowledge cannot be discovered from spatial databases, and spatial
decisions based on such unsuitable or even mistaken knowledge may be incorrect. The uncertainties mainly arise from the
complexity of the real world, the limitations of human cognition, the weaknesses of computers, and the shortcomings of
techniques and methods. Under current constraints, these uncertainties may further propagate, or even be amplified, during
the mining process.
1. OBJECTIVE REALITY
The world is an infinitely complex system that is large,
changeable, nonlinear, and multi-parameter, and about 80% of
its information is spatially referenced (Wang, 2002). In the
spatial world, inexact entities characterized by indeterminacy
or inhomogeneity outnumber exact ones. A spatial entity in the
world carries historical information, a current status, and a
future trend. At any moment, it receives information from other
entities and also radiates its own information. The information
of different entities may overlap, mix, or be deformed. Two
entities of the same class may radiate different spectral
information, while two entities that radiate the same spectral
information may belong to different classes. As a result, it is
difficult to correctly classify pixels that have the same gray
levels in a boundary area where two different classes overlap.
In the real world, information cannot be recorded unless it is
sensed by some observation instrument. Remote sensing captures
spatial data by detecting spectra with sensors. Traditionally,
the spatial world stored in computerized spatial databases was
presumed to be crisply defined, precisely described, and
accurately measured (Burrough, Frank, 1996). For instance, an
object model assumes that spatial entities can be precisely
described by points with exactly known coordinates, lines
linking a series of crisply known points, and areas bounded by
sharply defined lines. However, such cases seldom occur in the
real world, and in many cases pure points, lines, and polygons
with exact geometric definitions do not exist (Wang, Shi, 2002).
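To make the contrast with the object model concrete, the following Python sketch compares the crisp membership it assumes with a graded membership across a transition zone along a one-dimensional transect. The linear transition, the function names `crisp_membership` and `graded_membership`, and the boundary position and width are hypothetical choices for illustration only; they are not a method proposed in this paper.

```python
def crisp_membership(x: float, boundary: float) -> float:
    """Object-model view: a location is inside the area (1.0) or outside it (0.0),
    separated by a sharply defined boundary line at `boundary`."""
    return 1.0 if x <= boundary else 0.0


def graded_membership(x: float, boundary: float, width: float) -> float:
    """Hypothetical graded view: membership falls linearly from 1.0 to 0.0
    across a transition zone of total `width` centred on `boundary`."""
    start = boundary - width / 2.0
    end = boundary + width / 2.0
    if x <= start:
        return 1.0
    if x >= end:
        return 0.0
    return (end - x) / width


if __name__ == "__main__":
    # Sample a transect crossing an assumed boundary at 50 m with a 20 m transition zone.
    for x in (30.0, 45.0, 50.0, 55.0, 70.0):
        print(x, crisp_membership(x, 50.0), graded_membership(x, 50.0, 20.0))
```

Near the boundary, the crisp model jumps from 1.0 to 0.0 just past 50 m, whereas the graded model returns 0.75, 0.5, and 0.25 at 45, 50, and 55 m. This is the kind of indeterminacy that points, lines, and sharply bounded areas cannot express.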
Some true spatial values are inexact or even inaccessible. The
true values of spatial data are the actual characteristics of
spatial entities in reality. Some true spatial values exist but
are impossible to obtain. Some are unobservable because they are
spatial data from the distant past; others are impractical to
observe because they are too complex, difficult, or expensive to
acquire under the constraints of current cognition, instruments
and techniques, time, and funding. For some spatial values, no
true value exists at all in the real world. Some spatial entities
have no sharp boundaries or cannot be precisely determined. For
example, the spectrum of a spatial entity can make the image data
uncertain.
A fundamental task is to determine whether or not a spatial
element belongs to a predefined entity, and this classification
is performed on the accessible spatial values measured by
sensors. An overlapped or mixed pixel in a remote sensing image
reflects the combined classes of different but neighboring
objects on the ground. The additional but indispensable
measurement step introduces further uncertainty because of the
limitations of the process. Remote sensing images of different
objects may show two phenomena of spectral uncertainty created
by spatial entities. In the first, two objects belong to the same
type or species but have different spectra, which cannot be
unified into one spectral curve; instead, they form a series of
different spectral curves with a wide distribution. In a broader
sense, this also includes multi-angular, multi-temporal, and
multi-scale effects, e.g., for rocks, minerals, and vegetation.
In the second, two objects belong to different classes but have
similar or identical spectral features in a certain wavelength
range, e.g., military camouflage.
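The mixed-pixel problem can be illustrated with a short Python sketch. It assumes a simple linear mixture of two class spectra (a common simplification, not a model stated in this paper) and a hard minimum-distance classifier; the four-band reflectance values and the helper names `mix`, `euclidean`, and `classify` are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical mean reflectance of two ground classes in four bands (illustrative values only).
CLASS_MEANS = {
    "vegetation": [0.05, 0.08, 0.04, 0.45],
    "soil": [0.10, 0.15, 0.20, 0.30],
}


def mix(a, b, fraction_a):
    """Assumed linear mixture: fraction_a of spectrum a plus (1 - fraction_a) of spectrum b."""
    return [fraction_a * x + (1.0 - fraction_a) * y for x, y in zip(a, b)]


def euclidean(a, b):
    """Euclidean distance between two spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(pixel, class_means):
    """Hard minimum-distance classification: return the nearest class and all distances."""
    distances = {name: euclidean(pixel, mean) for name, mean in class_means.items()}
    return min(distances, key=distances.get), distances


if __name__ == "__main__":
    veg, soil = CLASS_MEANS["vegetation"], CLASS_MEANS["soil"]
    for fraction in (0.45, 0.55):
        label, dists = classify(mix(veg, soil, fraction), CLASS_MEANS)
        print(f"{fraction:.0%} vegetation -> {label}", {k: round(v, 4) for k, v in dists.items()})
```

Under these assumptions, a pixel that is 55% vegetation is labelled entirely as vegetation and one that is 45% vegetation is labelled entirely as soil, although both contain a large share of the other class. The hard label hides the mixture, which is one way the uncertainty of the measured pixel is carried into the mined data.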
Uncertainty is more prevalent in the macro-world (e.g., outer
space) and the micro-world (e.g., the space in which electrons
and protons move), where objects move at high speed (Duncan,
1994). The lengths of moving objects and the distances between
two objects are both subject to contraction. The