International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol XXXV, Part B4. Istanbul 2004
mathematical statistics (Arthurs, 1965). But one important role
of reasoning under uncertainty is to assist in decision-making.
Fourth, a prerequisite for analyzing spatial uncertainty is the
availability of prior information about the uncertainty in the
data sources and about how that uncertainty affects the outcome
of GIS manipulations. This information may be known exactly, as
a range with upper and lower bounds around some mean value;
stochastically, as a probability distribution function; or
possibilistically, as membership in a fuzzy set. However,
factual prior information on the uncertainty is scarce, and some
of it is difficult, expensive, or even impossible to obtain.
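The three forms of prior information named above can be sketched side by side. This is a minimal illustration only: the class names, the triangular membership shape, and the elevation values are assumptions made for the example, not taken from the paper.

```python
from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class IntervalUncertainty:
    """Exact form: upper and lower bounds around a mean value."""
    mean: float
    half_width: float

    def bounds(self):
        return (self.mean - self.half_width, self.mean + self.half_width)


@dataclass
class TriangularFuzzySet:
    """Possibilistic form: membership grade in [0, 1] (assumed triangular)."""
    low: float
    peak: float
    high: float

    def membership(self, x):
        if x <= self.low or x >= self.high:
            return 0.0
        if x <= self.peak:
            return (x - self.low) / (self.peak - self.low)
        return (self.high - x) / (self.high - self.peak)


# Hypothetical elevation reading of 100.0 m described in three ways.
interval = IntervalUncertainty(mean=100.0, half_width=0.5)
stochastic = NormalDist(mu=100.0, sigma=0.25)  # stochastic form
fuzzy = TriangularFuzzySet(low=99.0, peak=100.0, high=101.0)

interval_bounds = interval.bounds()
# Probability that the true value lies inside the interval bounds.
prob_within = stochastic.cdf(100.5) - stochastic.cdf(99.5)
# Degree to which 99.5 m belongs to the fuzzy set "about 100 m".
grade = fuzzy.membership(99.5)
```

Which form is appropriate depends on what is actually known about the data source; the sketch only shows that the three descriptions answer different questions about the same reading.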
4. COMPUTERIZED MACHINE
Spatial data in the computerized machine reflect the real world
only uncertainly, via binary digits in the form of zeros and
ones, when they are used to describe, acquire, store, manipulate
and analyze spatial entities in the context of human needs
(Goodchild, 1995). Some of the uncertainty may come from the
computerized machine itself, e.g., physical modeling, logical
modeling, data encoding, data manipulation, data analysis,
algorithm optimization, machine precision, and output. This
uncertainty is a discrepancy between the encoded and the actual
value of a particular spatial entity.
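The discrepancy between an encoded and an actual value can be made concrete with a small sketch. It assumes, purely for illustration, that a coordinate is stored as a 32-bit float; the function name and the coordinate value are hypothetical.

```python
import struct


def encode_float32(value):
    """Round-trip a 64-bit Python float through a 32-bit binary encoding."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


actual = 114.3055                    # hypothetical longitude in degrees
encoded = encode_float32(actual)     # value the machine actually stores
discrepancy = abs(encoded - actual)  # bounded by half a unit in the last place

# For values between 64 and 128, float32 spacing is 2**-17 degrees,
# so the encoding error is at most 2**-18 degrees.
max_error = 2 ** -18
```

Values such as 0.5 survive the encoding unchanged, whereas most decimal fractions, such as 0.1, do not.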
Any imaginable measuring device records its measurement only
with finite precision, even if the device is designed and used
perfectly. Given the precision of a measuring device, the
outcome may further lack accuracy because of the finite
precision of the output instruments, e.g., monitor, printer. To
record a measurement with infinite precision, the instrument
would require an output capable of displaying an infinite number
of digits. By using more accurate measuring devices, the
uncertainty in measurements can often be made as small as needed
for a particular purpose, and the accuracy becomes greater and
greater. However, it only approaches but never reaches absolute
accuracy. Thus there is no real measurement with infinite
precision; every measurement is instead a value with a degree of
uncertainty. During machine-based computing and analysis, e.g.,
GIS buffering, layer overlay and data mining, these
uncertainties are accumulated and propagated, and the
computerized machine may further produce new uncertainties.
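How uncertainties accumulate through a chain of operations can be sketched under a simple assumed model: each step contributes an independent, zero-mean error, so standard deviations combine as a root sum of squares. The paper does not prescribe this model, and the function name and numbers below are hypothetical.

```python
import math


def combined_std(stds):
    """Root-sum-of-squares of independent standard deviations."""
    return math.sqrt(sum(s * s for s in stds))


# Hypothetical positional standard deviations in metres.
layer_a_std = 3.0
layer_b_std = 4.0
buffer_std = 12.0   # assumed extra error introduced by a buffering step

overlay_std = combined_std([layer_a_std, layer_b_std])
pipeline_std = combined_std([layer_a_std, layer_b_std, buffer_std])
```

Under this model the combined uncertainty is never smaller than the largest single contribution, which is one way to read the statement that uncertainties are accumulated and propagated.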
5. AMALGAMATING HETEROGENEITY
Spatial uncertainty becomes even more complex when merging
different kinds of spatial data, often from different sources
and of different reliabilities (Hunter, 1996). Moreover, more
than one kind of uncertainty often exists at the same time
during uncertainty-based spatial data mining. For example, both
randomness and fuzziness are often present in spatial entities.
In order to create the best possible database, spatial data
users would like to see the matching and amalgamation of
heterogeneous data, i.e., some kind of average, or a combination
of elements from more than one source. But a common spatial
database is conventionally built to support a specific local
application without considering the global application. If these
various local databases are integrated in a global context, the
conflicts among them may also cause unpredicted uncertainties,
e.g., inconsistency across multiple databases.
Thus, besides the abovementioned uncertainties from the real
world, human recognition, the computerized machine or
techniques, some new uncertainties may further appear in
spatial data if they are acquired from different sources
with heterogeneous representations.
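One possible reading of "some kind of average" is an inverse-variance weighted mean, paired with a simple check for conflicting sources. Both are illustrative assumptions rather than the method of the paper; the function names, the threshold k, and the sample values are hypothetical.

```python
def fuse(estimates):
    """Inverse-variance weighted mean of (value, std) pairs.

    Assumes the sources are independent and unbiased.
    """
    weights = [1.0 / (std * std) for _, std in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    std = (1.0 / total) ** 0.5
    return value, std


def in_conflict(a, b, k=3.0):
    """Flag two (value, std) estimates whose gap exceeds k combined stds."""
    (va, sa), (vb, sb) = a, b
    return abs(va - vb) > k * (sa * sa + sb * sb) ** 0.5


# Hypothetical attribute value reported by two databases.
source_a = (10.0, 1.0)   # more reliable source
source_b = (12.0, 2.0)   # less reliable source

fused_value, fused_std = fuse([source_a, source_b])
consistent_pair = in_conflict(source_a, source_b)
conflicting_pair = in_conflict((10.0, 0.1), (12.0, 0.1))
```

The fused value leans toward the more reliable source. When the conflict check fires, averaging would hide an inconsistency between the databases rather than resolve it.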
In a word, uncertainty is unavoidable in spatial data sets, and
it can never be eliminated completely, even in theory. During
spatial data mining, spatial uncertainty can propagate and even
grow when several spatial uncertainties are accumulated. The
limitations of human recognition, mathematical models and
technology may further enlarge the uncertainty, which more
easily leads to mistaken decisions. Moreover, an increase in the
amount of spatial data does not necessarily result in a decrease
in spatial uncertainty.
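The last point can be illustrated with an assumed error model in which every reading shares a systematic bias in addition to independent random noise. The model, the function name, and the numbers are hypothetical and serve only as a sketch.

```python
import math


def rmse_of_mean(bias, sigma, n):
    """RMSE of the mean of n readings with shared bias and noise std sigma."""
    return math.sqrt(bias ** 2 + sigma ** 2 / n)


# Hypothetical bias of 0.3 m and random noise of 0.4 m per reading.
rmse_small = rmse_of_mean(bias=0.3, sigma=0.4, n=1)
rmse_large = rmse_of_mean(bias=0.3, sigma=0.4, n=10_000)
```

Collecting more readings shrinks only the random term, so the error approaches the bias of 0.3 m and never falls below it.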
6. CONCLUSIONS
This paper presented the factors causing uncertainties in spatial
data mining. They may include the complexity of the real world,
the limitations of human recognition, the weaknesses of the
computerized machine, and the shortcomings of techniques and
methods. In fact, rational uncertainties (e.g., the
uncertainties in natural language) may rescue people from the
sea of data: only the necessary data are allowed to enter
decision-making, and are then sublimated into knowledge.
Therefore, uncertainty-based spatial data mining is a promising
research topic.
ACKNOWLEDGEMENTS
The work described in this paper was supported by funds from the
National Natural Science Foundation of China (70231010), Wuhan
University (216-276081), and the National High Technology R&D
Program (863) (2003AA132080).
REFERENCES
ARTHURS A. M., 1965, Probability theory (London: Dover
Publications)
BURROUGH P.A., FRANK A.U. (eds), 1996, Geographic
Objects with Indeterminate Boundaries (Basingstoke: Taylor
and Francis)
DUNCAN T., 1994, Advanced Physics [4th edition] (London:
John Murray)
ESTER M. et al., 2000, Spatial data mining: database
primitives, algorithms and efficient DBMS support. Data
Mining and Knowledge Discovery, 4, 193-216
GOODCHILD M.F., 1995, Attribute accuracy. In Elements of
Spatial Data Quality, edited by GUPTILL S.C. and
MORRISON J.L. (New York: Elsevier Scientific), pp. 139-151
HUNTER A., 1996, Uncertainty in Information Systems
(London: The McGraw-Hill Companies)
MILLER H. J., HAN J., 2001, Geographic Data Mining and
Knowledge Discovery (London and New York: Taylor and
Francis)