domain, are of the correct data class and fall
within predefined ranges. Although these types of
tests can detect blunders, their main aim is to ensure the logical consistency of the data rather than to improve accuracy.
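By way of illustration, the following sketch (in Python, with hypothetical field specifications) shows the kind of domain, data-class and range checks described above; such checks can flag blunders but say nothing about accuracy.

    FIELD_SPECS = {
        "landuse":   {"type": str,   "domain": {"urban", "arable", "pasture"}},
        "slope_deg": {"type": float, "range": (0.0, 90.0)},
    }

    def validate_record(record):
        """Report logical-consistency violations for one attribute record."""
        problems = []
        for name, spec in FIELD_SPECS.items():
            value = record.get(name)
            if not isinstance(value, spec["type"]):
                problems.append(f"{name}: wrong data class")
                continue
            if "domain" in spec and value not in spec["domain"]:
                problems.append(f"{name}: value not in domain")
            if "range" in spec and not (spec["range"][0] <= value <= spec["range"][1]):
                problems.append(f"{name}: value outside range")
        return problems

    # A record with a domain blunder and an out-of-range value:
    print(validate_record({"landuse": "swamp", "slope_deg": 95.0}))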
Metadata is ‘data about data’ (Medyckyj-Scott et
al., 1991). Coming under the umbrella of metadata
are general descriptions of the data, provenance
(lineage) reports, specifications of source materials
and the data dictionary. Its purpose is to allow
potential users to assess the suitability of the
data for a specific task. Metadata is therefore an
integral component of data standards. The proposed
standard for Digital Cartographic Data Quality
(Moellering, 1988) requires that measures of
accuracy (positional and attribute), consistency,
completeness and the data's lineage be recorded.
Whilst such information is clearly desirable, the proper place for quality information is embedded within the data to which it pertains. Only in this way can the data be proactive about its own uncertainty.
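As an illustration of what 'embedded' might mean in practice, the following Python sketch attaches a quality record, with fields loosely following the dimensions of the proposed standard, to each spatial object; the structure and values are hypothetical.

    from dataclasses import dataclass, field

    @dataclass
    class QualityRecord:
        positional_accuracy_m: float   # e.g. RMSE of coordinates, in metres
        attribute_accuracy: float      # e.g. proportion correctly classified
        lineage: list = field(default_factory=list)   # provenance of the object

    @dataclass
    class SoilPolygon:
        polygon_id: int
        soil_class: str
        quality: QualityRecord         # quality travels with the object

    unit = SoilPolygon(
        polygon_id=42,
        soil_class="gleysol",
        quality=QualityRecord(12.5, 0.8, ["1:50 000 field survey, 1987"]),
    )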
FURTHER PERSPECTIVES ON POSSIBLE SOLUTIONS
The above short review, though somewhat selective,
indicates that many issues are outstanding, the
most fundamental of which is a theoretical basis
from which to implement the means to record,
propagate and visualize uncertainty in GIS. Before
presenting a general model proposed to fulfill this
function, additional perspectives on the solution
to be sought are discussed.
As commented above, much of the concern has been
with testing accuracy and eradicating error. This
assumes there is a 'truth' against which to measure, that is, the existence of a binary right or wrong.
Most spatial data does not fall into mutually
exclusive sets with exact boundaries. Soils and
other geomorphic data are typical. More research
could be fruitfully expended on how to preserve
natural variation and fuzzy boundaries in GIS
rather than the current practice of making all data
fit a crisp representation in a digital database.
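One possible representation, sketched below in Python with an invented transition width, stores a membership grade per class instead of a crisp in/out assignment, so that the gradual boundary survives in the database.

    def membership(distance_from_core_m, transition_width_m=100.0):
        """Membership grade declining linearly from 1 at the class core
        to 0 beyond the transition zone, rather than a crisp boundary."""
        return max(0.0, min(1.0, 1.0 - distance_from_core_m / transition_width_m))

    for d in (0.0, 50.0, 150.0):
        print(f"{d:6.1f} m from core -> membership {membership(d):.2f}")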
Existing techniques of using confusion matrices are
inapplicable to aerial photographic interpretive
data such as flood susceptibility or slope
instability, as opportunities to sample flooding events or landsliding are short-lived or rare, and in any case hazardous. Using a panel of
experts to review all such interpretive data may
actually increase the uncertainty if a consensus
cannot be reached!
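For comparison, the confusion-matrix technique itself is simple to state; the Python sketch below cross-tabulates mapped against reference classes at sample points (the data are invented). The difficulty with interpretive data is precisely that the reference column may be unobtainable.

    from collections import Counter

    mapped    = ["flood", "flood", "dry", "dry", "flood", "dry"]
    reference = ["flood", "dry",   "dry", "dry", "flood", "flood"]

    matrix = Counter(zip(mapped, reference))
    correct = sum(n for (m, r), n in matrix.items() if m == r)

    for (m, r), n in sorted(matrix.items()):
        print(f"mapped={m:5s} reference={r:5s} count={n}")
    print(f"overall accuracy = {correct / len(mapped):.2f}")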
Recourse to check surveys of higher accuracy may
not be the answer either for this type of data. For
each inexact phenomenon there is a characteristic
resolution (linked to scales of space and time) at
which confidence in identification and delineation
is at a maximum (Davis et al., 1991). Trying to
survey at a different resolution will only increase
uncertainty. The object ‘village’, for example, breaks down into buildings and sub-land-uses as resolution increases, making delineation more difficult, and gradually reduces to a dot as resolution decreases. In a similar way, increasing
the number of classes to account for variation may
not be helpful at the analytical stage.
Aggregate measures of quality produced by existing
testing schemes are not very meaningful as they say
nothing about variability of error in space. For
large coverages compiled possibly by several
individuals, such variability is likely to be an
important component in decision making from
derivative maps.
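The point is easily made with a sketch: in the invented Python example below, an aggregate accuracy of 0.50 conceals the fact that one compiler's sheet is markedly more reliable than another's.

    checks = [  # (map sheet, classification correct?)
        ("sheet_A", True), ("sheet_A", True), ("sheet_A", True), ("sheet_A", False),
        ("sheet_B", True), ("sheet_B", False), ("sheet_B", False), ("sheet_B", False),
    ]

    overall = sum(ok for _, ok in checks) / len(checks)
    print(f"aggregate accuracy: {overall:.2f}")         # 0.50, hiding the contrast

    for s in sorted({sheet for sheet, _ in checks}):
        subset = [ok for sheet, ok in checks if sheet == s]
        print(f"{s}: {sum(subset) / len(subset):.2f}")  # 0.75 versus 0.25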
Who should hold responsibility for the data? A
commonly held view is that users are at the mercy
of someone else's data collection. Inadequate
metadata or measures of quality leave the user
uncertain as to the fitness-for-use of the data.
Even with quality measures present, there is no guarantee of how different users will interpret them. The
reverse side of the coin is that data collectors
have little control over misuse of their data once
it is in the GIS (Beard, 1989).
Clearly some quality data is required, but how
much? With complex measures of different dimensions
of the data, the database may become
disproportionately loaded with quality data. After
all, given the nature of most GIS data the level of
uncertainty in any uncertainty measures is likely
to remain high. Spurious accuracy in quality data
should be avoided and a more general approach
adopted. An informed user should not commit large resources on the basis of maps alone. Such maps should provide information on likely sites or appropriate scenarios so that limited
funds for detailed studies can be deployed with
maximum benefit. Such a user requires guidelines as
to where potentially suitable sites carry a high
risk of abortive work and where potentially
unsuitable sites may be usefully explored. In this
situation the data must be proactive about its
uncertainty over every part of the map, must be
capable of propagation, and must be interpretable within the user's specific context.
A GENERAL MODEL FOR HANDLING UNCERTAINTY AND
FITNESS-FOR-USE IN GIS
A general model for handling uncertainty in GIS is
presented in Figure 2. This model is designed with
regard to the issues discussed in the previous
sections and is intended to provide a focus and coordinating principle for the future research discussed in the final section.
The overall structure is based on a communication
model (Bedard, 1987). It recognizes that data
collection is carried out within a specific context
and yet observers will have their own view of the
real world (W). Observers will either generate
uncertainty measures (if appropriate for the data)
or at worst be able to verbalize their uncertainty
about a number of dimensions of the data. This
information about the uncertainty in the data may
be global or may pertain to individual objects or
entities. It is further recognized that a different
metric (or choice of metrics) may be used
internally in the GIS to propagate uncertainty
during data analysis. There must then be a mapping
from the observer's stated uncertainty to the
propagation metric. This allows flexibility in
collecting uncertainty measures at the observation
stage, provided a suitable mapping into the
internal metric can be found, whilst remembering
that high levels of precision in such
transformation may be spurious.
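The two-stage arrangement might be sketched as follows (in Python); both the verbal-to-numeric table and the pessimistic min-rule used for propagation are hypothetical choices, not part of the model itself.

    VERBAL_TO_METRIC = {            # observer's words -> internal metric
        "certain": 0.95, "confident": 0.8, "probable": 0.6, "doubtful": 0.3,
    }

    def propagate_overlay(confidences):
        """Combine per-layer confidences for a map overlay; here the result
        is pessimistically taken to be as weak as the weakest layer."""
        return min(confidences)

    layers = ["confident", "probable", "certain"]   # one statement per layer
    internal = [VERBAL_TO_METRIC[v] for v in layers]
    print(propagate_overlay(internal))              # 0.6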
Following propagation the resulting metric and its
spatial distribution may not be easily intelligible
to the user. A second mapping is therefore required
so that any visualization fits the user's real
world model in the context of the specific task.
The user can then assess fitness for use and take
responsibility for the data. Sensitivity analysis is possible if the lineage of the analysis is stored; on the basis of the results, the user can then issue requirements for improved data. Such requirements can relate to a specific
portion of a coverage if the distribution of
uncertainty is known.
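A sketch of this second mapping is given below (in Python); the thresholds, labels and cell values are invented, the point being that the propagated metric is re-expressed in the user's terms and low-confidence areas are flagged for improved data.

    def to_user_terms(confidence, task_threshold=0.5):
        """Translate the internal metric into task-specific advice."""
        if confidence >= 0.8:
            return "reliable", False
        if confidence >= task_threshold:
            return "use with caution", False
        return "unfit for this task", True          # flag for improved data

    propagated = {"cell_1": 0.85, "cell_2": 0.60, "cell_3": 0.30}
    for cell, conf in propagated.items():
        label, needs_better_data = to_user_terms(conf)
        note = "  -> request improved data" if needs_better_data else ""
        print(f"{cell}: {label}{note}")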