For attributes recorded on nominal scales or in
discrete classes, misclassification matrices are
widely used. Such matrices are of particular
importance in testing interpretive data from remote
sensing. A number of indices can be derived to
summarize the matrix, such as the percentage of
pixels correctly classified (PCC) and its variants.
Debate has centered not only on the appropriate
derivation of indices but also on whether these
should reflect accuracy from the producer's or the
user's point of view (Story & Congalton, 1986).
Testing still relies on points, albeit chosen
through a sampling scheme. Difficulties here arise
because classification schemes rarely have the
mutual exclusivity of crisp sets, boundaries are
often avoided (as in soil sampling) and the
position of the sampling point must be correctly
located on the ground. Middelkoop (1990) puts
forward an alternative approach whereby a confusion
matrix, generated by having several experts carry
out interpretation of the same test area, is used
to study boundary uncertainty.
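As a simple illustration of the indices discussed above, the sketch below computes PCC together with producer's and user's accuracies from a misclassification matrix; the class labels and matrix values are hypothetical.

```python
import numpy as np

# Hypothetical misclassification (confusion) matrix: rows are reference
# (ground truth) classes, columns are classified (map) classes.
classes = ["forest", "grass", "water"]
matrix = np.array([
    [45,  4,  1],
    [ 6, 38,  2],
    [ 0,  3, 21],
])

total = matrix.sum()

# Percentage of pixels correctly classified: trace over grand total.
pcc = 100.0 * np.trace(matrix) / total

# Producer's accuracy: correct pixels of a class / reference total (omission errors).
producers = 100.0 * np.diag(matrix) / matrix.sum(axis=1)

# User's accuracy: correct pixels of a class / classified total (commission errors).
users = 100.0 * np.diag(matrix) / matrix.sum(axis=0)

print(f"PCC = {pcc:.1f}%")
for name, p, u in zip(classes, producers, users):
    print(f"{name:>7}: producer's = {p:.1f}%, user's = {u:.1f}%")
```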
During the course of data collection and input into
a GIS, a number of data accuracy measures become
available which could be included in the database
for later use in assessing the validity of
analyses. For example, if vectors are digitized
from a base map then the expected accuracy of the
map may be known (e.g. 0.5m for planimetric
detail), the error in control points for X and Y
axes after map registration on the digitizer should
also be known (e.g. σx = 0.19mm, σy = 0.14mm at map
scale) and then tests of human performance using
digitizer pucks would indicate accuracies of
±0.25mm at map scale, or half this if a magnifier is
used (Rosenberg & Martin, 1988). For attributes,
accuracy measures (PCCs or RMSEs, depending on data
class) may result from fieldwork. The author is
unaware of any commercial GIS software that
automatically records such data and attributes them
to entities, even when generated internally by the
GIS software (as in map registration or rubber
sheeting). Much of what could be used gets left
behind along the way.
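One way such figures could be carried forward is to combine the independent error sources in quadrature and store the result as entity-level metadata. The sketch below does this for the example values quoted above, converting digitizer-level errors to ground units; the 1:10,000 map scale and the metadata structure are assumptions for illustration only.

```python
from math import hypot
from dataclasses import dataclass, field

# Assumed map scale denominator (1:10,000); not stated in the text.
SCALE = 10_000

def mm_at_map_scale_to_ground_m(mm: float, scale: int = SCALE) -> float:
    """Convert an error quoted in mm at map scale to metres on the ground."""
    return mm / 1000.0 * scale

# Independent error sources quoted in the text, combined in quadrature
# (root sum of squares), a common assumption for uncorrelated errors.
source_map_m = 0.5                                        # planimetric detail
registration_m = mm_at_map_scale_to_ground_m(hypot(0.19, 0.14))
puck_m = mm_at_map_scale_to_ground_m(0.25)

positional_rmse_m = (source_map_m**2 + registration_m**2 + puck_m**2) ** 0.5

@dataclass
class DigitizedFeature:
    """Hypothetical entity carrying its own accuracy metadata."""
    feature_id: int
    vertices: list
    metadata: dict = field(default_factory=dict)

road = DigitizedFeature(1, [(0.0, 0.0), (120.0, 35.0)],
                        metadata={"positional_rmse_m": round(positional_rmse_m, 2)})
print(road.metadata)
```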
Modelling
Modelling in the broadest sense would have to
include the choice and nature of the metric,
statistic, or range of verbalizations used to describe error
or other uncertainty prior to measurement. More
narrowly, this section will consider some currently
proposed strategies for handling uncertainty in
data transformations within a GIS. Consideration
could be given to a very wide range of data
transformations (Tobler, 1990). Assuming, from the
above section on measurement, something is known
about the accuracy of one's data (location and
attribute), what is the accuracy of a derivative
map compiled by combining data as in overlay
analysis?
Map overlay will combine the locational and
attribute errors of two or more layers. For vector
data, locational errors will result in the spurious
polygon or sliver problem. A number of algorithms
have been developed and implemented in some GIS
software to remove spurious polygons in an
equitable way. These employ models based on the
epsilon band concept (Blakemore, 1983; Chrisman,
1983; Pullar, 1991), maximum perpendicular
deviation (Peucker, 1976) or fuzzy tolerances
(Zhang & Tulip, 1990). Slivers are considered
undesirable and, whilst their removal reduces both
database size and processing time and enhances the
aesthetic quality of the cartographic product, they
are themselves (or their absence) an indication of
quality, and their automated removal at each
successive stage of a complex analysis would
introduce its own uncertainty.
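To make the epsilon band idea concrete, the sketch below flags a candidate polygon as a probable sliver when every one of its vertices lies within the epsilon tolerance of a reference boundary; this is a simplified illustration, not a reproduction of any of the cited algorithms, and the geometries and tolerance are hypothetical.

```python
from math import hypot

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == dy == 0:
        return hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))

def distance_to_line(p, line):
    """Distance from p to a polyline given as a list of vertices."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))

def is_probable_sliver(polygon, reference_line, epsilon):
    """Treat a polygon whose vertices all fall inside the epsilon band of the
    reference boundary as a spurious (sliver) polygon."""
    return all(distance_to_line(v, reference_line) <= epsilon for v in polygon)

# Hypothetical example: a thin polygon hugging a boundary digitized twice.
boundary = [(0.0, 0.0), (10.0, 0.0)]
sliver = [(1.0, 0.1), (5.0, 0.3), (9.0, 0.1), (5.0, -0.2)]
print(is_probable_sliver(sliver, boundary, epsilon=0.5))  # True
```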
Propagation of attribute error is of greater
concern. Much of the work in modelling such errors
has been carried out using raster data. Newcomer
and Szajgin (1984) use conditional probability for
overlay assuming a Boolean AND operator. In such
cases, the highest accuracy expected is equal to
the accuracy of the least accurate layer used.
Usually though, accuracy will continue to decrease
as more layers are added. Tests by Walsh et al.
(1987) seemed to confirm the earlier pessimism that
"it is quite possible that map overlays by their
very nature are so inaccurate as to be useless and
perhaps misleading for planning" (MacDougall,
1975). However, Veregin (1989) demonstrates that a
Boolean OR operation for conditional probabilities
will result in an accuracy not less than the most
accurate layer used. Thus in a complex analysis
using a combination of Boolean operators, composite
map accuracy at each stage may improve or worsen
significantly and hence an ability to track this is
desirable. Recording of lineage in GIS operations
(Lanter, 1990) seeks to address this requirement. A
diagrammatic example of the effects of data
reselection, union and intersection using PCC
values is provided by Lanter and Veregin (1991).
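Treating each layer's PCC as the probability that a cell is correctly classified, and assuming independence between layers (a simplifying assumption, not one made by all of the cited authors), the composite accuracy after Boolean overlay can be tracked as in the sketch below; the per-layer values are hypothetical.

```python
from functools import reduce

def and_overlay_accuracy(pccs):
    """Composite probability that all layers are correct at a cell,
    assuming the layers' errors are independent (Boolean AND overlay)."""
    return reduce(lambda acc, p: acc * p, pccs, 1.0)

def or_overlay_accuracy(pccs):
    """Composite probability that at least one layer is correct at a cell,
    again assuming independence (Boolean OR overlay)."""
    return 1.0 - reduce(lambda acc, p: acc * (1.0 - p), pccs, 1.0)

layers = [0.90, 0.85, 0.80]   # hypothetical per-layer PCCs as proportions

print(f"AND overlay: {and_overlay_accuracy(layers):.3f} "
      f"(never better than the least accurate layer, {min(layers):.2f})")
print(f"OR overlay:  {or_overlay_accuracy(layers):.3f} "
      f"(never worse than the most accurate layer, {max(layers):.2f})")
```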
Alternative approaches have been explored.
Evidential reasoning (Shafer, 1976) has been used
by Lee et al. (1987) and Garvey (1987) to combine
multisource data. Belief functions are assigned to
the data which by evidential computation and
decision rules result in a measure of the
plausibility or support for a particular
proposition. Leung (1988) and Wang et al. (1990)
have used fuzzy membership functions to assign
climatic regions and land suitability classes
respectively to conventional datasets. Heuvelink et
al. (1989), using mean attribute values for each
cell derived from kriging, were able to assess the
reliability of their derivative maps by modelling
the error propagation as a second-order Taylor
expansion.
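The flavour of this approach can be conveyed with a first-order Taylor error propagation for a simple cell-wise derivative map, although Heuvelink et al. used a second-order expansion; the derivative function, the input rasters and their standard deviations below are illustrative assumptions only.

```python
import numpy as np

def first_order_propagation(f, means, sds, eps=1e-6):
    """Approximate the standard deviation of f(x1, x2, ...) per cell by a
    first-order Taylor expansion around the (kriged) mean values, assuming
    independent input errors."""
    variance = np.zeros_like(means[0], dtype=float)
    for i, (m, s) in enumerate(zip(means, sds)):
        shifted = [m2 + (eps if j == i else 0.0) for j, m2 in enumerate(means)]
        partial = (f(*shifted) - f(*means)) / eps   # numerical partial derivative
        variance += (partial * s) ** 2
    return np.sqrt(variance)

# Hypothetical 2x2 rasters of kriged means and kriging standard deviations.
ph_mean = np.array([[5.5, 6.0], [6.2, 5.8]])
ph_sd = np.array([[0.2, 0.3], [0.1, 0.2]])
om_mean = np.array([[3.0, 2.5], [2.8, 3.2]])   # organic matter, say
om_sd = np.array([[0.4, 0.3], [0.5, 0.2]])

def suitability(ph, om):
    """Illustrative derivative map: a simple weighted combination per cell."""
    return 0.6 * ph + 0.4 * om

sd_map = first_order_propagation(suitability, [ph_mean, om_mean], [ph_sd, om_sd])
print(np.round(sd_map, 3))   # per-cell reliability of the derivative map
```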
General, workable solutions have not been
demonstrated in the literature. The only study to
provide visualization of the reliability of composite
maps as a continuous surface (rather than as global
measures) is that of Heuvelink et al. (1989). Their initial
accuracy measures, however, are a product of the
kriging process and therefore can only be
implemented where interpolation of quantitative
point samples by this technique is appropriate.
Management
If data quality is an important concern to both GIS
implementors and users, then management strategies
are required for controlling or reducing
uncertainty and for ensuring fitness for use of
products. Without a general model for handling
uncertainty, such strategies may be difficult to
develop, resulting in a series of loosely organized
actions that may not achieve the desired goals.
Current developments are concerned with consistency
checks, metadata and standards.
Logical consistency checks can be carried out both
for entities and attributes (Laurini & Thompson,
1992). Topology is most frequently used to check
the integrity of vector polygons, tessellations and
DEMs. Additional techniques used on DEMs are
spatial autocorrelation (Caruso, 1987) and outlier
detection using B-splines (Bethel & Mikhail, 1983).
Attributes can be assessed for consistency with the