For attributes recorded on nominal scales or in
discrete classes, misclassification matrices are
widely used. Such matrices are of particular
importance in testing interpretive data from remote
sensing. A number of indices can be derived to
summarize the matrix, such as the percentage of
pixels correctly classified (PCC) and its variants.
Debate has centered not only on the appropriate
derivation of indices but also on whether these
should reflect accuracy from the producer's or the
user's point of view (Story & Congalton, 1986).
Testing still relies on points, albeit chosen
through a sampling scheme. Difficulties here arise
because classification schemes rarely have the
mutual exclusivity of crisp sets, boundaries are
often avoided (as in soil sampling) and the
position of the sampling point must be correctly
located on the ground. Middelkoop (1990) puts
forward an alternative approach whereby a confusion
matrix, generated by having several experts carry
out interpretation of the same test area, is used
to study boundary uncertainty.
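As a simple illustration of the indices discussed above, the sketch below computes PCC together with producer's and user's accuracies from a misclassification matrix; the class labels and matrix values are hypothetical.

```python
import numpy as np

# Hypothetical misclassification (confusion) matrix: rows are reference
# (ground truth) classes, columns are classified (map) classes.
classes = ["forest", "grass", "water"]
matrix = np.array([
    [45,  4,  1],
    [ 6, 38,  2],
    [ 0,  3, 21],
])

total = matrix.sum()

# Percentage of pixels correctly classified: trace over grand total.
pcc = 100.0 * np.trace(matrix) / total

# Producer's accuracy: correct pixels of a class / reference total (omission errors).
producers = 100.0 * np.diag(matrix) / matrix.sum(axis=1)

# User's accuracy: correct pixels of a class / classified total (commission errors).
users = 100.0 * np.diag(matrix) / matrix.sum(axis=0)

print(f"PCC = {pcc:.1f}%")
for name, p, u in zip(classes, producers, users):
    print(f"{name:>7}: producer's = {p:.1f}%, user's = {u:.1f}%")
```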
During the course of data collection and input into
a GIS, a number of data accuracy measures become
available which could be included in the database
for later use in assessing the validity of
analyses. For example, if vectors are digitized
from a base map then the expected accuracy of the
map may be known (e.g. 0.5m for planimetric
detail), the error in control points for X and Y
axes after map registration on the digitizer should
also be known (e.g. σx = 0.19mm, σy = 0.14mm at map
scale) and then tests of human performance using
digitizer pucks would indicate accuracies of
±0.25mm at map scale, or half this if a magnifier is
used (Rosenberg & Martin, 1988). For attributes,
accuracy measures (PCCs or RMSEs, depending on data
class) may result from fieldwork. The author is
unaware of any commercial GIS software that
automatically records such data and attributes them
to entities, even when generated internally by the
GIS software (as in map registration or rubber
sheeting). Much of what could be used gets left
behind along the way.
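One way such figures could be carried forward is to combine the independent error sources in quadrature and store the result as entity-level metadata. The sketch below does this for the example values quoted above, converting digitizer-level errors to ground units; the 1:10,000 map scale and the metadata structure are assumptions for illustration only.

```python
from math import hypot
from dataclasses import dataclass, field

# Assumed map scale denominator (1:10,000); not stated in the text.
SCALE = 10_000

def mm_at_map_scale_to_ground_m(mm: float, scale: int = SCALE) -> float:
    """Convert an error quoted in mm at map scale to metres on the ground."""
    return mm / 1000.0 * scale

# Independent error sources quoted in the text, combined in quadrature
# (root sum of squares), a common assumption for uncorrelated errors.
source_map_m = 0.5                                        # planimetric detail
registration_m = mm_at_map_scale_to_ground_m(hypot(0.19, 0.14))
puck_m = mm_at_map_scale_to_ground_m(0.25)

positional_rmse_m = (source_map_m**2 + registration_m**2 + puck_m**2) ** 0.5

@dataclass
class DigitizedFeature:
    """Hypothetical entity carrying its own accuracy metadata."""
    feature_id: int
    vertices: list
    metadata: dict = field(default_factory=dict)

road = DigitizedFeature(1, [(0.0, 0.0), (120.0, 35.0)],
                        metadata={"positional_rmse_m": round(positional_rmse_m, 2)})
print(road.metadata)
```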
Modelling
Modelling in the broadest sense would have to
include the choice and nature of the metric,
statistic, or range of verbalizations used to describe error
or other uncertainty prior to measurement. More
narrowly, this section will consider some currently
proposed strategies for handling uncertainty in
data transformations within a GIS. Consideration
could be given to a very wide range of data
transformations (Tobler, 1990). Assuming, from the
above section on measurement, something is known
about the accuracy of one's data (location and
attribute), what is the accuracy of a derivative
map compiled by combining data as in overlay
analysis?
Map overlay will combine the locational and
attribute errors of two or more layers. For vector
data, locational errors will result in the spurious
polygon or sliver problem. A number of algorithms
have been developed and implemented in some GIS
software to remove spurious polygons in an
equitable way. These employ models based on the
epsilon band concept (Blakemore, 1983; Chrisman,
1983; Pullar, 1991), maximum perpendicular
deviation (Peucker, 1976) or fuzzy tolerances
(Zhang & Tulip, 1990). Slivers are considered
undesirable and, whilst their removal reduces both
database size and processing time and enhances the
aesthetic quality of the cartographic product, they
are themselves (or their absence) an indication of
quality, and their automated removal at each
successive stage of a complex analysis would
introduce its own uncertainty.
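To make the epsilon band idea concrete, the sketch below flags a candidate polygon as a probable sliver when every one of its vertices lies within the epsilon tolerance of a reference boundary; this is a simplified illustration, not a reproduction of any of the cited algorithms, and the geometries and tolerance are hypothetical.

```python
from math import hypot

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == dy == 0:
        return hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))

def distance_to_line(p, line):
    """Distance from p to a polyline given as a list of vertices."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))

def is_probable_sliver(polygon, reference_line, epsilon):
    """Treat a polygon whose vertices all fall inside the epsilon band of the
    reference boundary as a spurious (sliver) polygon."""
    return all(distance_to_line(v, reference_line) <= epsilon for v in polygon)

# Hypothetical example: a thin polygon hugging a boundary digitized twice.
boundary = [(0.0, 0.0), (10.0, 0.0)]
sliver = [(1.0, 0.1), (5.0, 0.3), (9.0, 0.1), (5.0, -0.2)]
print(is_probable_sliver(sliver, boundary, epsilon=0.5))  # True
```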
Propagation of attribute error is of greater
concern. Much of the work in modelling such errors
has been carried out using raster data. Newcomer
and Szajgin (1984) use conditional probability for
overlay assuming a Boolean AND operator. In such
cases, the highest accuracy expected is equal to
the accuracy of the least accurate layer used.
Usually though, accuracy will continue to decrease
as more layers are added. Tests by Walsh et al.
(1987) seemed to confirm the earlier pessimism that
"it is quite possible that map overlays by their
very nature are so inaccurate as to be useless and
perhaps misleading for planning" (MacDougall,
1975). However, Veregin (1989) demonstrates that a
Boolean OR operation for conditional probabilities
will result in an accuracy not less than the most
accurate layer used. Thus in a complex analysis
using a combination of Boolean operators, composite
map accuracy at each stage may improve or worsen
significantly and hence an ability to track this is
desirable. Recording of lineage in GIS operations
(Lanter, 1990) seeks to address this requirement. A
diagrammatic example of the effects of data
reselection, union and intersection using PCC
values is provided by Lanter and Veregin (1991).
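Treating each layer's PCC as the probability that a cell is correctly classified, and assuming independence between layers (a simplifying assumption, not one made by all of the cited authors), the composite accuracy after Boolean overlay can be tracked as in the sketch below; the per-layer values are hypothetical.

```python
from functools import reduce

def and_overlay_accuracy(pccs):
    """Composite probability that all layers are correct at a cell,
    assuming the layers' errors are independent (Boolean AND overlay)."""
    return reduce(lambda acc, p: acc * p, pccs, 1.0)

def or_overlay_accuracy(pccs):
    """Composite probability that at least one layer is correct at a cell,
    again assuming independence (Boolean OR overlay)."""
    return 1.0 - reduce(lambda acc, p: acc * (1.0 - p), pccs, 1.0)

layers = [0.90, 0.85, 0.80]   # hypothetical per-layer PCCs as proportions

print(f"AND overlay: {and_overlay_accuracy(layers):.3f} "
      f"(never better than the least accurate layer, {min(layers):.2f})")
print(f"OR overlay:  {or_overlay_accuracy(layers):.3f} "
      f"(never worse than the most accurate layer, {max(layers):.2f})")
```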
Alternative approaches have been explored.
Evidential reasoning (Shafer, 1976) has been used
by Lee et al. (1987) and Garvey (1987) to combine
multisource data. Belief functions are assigned to
the data which by evidential computation and
decision rules result in a measure of the
plausibility or support for a particular
proposition. Leung (1988) and Wang et al. (1990)
have used fuzzy membership functions to assign
climatic regions and land suitability classes
respectively to conventional datasets. Heuvelink et
al. (1989), using mean attribute values for each
cell derived from kriging, were able to assess the
reliability of their derivative maps by modelling
the error propagation as a second-order Taylor
expansion.
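The flavour of this approach can be conveyed with a first-order Taylor error propagation for a simple cell-wise derivative map, although Heuvelink et al. used a second-order expansion; the derivative function, the input rasters and their standard deviations below are illustrative assumptions only.

```python
import numpy as np

def first_order_propagation(f, means, sds, eps=1e-6):
    """Approximate the standard deviation of f(x1, x2, ...) per cell by a
    first-order Taylor expansion around the (kriged) mean values, assuming
    independent input errors."""
    variance = np.zeros_like(means[0], dtype=float)
    for i, (m, s) in enumerate(zip(means, sds)):
        shifted = [m2 + (eps if j == i else 0.0) for j, m2 in enumerate(means)]
        partial = (f(*shifted) - f(*means)) / eps   # numerical partial derivative
        variance += (partial * s) ** 2
    return np.sqrt(variance)

# Hypothetical 2x2 rasters of kriged means and kriging standard deviations.
ph_mean = np.array([[5.5, 6.0], [6.2, 5.8]])
ph_sd = np.array([[0.2, 0.3], [0.1, 0.2]])
om_mean = np.array([[3.0, 2.5], [2.8, 3.2]])   # organic matter, say
om_sd = np.array([[0.4, 0.3], [0.5, 0.2]])

def suitability(ph, om):
    """Illustrative derivative map: a simple weighted combination per cell."""
    return 0.6 * ph + 0.4 * om

sd_map = first_order_propagation(suitability, [ph_mean, om_mean], [ph_sd, om_sd])
print(np.round(sd_map, 3))   # per-cell reliability of the derivative map
```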
General, workable solutions have not been
demonstrated in the literature. The only study to
provide visualization of the reliability of composite
maps as a continuous surface (rather than as global
measures) is that of Heuvelink et al. (1989). Their initial
accuracy measures, however, are a product of the
kriging process and therefore can only be
implemented where interpolation of quantitative
point samples by this technique is appropriate.
Management
If data quality is an important concern to both GIS
implementors and users, then management strategies
are required for controlling or reducing
uncertainty and for ensuring fitness for use of
products. Without a general model for handling
uncertainty, such strategies may be difficult to
develop, resulting in a series of loosely organized
actions that may not achieve the desired goals.
Current developments are concerned with consistency
checks, metadata and standards.
Logical consistency checks can be carried out both
for entities and attributes (Laurini & Thompson,
1992). Topology is most frequently used to check
the integrity of vector polygons, tessellations and
DEMs. Additional techniques used on DEMs are
spatial autocorrelation (Caruso, 1987) and outlier
detection using B-splines (Bethel & Mikhail, 1983).
Attributes can be assessed for consistency with the