inexact or uncertain in some respect or other. As
analyzed by Frost (1986), this is due to several factors:
a) the universe of discourse is truly random; b) the
universe of discourse is not strictly random but for
some reason there is insufficient data; c) available
knowledge represents a 'gut feeling' and such
judgmental knowledge can be useful when more
sound knowledge is not available; d) available
knowledge is couched in terms which are themselves
vague (e.g. the word 'usually' in 'Canary grass usually
will not follow canola'); and e) the knowledge source is
imperfect.
Among these factors, b) is a typical situation in
spectrally-based remote sensing image classification.
For example, suppose crop types are to be identified
based only on spectral information. A commonly used
supervised classification method is to compute the
likelihood that a field grows a type of crop using
probabilistic reasoning, based on the evidence obtained
from training areas.
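
A minimal sketch of such a likelihood computation is given below. It
assumes a Gaussian model per spectral band and treats bands as
independent; both are simplifying assumptions, not the method of any
particular classifier. The crop names, reflectance values, and
function names (train, log_likelihood, classify) are hypothetical.

```python
import math
from statistics import mean, stdev


def train(training_areas):
    """Estimate a per-band mean and standard deviation for each crop."""
    model = {}
    for crop, pixels in training_areas.items():
        bands = list(zip(*pixels))  # regroup pixel tuples by band
        model[crop] = [(mean(b), stdev(b)) for b in bands]
    return model


def log_likelihood(pixel, band_stats):
    """Gaussian log-likelihood; bands are treated as independent (assumption)."""
    total = 0.0
    for x, (mu, sigma) in zip(pixel, band_stats):
        total += -math.log(sigma * math.sqrt(2 * math.pi))
        total += -0.5 * ((x - mu) / sigma) ** 2
    return total


def classify(pixel, model):
    """Return the crop whose training statistics best explain the pixel."""
    return max(model, key=lambda crop: log_likelihood(pixel, model[crop]))


# Hypothetical two-band samples taken from training areas.
training_areas = {
    "oats": [(30, 80), (32, 84), (29, 79), (31, 82)],
    "flax": [(50, 60), (52, 63), (49, 58), (51, 61)],
}
model = train(training_areas)
label = classify((31, 81), model)
print(label)
```

With so few training pixels per crop, the estimated means and standard
deviations are themselves uncertain, which is exactly situation b).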
Furthermore, ancillary information, such as soil types
and digital terrain models, may be used to improve
classification accuracy. This is achieved through the
representation of relationships between ancillary data
and crop types using certainty values such as
probabilities and certainty factors, and the
incorporation of these certainty values into the
probabilistic reasoning. These certainty values are
usually estimated from two sources, i.e. databases and
human experts. Databases are used as samples to
compute probability values, while the statements may
be expressed in different ways by experts. For example,
an expert may state: "Oats usually grow well on the
land with elevation between e1 and e2, soil types t1, t2,
and t3, and slope ranging from s1 to s2". This
statement is an empirical rule. It is judgmental; it
includes a vague term (usually); and it may address
only some of the ancillary themes of concern, making
it imperfect or incomplete. Thus,
in addition to the uncertainty situation b), situations c),
d), and e) may all be encountered in the classification
of remote sensing images based on reasoning with
multiple knowledge sources.
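
One simple way to incorporate an ancillary certainty value into the
probabilistic reasoning can be sketched as follows. The sketch
multiplies a spectral likelihood by a probability of each crop given
the soil type and normalises the result (Bayes' rule with the
ancillary probability used as a prior). This is an illustrative
assumption, not necessarily the combination scheme intended here, and
all numbers and names are hypothetical.

```python
def combine(spectral_likelihood, ancillary_prob):
    """Posterior over crops: likelihood times ancillary prior, normalised."""
    unnormalised = {
        crop: spectral_likelihood[crop] * ancillary_prob.get(crop, 0.0)
        for crop in spectral_likelihood
    }
    total = sum(unnormalised.values())
    if total == 0:
        raise ValueError("no crop is supported by both sources")
    return {crop: value / total for crop, value in unnormalised.items()}


# Hypothetical values for a single field.
spectral_likelihood = {"oats": 0.40, "flax": 0.45, "canola": 0.15}
prob_given_soil = {"oats": 0.60, "flax": 0.10, "canola": 0.30}

posterior = combine(spectral_likelihood, prob_given_soil)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

In this example the spectral evidence alone slightly favours flax,
while the soil information shifts the decision to oats. The result is
only as reliable as the certainty values supplied to it.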
The UIU Problem in Remote Sensing Image
Classification
A way to examine the UIU problem in remote sensing
image classification is to look into the sources where
related knowledge for the classification is generated.
These sources can be generalized into three types: one
is non-time-serial databases, such as spectral image
databases for sampling training areas; the second is
historical databases or time-serial databases which are
used in the elicitation of ancillary knowledge; the last
is human experts who provide expertise related to the
ancillary information of concern. Figure 1 outlines the
major sources that cause the UIU problem.
KNOWLEDGE SOURCES
    DATABASES: TIME SERIAL DATA, NON-TIME SERIAL DATA
    HUMAN EXPERTS
Figure 1 Major Sources Causing the UIU Problem
Beneath the probabilities generated from non-time-
serial databases, there exist at least two types of
uncertainty. One is database accuracy, which deals
with data quality. The other is the sufficiency of the
sample size available in the databases for statistical
purposes. For example, a database with ten thousand
records indicating the relationship between soil types
and vegetation distributions may have over a thousand
records addressing soil type A, but only a few records
addressing soil type B. Thus, the probabilities
representing the relationship between soil types and
vegetation distributions would be more reliable or
certain for soil type A than for soil type B.
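
The effect of sample size on the reliability of an estimated
probability can be illustrated with a confidence interval. The sketch
below uses the Wilson score interval, which is one common choice and
is introduced here only for illustration. The record counts are
hypothetical, as is the function name wilson_interval.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if n == 0:
        raise ValueError("no records for this soil type")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical counts: (records with the vegetation type, all records).
records = {"soil type A": (600, 1200), "soil type B": (3, 6)}

widths = {}
for soil, (hits, total) in records.items():
    low, high = wilson_interval(hits, total)
    widths[soil] = high - low
    print(f"{soil}: estimate {hits / total:.2f}, "
          f"interval {low:.2f} to {high:.2f}")
```

Both soil types yield the same point estimate, but the interval for
soil type B is far wider, so the same probability value carries much
less certainty.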
For time-serial databases, the probability values
generated from the databases carry even more
uncertainty. The uncertainties of
database accuracy and sample sufficiency also apply to
time-serial databases. In addition, two other factors
affect the probability values derived from this type of
database. One is the number of time periods (e.g. the
number of years), since statistics based on few time
periods may be seriously biased, especially for
themes that are closely related to socio-economic
situations. The other is the standard deviation of an
event's occurrences across time periods,
since a large standard deviation may suggest the effect
of some factors (e.g. socio-economic factors) that are
not of concern in the knowledge elicitation. This can
be illustrated with an example, shown in Figure 2.
The height of each bar represents the number of fields
that grew flax in the corresponding year. The large
difference in flax field occurrences between 1986 and
the other years, which causes a large standard deviation,
suggests that the high occurrence of flax in
1986 may result from socio-economic factors such as the
crop price. If statistics based on such a database are
used to generate crop rotation rules, the result would
probably be biased.
[Bar chart; horizontal axis: year (1986, 1987, 1988, 1989);
bar height: number of flax fields]
Figure 2 Occurrences of Flax Fields in an
Experimental Area
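
A check of this kind on a time-serial database might be sketched as
follows. The yearly counts are hypothetical, since the bar heights of
Figure 2 are not reproduced here, and the threshold and function names
(relative_spread, unusual_years) are illustrative assumptions. With
only four periods a z-score test has little power, so the sketch
compares each year with the median instead.

```python
from statistics import mean, median, stdev


def relative_spread(counts_by_year):
    """Standard deviation of yearly counts divided by their mean."""
    counts = list(counts_by_year.values())
    return stdev(counts) / mean(counts)


def unusual_years(counts_by_year, ratio=2.0):
    """Years whose count differs from the median by more than `ratio` times."""
    typical = median(list(counts_by_year.values()))
    return [
        year
        for year, n in counts_by_year.items()
        if n > ratio * typical or n < typical / ratio
    ]


# Hypothetical numbers of flax fields per year.
flax_fields = {1986: 40, 1987: 12, 1988: 10, 1989: 11}

spread = relative_spread(flax_fields)
flagged = unusual_years(flax_fields)
print(f"relative spread {spread:.2f}, unusual years {flagged}")
```

A large relative spread, together with a flagged year, would suggest
examining that year before using the counts to derive crop rotation
rules.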