ISPRS, Vol.34, Part 2W2, “Dynamic and Multi-Dimensional GIS”, Bangkok, May 23-25, 2001
unique combinations of environmental conditions and discern
areas of environmental transition. Unsupervised fuzzy
classification allows us to identify and map the spatial locations
of natural classes in environmental data. It can also be used to
depict the areas of environmental transition. Thus it can
facilitate the allocation of field investigation efforts and improve
the efficiency of field investigation.
2.2 The Method
Our method of assisting knowledge development consists of
the following four steps: (1) developing the environmental
database; (2) identifying environmental niches; (3)
allocating field investigation efforts; (4) distilling relationships
between the geographic phenomenon and environmental
conditions.
2.2.1 Environmental database development. Environmental
factors related to the geographic phenomenon are first
identified. For example, in soil mapping, the factors related to
soil formation need to be identified. A GIS database on these
environmental factors is then generated, provided that the
source data are available and a GIS data layer can be created
for each of the environmental factors.
2.2.2 Identify environmental niches using a fuzzy c-means
classifier (FCM). FCM is a classifier that first optimally
partitions a dataset (such as the environmental dataset
described in 2.2.1) into a given number of classes and then
computes the membership of each data element (such as the
environmental conditions at a pixel) in each of the classes
(Bezdek et al., 1984). It identifies the centroids of classes by
minimizing the fuzzy partition error as given in Equation 1
(Bezdek et al., 1984):
$$J_m(U, v) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^m \, \| y_k - v_i \|_A^2 \qquad (1)$$
where Y is the data set; c is the number of clusters in Y; m is a
weighting exponent; U is a fuzzy c-partition of Y; v is a vector
of cluster centers; A is a weighting matrix; n is the number of
objects in Y; u_ik is the membership of the kth object (y_k)
in the ith cluster. Jm, the fuzzy partition error, can
be described as a weighted measure of the squared distance
between pixels and class centroids, and so is a measure of the
total squared errors as minimized with respect to each cluster
(Ahn et al. 1999, Ross 1995). Jm decreases as the clustering
improves (meaning that pixels tend to be overall closer to their
representative centroids).
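As an illustration of how Equation 1 can be minimized in practice, the following is a minimal Python sketch of the alternating update commonly used for fuzzy c-means: recompute the class centers, then the memberships, and repeat until the memberships stabilise. It is not the implementation used in this study. It assumes that A is the identity matrix (Euclidean distance) and that m = 2 by default; the function name fuzzy_c_means, its parameters, and the random initialisation are illustrative choices.

```python
import random


def fuzzy_c_means(data, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Cluster `data` (a list of equal-length feature tuples) into c fuzzy classes.

    Returns (centers, U, J): U[i][k] is the membership of object k in class i,
    and J is the fuzzy partition error of Equation 1.
    Assumptions: A = identity (Euclidean norm), m > 1, max_iter >= 1.
    """
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])

    # Random initial memberships, normalised so each object's memberships sum to 1.
    U = [[rng.random() + 1e-9 for _ in range(n)] for _ in range(c)]
    for k in range(n):
        total = sum(U[i][k] for i in range(c))
        for i in range(c):
            U[i][k] /= total

    exponent = 1.0 / (m - 1.0)  # applied to squared distances
    for _ in range(max_iter):
        # Class centers: means of the data weighted by u_ik^m.
        centers = []
        for i in range(c):
            w = [U[i][k] ** m for k in range(n)]
            sw = sum(w)
            centers.append(
                [sum(w[k] * data[k][d] for k in range(n)) / sw for d in range(dim)]
            )

        # Squared Euclidean distances, floored to avoid division by zero.
        d2 = [
            [
                max(sum((data[k][d] - centers[i][d]) ** 2 for d in range(dim)), 1e-12)
                for k in range(n)
            ]
            for i in range(c)
        ]

        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1)).
        new_U = [
            [
                1.0 / sum((d2[i][k] / d2[j][k]) ** exponent for j in range(c))
                for k in range(n)
            ]
            for i in range(c)
        ]
        change = max(abs(new_U[i][k] - U[i][k]) for i in range(c) for k in range(n))
        U = new_U
        if change < tol:
            break

    # Fuzzy partition error for the returned centers and memberships.
    J = sum((U[i][k] ** m) * d2[i][k] for i in range(c) for k in range(n))
    return centers, U, J
```

In a setting like the one described here, each element of `data` would hold the environmental values at one pixel, and row i of U, reshaped to the raster grid, would give the membership map of class i.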
In most cases, one does not know the number of classes that
best describe the structure in the data set. To judge the
effectiveness of the clustering results generated using the
above fuzzy c-means algorithm, two cluster validity measures
(partition coefficient (F) and entropy (H)) are defined as
(Bezdek et al., 1984):
$$F_c(U) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^2 / n \qquad (2)$$

$$H_c(U) = -\sum_{k=1}^{n} \sum_{i=1}^{c} \bigl( u_{ik} \log_a(u_{ik}) \bigr) / n \qquad (3)$$
Partition coefficient F takes values from 1/c to 1, while
entropy H ranges from zero to log_a(c) (Ahn et al. 1999). F
measures the amount of overlap between clusters, and is
inversely proportional to the overall average overlap between
pairs of fuzzy sets (Ahn et al. 1999). H, conversely, is a scalar
measure of the amount of fuzziness in a given fuzzy partition U
(Bezdek 1981). The best fuzzy c-partition, i.e. the number of
classes that best describes the structure in the data set, is thus
the c-partition which realizes the highest F_c(U) and the
lowest H_c(U) (Ward et al. 1992). Note that F reaches its
maximum where H reaches its minimum, and vice versa, and in
this sense they are essentially equivalent (Bezdek 1981).
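The two validity measures translate directly into code. The sketch below assumes the membership matrix is stored as nested lists U[i][k] (class i, object k), uses the natural logarithm as the default base a, and takes 0 log(0) as 0; the function names are illustrative.

```python
import math


def partition_coefficient(U):
    """F_c(U): sum of squared memberships divided by the number of objects n."""
    n = len(U[0])
    return sum(u * u for row in U for u in row) / n


def partition_entropy(U, base=math.e):
    """H_c(U): minus the sum of u * log_a(u), divided by n (0 * log 0 taken as 0)."""
    n = len(U[0])
    total = sum(u * math.log(u, base) for row in U for u in row if u > 0.0)
    return -total / n
```

For a hard partition (every membership 0 or 1) these give F = 1 and H = 0, and for a completely fuzzy partition (every membership 1/c) they give F = 1/c and H = log_a(c), matching the ranges stated above.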
It is often the case that F increases and H decreases as the
number of classes decreases. To determine if a fuzzy
clustering can be considered optimal, i.e. the number of
clusters optimally describes the structure in the dataset, one
should examine the improvement in entropy or partition
coefficient over adjacent clusterings (Zhu 1989). If there is a
significant improvement, one can consider the current
clustering a better partition of the dataset.
2.2.3 Allocating field investigation efforts. Once the optimal
clustering of the environmental data set is determined,
membership maps for clusters can be produced. Spatial
locations of environmental clusters and areas of environmental
transition can be identified on these maps. On each
membership map, the areas with high membership values are
those whose environmental conditions are closest to the
cluster centroid. Thus, field investigation efforts should be
mostly allocated to these areas.
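One possible way to turn membership maps into candidate field sites and transition areas is sketched below. The 0.8 cut-off and both function names are hypothetical, chosen only for illustration; the text does not prescribe a threshold. Each membership map is assumed to be a list of rows of membership values.

```python
def candidate_field_sites(membership_map, threshold=0.8):
    """Return (row, col) cells whose membership in one class is at least `threshold`."""
    return [
        (r, c)
        for r, row in enumerate(membership_map)
        for c, value in enumerate(row)
        if value >= threshold
    ]


def transition_cells(membership_maps, threshold=0.8):
    """Return (row, col) cells where no class reaches `threshold`.

    `membership_maps` holds one map per class, all with the same shape.
    """
    rows, cols = len(membership_maps[0]), len(membership_maps[0][0])
    return [
        (r, c)
        for r in range(rows)
        for c in range(cols)
        if max(m[r][c] for m in membership_maps) < threshold
    ]
```

Cells returned by the first function are typical of a single environmental class, while cells returned by the second lie between classes and can be read as areas of environmental transition.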
2.2.4 Distilling relationships between the phenomenon and
environmental conditions. By investigating the status or
property of the given phenomenon at the locations of the
environmental clusters, one can quickly establish the
relationships between the phenomenon and its environmental
conditions. Interpreting the membership distributions of the
clusters allows one to develop an appreciation of how the
phenomenon varies over space in response to variation in
environmental conditions.
3. A Soil Mapping Case Study
Soil mapping is based on the classic concept that soil is the
product of the interaction of its formative factors. This
concept assumes that there is a relationship between soil and
its formative environmental conditions. Soil mappers first
establish this relationship through extensive field survey
and then use it, together with the observable
environmental conditions, to map the spatial distribution of
soils.
Acquiring soil-environmental relationships (also referred to as
the soil-landscape model) through conventional means
(extensive field work) is very time-consuming, costly, and
subjective. The method described above was applied to
improve the efficiency of the field investigation.
3.1 Study Site
The study site is the Medina watershed, located in eastern
Dane County, Wisconsin, approximately 30.5 kilometers east
of Madison and 2.5 miles southeast of the town of Marshall.
The Medina watershed is about 6,617.6 acres (10.34 mi², or about 26.8 km²) in
area. The total relief of the Medina watershed is 50.3 meters
(157 feet), and so is generally indicative of a gentle
environmental gradient. The area is made up of drumlins and
inter-drumlin swales. Land use in the Medina watershed at the
present day is generally limited to corn and alfalfa farming. The
soils in the area are formed on aeolian deposits of silt loam
loess, which are underlain by sandy loam glacial till parent
material.
The Medina watershed was chosen for this study because the
local soil experts have limited experience in this area, and so
stand to improve their knowledge through the use of a fuzzy
c-means classification strategy.
3.2 Applying the Method
3.2.1 Developing the GIS data layers on environmental
conditions. The following five environmental layers were
considered to be of primary importance to soil formation in the
study area: elevation; slope percent; planform curvature;
profile curvature; and upstream drainage area index. This
decision was based on the literature, specifically McSweeney
et al. (1994), which contended that these five environmental
measures exerted the majority of influence on soil formation
and development at the watershed scale. As no specific a-