The 3rd ISPRS Workshop on Dynamic and Multi-Dimensional GIS & the 10th Annual Conference of CPGIS on Geoinformatics

Chen, Jun
ISPRS, Vol.34, Part 2W2, “Dynamic and Multi-Dimensional GIS”, Bangkok, May 23-25, 2001 
unique combinations of environmental conditions and discern 
areas of environmental transition. Unsupervised fuzzy 
classification allows us to identify and map the spatial locations 
of natural classes in environmental data. It can also be used to 
depict the areas of environmental transition. Thus it can 
facilitate the allocation of field investigation efforts and improve 
the efficiency of field investigation. 
2.2 The Method 
Our method of assisting knowledge development consists of 
the following four steps: (1) environmental database 
development; (2) environmental niche identification; (3) 
allocation of field investigation efforts; (4) distillation of 
relationships between the geographic phenomenon and 
environmental conditions. 
2.2.1 Environmental database development. Environmental 
factors related to the geographic phenomenon are first 
identified. For example, in soil mapping, the factors related to 
soil formation need to be identified. A GIS database on these 
environmental factors is then generated, provided that the 
source data are available and a GIS data layer can be created 
for each of the environmental factors. 
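As a rough illustration of this step, the sketch below stacks co-registered raster layers into a single matrix with one row per pixel and one column per environmental factor. The function name stack_layers, the toy arrays, and the z-score standardization are assumptions made for illustration; the text above does not prescribe a scaling scheme.

```python
import numpy as np

def stack_layers(layers):
    """Stack equally shaped 2-D rasters into an (n_pixels x n_layers) matrix.

    Each column is z-scored so that layers in different units
    contribute comparably (an assumption, not part of the source method).
    """
    shape = layers[0].shape
    if any(layer.shape != shape for layer in layers):
        raise ValueError("all layers must share the same grid")
    Y = np.column_stack([layer.ravel().astype(float) for layer in layers])
    std = Y.std(axis=0)
    std[std == 0] = 1.0  # leave constant layers unscaled
    return (Y - Y.mean(axis=0)) / std

# Hypothetical 2 x 2 rasters, for illustration only.
elevation = np.array([[250.0, 255.0], [262.0, 270.0]])
slope_pct = np.array([[1.0, 3.0], [6.0, 2.0]])
Y = stack_layers([elevation, slope_pct])
```

Each row of Y then plays the role of one data element (the environmental conditions at a pixel) in the classification step.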
2.2.2 Identify environmental niches using a fuzzy c-means 
classifier (FCM). FCM is a classifier that optimally 
partitions a dataset (such as the environmental dataset 
described in 2.2.1) into a given number of classes and computes 
the membership of each data element (such as the 
environmental conditions at a pixel) in each of the classes 
(Bezdek et al., 1984). It identifies the centroids of the classes 
by minimizing the fuzzy partition error given in Equation 1 
(Bezdek et al., 1984): 
J_m(U, v) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^{m} \, \lVert y_k - v_i \rVert_A^{2}    (1)
where Y is the data set; c is the number of clusters in Y; m is a 
weighting exponent; U is a fuzzy c-partition of Y; v is a vector 
of cluster centers; A is a weighting matrix; n is the number of 
objects in Y; and u_ik is the membership of the kth object (y_k) 
in the ith cluster. Jm, the fuzzy partition error, is a 
membership-weighted sum of the squared distances between 
pixels and class centroids, and so measures the total squared 
error that is minimized with respect to each cluster 
(Ahn et al. 1999, Ross 1995). Jm decreases as the clustering 
improves (meaning that pixels tend, overall, to be closer to 
their representative centroids). 
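The minimization of Equation 1 is commonly carried out by alternating between updating the centroids and updating the memberships. The following is a minimal sketch of that scheme, assuming the weighting matrix A is the identity (squared Euclidean distance) and m = 2. The function name, defaults, and synthetic two-blob data are illustrative assumptions, not the implementation used in this study.

```python
import numpy as np

def fuzzy_c_means(Y, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Cluster the rows of Y.

    Returns memberships U (c x n), centroids V (c x p) and the
    objective J_m, using squared Euclidean distance (A = identity).
    """
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    # Random initial fuzzy partition: each column (one object) sums to 1.
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)

    def centroids_and_distances(U):
        W = U ** m
        # Centroids are membership-weighted means of the data.
        V = (W @ Y) / W.sum(axis=1, keepdims=True)
        # d2[i, k] = ||y_k - v_i||^2, floored to avoid division by zero.
        d2 = ((Y[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        return V, np.maximum(d2, 1e-12)

    for _ in range(max_iter):
        V, d2 = centroids_and_distances(U)
        # Membership update: u_ik is proportional to d2_ik^(-1/(m-1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        change = np.abs(U_new - U).max()
        U = U_new
        if change < tol:
            break

    V, d2 = centroids_and_distances(U)
    J = float(((U ** m) * d2).sum())
    return U, V, J

# Synthetic data: two well-separated groups of 50 points each.
rng = np.random.default_rng(1)
Y = np.vstack([rng.normal(0.0, 0.3, (50, 2)),
               rng.normal(3.0, 0.3, (50, 2))])
U, V, J = fuzzy_c_means(Y, c=2)
```

Each column of U sums to 1, so a pixel lying between two centroids receives intermediate memberships in both classes rather than being forced into one.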
In most cases, one does not know the number of classes that 
best describe the structure in the data set. To judge the 
effectiveness of the clustering results generated using the 
above fuzzy c-means algorithm, two cluster validity measures, 
the partition coefficient (F) and the entropy (H), are defined as 
(Bezdek et al., 1984): 
F_c(U) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^{2} / n    (2)

H_c(U) = -\sum_{k=1}^{n} \sum_{i=1}^{c} \left( u_{ik} \log_a(u_{ik}) \right) / n    (3)
The partition coefficient F takes values from 1/c to 1, while 
the entropy H ranges from zero to log_a(c) (Ahn et al. 1999). F 
measures the amount of overlap between clusters, and is 
inversely proportional to the overall average overlap between 
pairs of fuzzy sets (Ahn et al. 1999). H, conversely, is a scalar 
measure of the amount of fuzziness in a given fuzzy partition U 
(Bezdek 1981). The best fuzzy c-partition, i.e. the number of 
classes that best describes the structure in the data set, is thus 
the c-partition which realizes the highest F_c(U) and the 
lowest H_c(U) (Ward et al. 1992). Note that F reaches its 
maximum where H reaches its minimum, and vice versa, and in 
this sense they are essentially equivalent (Bezdek 1981). 
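Equations 2 and 3 translate directly into code. The sketch below evaluates both measures on two limiting partitions, a hard partition and a maximally fuzzy one, to show the bounds quoted above. The function names and the default logarithm base are assumptions for illustration.

```python
import numpy as np

def partition_coefficient(U):
    """F_c(U): mean over objects of the sum of squared memberships."""
    n = U.shape[1]
    return float((U ** 2).sum() / n)

def partition_entropy(U, a=np.e):
    """H_c(U): mean over objects of -sum(u * log_a(u)), with 0*log(0) taken as 0."""
    n = U.shape[1]
    safe = np.where(U > 0, U, 1.0)  # log(1) = 0 where the membership is zero
    return float(-(U * np.log(safe)).sum() / (n * np.log(a)))

c, n = 3, 4
# Hard partition: every object belongs fully to exactly one cluster.
U_hard = np.zeros((c, n))
U_hard[[0, 1, 2, 0], [0, 1, 2, 3]] = 1.0
# Maximally fuzzy partition: every membership equals 1/c.
U_fuzzy = np.full((c, n), 1.0 / c)

F_hard, H_hard = partition_coefficient(U_hard), partition_entropy(U_hard)
F_fuzzy, H_fuzzy = partition_coefficient(U_fuzzy), partition_entropy(U_fuzzy)
```

The hard partition gives F = 1 and H = 0, while the maximally fuzzy partition gives F = 1/c and H = log_a(c), which is why the two measures rank partitions in opposite directions.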
It is often the case that F increases and H decreases as the 
number of classes decreases. To determine if a fuzzy 
clustering can be considered optimal, i.e. the number of 
clusters optimally describes the structure in the dataset, one 
should examine the improvement in entropy or partition 
coefficient over adjacent clusterings (Zhu 1989). If there is a 
significant improvement, one can consider the current 
clustering to be a better partition of the dataset. 
2.2.3 Allocating field investigation efforts. Once the optimal 
clustering of the environmental data set is determined, 
membership maps for clusters can be produced. Spatial 
locations of environmental clusters and areas of environmental 
transition can be identified on these maps. On each 
membership map, the areas with high membership values are 
the locations whose environmental conditions are closest to the 
cluster centroid. Field investigation efforts should therefore be 
concentrated in these areas. 
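One way to turn this step into a procedure is sketched below: the membership matrix is reshaped into one raster per cluster, cells with high membership are listed as candidate field sites, and cells where no cluster dominates are flagged as transition areas. The function names and the thresholds (0.9 and 0.6) are arbitrary assumptions for illustration and would need to be chosen for a given study.

```python
import numpy as np

def membership_maps(U, shape):
    """Reshape a (c x n) membership matrix into c rasters of shape (rows, cols)."""
    c, n = U.shape
    rows, cols = shape
    if n != rows * cols:
        raise ValueError("U must have one column per raster cell")
    return U.reshape(c, rows, cols)

def candidate_sites(maps, threshold=0.9):
    """For each cluster, list (row, col) cells whose membership is at least threshold."""
    return [np.argwhere(m >= threshold) for m in maps]

def transition_mask(maps, threshold=0.6):
    """True where no cluster reaches threshold, read here as environmental transition."""
    return maps.max(axis=0) < threshold

# Hypothetical memberships for a 1 x 5 transect crossing from cluster 0 to cluster 1.
U = np.array([[1.0, 0.95, 0.5, 0.05, 0.0],
              [0.0, 0.05, 0.5, 0.95, 1.0]])
maps = membership_maps(U, (1, 5))
sites = candidate_sites(maps)
mask = transition_mask(maps)
```

In this toy transect the two cells at each end qualify as candidate sites for their respective clusters, and the middle cell is flagged as transitional.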
2.2.4 Distilling relationships between the phenomenon and 
environmental conditions. By investigating the status or 
property of the given phenomenon at the locations of the 
environmental clusters, one can quickly establish the 
relationships between the phenomenon and its environmental 
conditions. Interpreting the spatial distribution of cluster 
memberships allows one to develop an appreciation of how the 
phenomenon varies over space in response to variation in 
environmental conditions. 
3. A Soil Mapping Case Study 
Soil mapping is based on the classic concept that soil is the 
product of the interaction of its formative factors. This 
concept assumes that there is a relationship between soil and 
its formative environmental conditions. Soil mappers first 
establish this relationship through extensive field survey 
and then use it, together with the observable 
environmental conditions, to map the spatial distribution of 
soils. 
Acquiring soil-environmental relationships (also referred to as 
the soil-landscape model) through conventional means 
(extensive field work) is very time-consuming, costly, and 
subjective. The method described above was applied to 
improve the efficiency of the field investigation. 
3.1 Study Site 
The study site is the Medina watershed, located in eastern 
Dane County, Wisconsin, approximately 30.5 kilometers east 
of Madison and 2.5 miles southeast of the town of Marshall. 
The Medina watershed is about 6,617.6 acres (10.34 mi², roughly 26.8 km²) in 
area. The total relief of the Medina watershed is 50.3 meters 
(157 feet), and so is generally indicative of a gentle 
environmental gradient. The area is made up of drumlins and 
inter-drumlin swales. Land use in the Medina watershed at the 
present day is generally limited to corn and alfalfa farming. The 
soils in the area are formed on aeolian deposits of silt loam 
loess, which are underlain by sandy loam glacial till parent 
material. 
The Medina watershed was chosen for this study because the 
local soil experts have limited experience in this area, and so 
stand to improve their knowledge 
through the use of a fuzzy c-means classification strategy. 
3.2 Applying the Method 
3.2.1 Developing the GIS data layers on environmental 
conditions. The following five environmental layers were 
considered to be of primary importance to soil formation in the 
study area: elevation; slope percent; planform curvature; 
profile curvature; and upstream drainage area index. This 
decision was based on the literature, specifically McSweeney 
et al. (1994), which contended that these five environmental 
measures exerted the majority of influence on soil formation 
and development at the watershed scale. As no specific a-