Full text: XIXth congress (Part B7,3)

  
Petrie, Gregg 
  
The initial application of this methodology (for a worked example see Petrie, 1997) with hyperspectral data sets has 
produced good results. More importantly, the previous work has laid the groundwork for exploiting lower spectral 
resolution data sets (e.g., Landsat, IKONOS) for rangeland health assessment. This hyperspectral experimentation 
identified several factors that should be considered when attempting the more difficult problem of using multispectral 
images (e.g., severe constraints imposed on the number of endmembers possible). First, the methodology can be 
conceptually viewed as a numerical optimization search for the best subspace, with no guarantee that the best 
endmembers have been found. There is always the danger of finding a local minimum. Second, using the most extreme 
signature in an image as a candidate endmember could present problems if it is not representative (e.g., using a 
signature from a small anomalous man-made structure in an otherwise natural scene). 
4 OPTIMUM LANDSCAPE CLASSIFICATION 
The number of cover classes used in a classification analysis can be less than optimal. Selecting too many classes forces 
the analysis to consider more classes than the data can support, while selecting too few ignores useful information 
inherent in the data set. Moreover, the error that results from specifying the wrong number of classes in the model has 
been largely ignored. Therefore, selecting the correct number of classes for the 
rangeland health protocols developed for this project could be particularly important for applications with similar 
objectives. The approach consists of two steps, based on statistical analysis with few user-selected thresholds. 
Step 1: Determine the maximum number of classes based on hierarchical clustering. 
Data in the form of reflectance from each band for a given pixel are sampled using a random sample of n = 200 pixels 
(user selected) to reduce spatial correlation (i.e., correlations between neighboring pixels) and to reduce the dimension 
of the matrix to be inverted later during discrimination of classes. A joining tree clustering algorithm using Euclidean 
distance as a measure of the difference between observations (i.e., the square root of the sum of squared differences 
between two observations for all variables measured) and a complete linkage rule for combining clusters (i.e., clusters 
are combined based on the Euclidean distance between the points farthest apart) was used to determine the range of 
clusters evaluated. The tree is cut at 10 or 15% (user selected) of the maximum linkage distance, chosen such that the 
percent of classes with fewer than five observations remains less than 40%. The number of clusters achieved at that cut 
defines the set of classes to be considered. 
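Step 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in this study: the function names (complete_linkage, cut_tree, max_classes) are hypothetical, the random pixel values stand in for sampled reflectances, and the 40% rule is read here as a simple pass/fail check on the clusters obtained at the chosen cut.

```python
import math
import random


def complete_linkage(points):
    """Naive agglomerative clustering (Euclidean distance, complete linkage).

    Returns the merges in order as (height, kept_id, absorbed_id).
    """
    n = len(points)
    # Distance between clusters i < j; each point starts as its own cluster.
    dist = {(i, j): math.dist(points[i], points[j])
            for i in range(n) for j in range(i + 1, n)}
    active = set(range(n))
    merges = []
    while len(active) > 1:
        (a, b), height = min(dist.items(), key=lambda kv: kv[1])
        merges.append((height, a, b))
        active.remove(b)
        del dist[(a, b)]
        for c in active:
            if c == a:
                continue
            key_ac = (min(a, c), max(a, c))
            key_bc = (min(b, c), max(b, c))
            # Complete linkage: distance between the two farthest members.
            dist[key_ac] = max(dist[key_ac], dist.pop(key_bc))
    return merges


def cut_tree(n_points, merges, height):
    """Replay the merges up to a linkage height and return the clusters."""
    members = {i: [i] for i in range(n_points)}
    for h, a, b in merges:  # complete-linkage heights never decrease
        if h > height:
            break
        members[a] += members.pop(b)
    return list(members.values())


def max_classes(points, fraction=0.10, min_size=5, max_small_pct=40.0):
    """Number of clusters at `fraction` of the maximum linkage distance.

    Also returns the percent of clusters with fewer than `min_size`
    observations and whether that percent stays below `max_small_pct`
    (one reading of the rule in the text; an assumption of this sketch).
    """
    merges = complete_linkage(points)
    max_height = merges[-1][0]
    clusters = cut_tree(len(points), merges, fraction * max_height)
    small = sum(1 for c in clusters if len(c) < min_size)
    small_pct = 100.0 * small / len(clusters)
    return len(clusters), small_pct, small_pct < max_small_pct


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical stand-in for n = 200 randomly sampled pixels in 4 bands.
    pixels = [tuple(rng.random() for _ in range(4)) for _ in range(200)]
    print(max_classes(pixels, fraction=0.10))
```

The naive merge loop is adequate for a sample of n = 200 pixels. For larger samples, a library routine such as scipy.cluster.hierarchy.linkage with method="complete" would be the more practical choice.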
Step 2: Determine the optimum number of classes based on k-means clustering. 
The optimum number of classes to identify is determined by the resulting analysis of variance, class assignments, and 
the Euclidean distance from class centroids produced from a k-means clustering analysis for k = 2 to the maximum 
number of classes considered (from Step 1). The analysis of variance results (i.e., sum of squares between = SSB and 
sum of squares within = SSE) from each band are totaled to produce a weighted grand F-statistic. Weights are the 
proportion of the total sum of squares associated with a given band corrected for the mean. The weighted F-statistic is 
then the ratio MSB/MSE, where MSB is the weighted SSB divided by (k-1) and MSE is the weighted SSE divided by (n-k). 
The F-statistic will be at a maximum when the number of classes produces a maximum distance between classes (MSB) 
and a minimum distance between observations within a class (MSE). In order to maximize the number of classes 
identified, we have allowed the optimum number of classes to be greater than the number of classes that achieved the 
maximum F-statistic. However, to constrain the optimum number of classes, we have limited the number of classes that 
have fewer than five observations. Further, we have constrained the maximum Euclidean distance from an observation to its 
class centroid to less than the 75th percentile achieved from all observed distances assuming only two possible classes. 
Thus, our protocol for choosing the optimum number of classes is conducted by stepping from k = 2 to k = maximum 
number of classes, and calculating the F-statistic, until achieving the highest number of classes (k) that meets the 
following three criteria: 
1. The percent difference between the maximum F-statistic (i.e., between classes mean square divided by the within 
classes mean square, using the weighted total sum of squares, where the weights are the proportion of the total sum 
of squares associated with each band) and the achieved F-statistic is less than or equal to 20% (user selected); 
2. The percent of classes out of the total considered with less than five observations is less than or equal to 20% (user 
selected); and 
3. The maximum distance of any observation from its class centroid is less than the 75th percentile of all distances 
observed using only two classes. 
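The Step 2 protocol and its three criteria can be sketched as follows. This is an illustrative reading of the text, not the authors' implementation: the function names (kmeans, class_means, weighted_f, optimum_classes) are hypothetical, the k-means routine is a plain Lloyd iteration with random initial centroids, and the 75th percentile is computed with Python's default quantile method, which is only one of several possible definitions.

```python
import math
import random
import statistics
from collections import Counter


def class_means(points, labels):
    """Mean vector of each non-empty class, keyed by class label."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: tuple(sum(col) / len(g) for col in zip(*g))
            for lab, g in groups.items()}


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd k-means; returns one class label per observation."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        means = class_means(points, labels)
        updated = [means.get(c, centroids[c]) for c in range(k)]
        if updated == centroids:
            break
        centroids = updated
    return labels


def weighted_f(points, labels):
    """Weighted grand F: band weights are each band's share of total SS."""
    means = class_means(points, labels)
    n, k = len(points), len(means)
    bands = range(len(points[0]))
    grand = [sum(p[b] for p in points) / n for b in bands]
    sst = [sum((p[b] - grand[b]) ** 2 for p in points) for b in bands]
    ssb = [sum((means[l][b] - grand[b]) ** 2 for l in labels) for b in bands]
    sse = [sum((p[b] - means[l][b]) ** 2 for p, l in zip(points, labels))
           for b in bands]
    w = [t / sum(sst) for t in sst]
    msb = sum(wi * s for wi, s in zip(w, ssb)) / (k - 1)
    mse = sum(wi * s for wi, s in zip(w, sse)) / (n - k)
    return msb / mse


def optimum_classes(points, k_max, f_tol_pct=20.0, small_pct_max=20.0,
                    min_size=5):
    """Largest k in 2..k_max meeting the three criteria, or None."""
    results = {}
    for k in range(2, k_max + 1):
        labels = kmeans(points, k)
        means = class_means(points, labels)
        sizes = Counter(labels)
        small_pct = 100.0 * sum(s < min_size for s in sizes.values()) / len(sizes)
        dists = [math.dist(p, means[l]) for p, l in zip(points, labels)]
        results[k] = (weighted_f(points, labels), small_pct, dists)
    f_max = max(f for f, _, _ in results.values())
    # 75th percentile of all centroid distances observed with two classes.
    d75 = statistics.quantiles(results[2][2], n=4)[2]
    passing = [k for k, (f, small_pct, dists) in results.items()
               if 100.0 * (f_max - f) / f_max <= f_tol_pct  # criterion 1
               and small_pct <= small_pct_max               # criterion 2
               and max(dists) < d75]                        # criterion 3
    return max(passing) if passing else None
```

Calling optimum_classes(pixels, k_max) with the sampled pixels and the maximum from Step 1 returns the selected number of classes, or None when no k satisfies all three criteria. The 20% thresholds are exposed as parameters because the text describes them as user selected.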
A test of this protocol was conducted using hyperspectral data (5-meter pixel resolution) and a Landsat image (30-meter 
pixel resolution) from the same region. We anticipated that the optimum number of classes should decrease as pixel 
size increases, although the difference in extents of the imagery added a confounding effect. For direct comparison of 
  
1146 International Archives of Photogrammetry and Remote Sensing. Vol. XXXIII, Part B7. Amsterdam 2000.
	        