The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Vol. XXXVII. Part B7. Beijing 2008
(NDVI or EVI). In this study, the EVI (Enhanced Vegetation
Index) was chosen for its advantages over NDVI: it is less
affected by atmospheric and soil disturbances, and it is more
sensitive than NDVI in areas of high vegetation activity
(Huete et al., 1999), such as Mato Grosso.
The EVI is defined as:

$$\mathrm{EVI} = \frac{2\,(NIR - R)}{L + NIR + C_1 R + C_2 B} \qquad (1)$$

where R, NIR and B correspond respectively to the red,
near-infrared and blue bands. L, C1 and C2 are adjusting
parameters to minimise aerosol effects (Huete et al., 1999).
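
As an illustration, equation (1) can be evaluated band by band with NumPy. This is only a minimal sketch: the function name `evi` is hypothetical, and because this section does not give numerical values for L, C1 and C2, they are passed explicitly and the values in the usage line are placeholders, not the authors' settings.

```python
import numpy as np

def evi(nir, red, blue, L, C1, C2):
    """EVI as written in equation (1): 2 (NIR - R) / (L + NIR + C1*R + C2*B)."""
    nir, red, blue = (np.asarray(a, dtype=float) for a in (nir, red, blue))
    denom = L + nir + C1 * red + C2 * blue
    # Pixels with a zero denominator are returned as NaN instead of dividing by zero.
    safe = np.where(denom == 0, np.nan, denom)
    return 2.0 * (nir - red) / safe

# Placeholder reflectances and coefficients, for illustration only.
print(evi(nir=0.5, red=0.1, blue=0.05, L=1.0, C1=6.0, C2=7.5))
```

The same call works on whole image arrays, since every operation is element-wise.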
The spatial resolution of these data (250 m) is well suited to
analysing crops in Mato Grosso, where the mean field area of
176 ha is compatible with such a moderate resolution.
The 16-day temporal resolution (23 images per year) is obtained
by compositing daily data with the Maximum Value Composite
method (Huete et al., 1999). This treatment removes part of the
noise, for instance that due to clouds. However, in tropical
regions such as Mato Grosso, cloud effects still remain. A
smoothing algorithm, the Weighted Least Squares algorithm
proposed by Swets et al. (1999), was therefore applied to
improve the quality of the EVI profiles.
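
The compositing step can be sketched as follows. The example assumes a daily EVI stack with time on the first axis and cloudy observations stored as NaN; the function name `max_value_composite` is hypothetical, and the sketch covers only the per-window maximum, not the operational MODIS compositing rules or the Weighted Least Squares smoothing.

```python
import numpy as np

def max_value_composite(daily, window=16):
    """Keep, for each pixel, the maximum value observed in each `window`-day period."""
    daily = np.asarray(daily, dtype=float)
    composites = []
    for start in range(0, daily.shape[0], window):
        block = daily[start:start + window]
        # nanmax ignores NaN (e.g. cloud-masked) observations; a window that is
        # entirely NaN stays NaN (NumPy emits a RuntimeWarning in that case).
        composites.append(np.nanmax(block, axis=0))
    return np.stack(composites)

# 365 daily values for one pixel give 23 composites (the last window is shorter).
daily = np.arange(365, dtype=float).reshape(365, 1)
print(max_value_composite(daily).shape)
```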
The MODIS EVI data were acquired, processed and filtered for
the years considered so as to build two annual time series of
23 images each. In addition, a principal component analysis
(PCA) was carried out for each year, and 5 principal components
were retained in an attempt to better capture the main factors
of variability within each class.
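
A per-year PCA of this kind can be sketched with a singular value decomposition. The example assumes that each row holds the 23-date EVI profile of one pixel; the function name `pca_scores` is hypothetical, and whether the authors centred or standardised the data is not stated, so mean-centring alone is an assumption.

```python
import numpy as np

def pca_scores(profiles, n_components=5):
    """Project mean-centred profiles onto their leading principal components."""
    X = np.asarray(profiles, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each date (column)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T            # component scores per pixel
    explained = (s ** 2) / np.sum(s ** 2)        # share of variance per component
    return scores, explained[:n_components]

# Synthetic stand-in for one year of data: 50 pixels x 23 dates.
rng = np.random.default_rng(0)
scores, ratio = pca_scores(rng.normal(size=(50, 23)))
print(scores.shape, ratio.round(3))
```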
3. METHODOLOGY
To validate the quality of the ground truth data and optimize
the training sample to be used in the classification process, a
methodology for detecting outliers in a multivariate data set
was applied. Many methods for outlier detection in multivariate
data exist in the literature, as reviewed by Ben Gal (2005) and
Penny and Jolliffe (2001).
Data mining techniques such as clustering are not considered in
this study. Indeed, when using clustering, the number of
outliers depends on the number of clusters requested. Moreover,
clusters are defined to detect groups of homogeneous pixels,
whereas outliers can be represented by isolated pixels.
The chosen procedure therefore applies a multivariate
statistical analysis. The technique computes distances between
each sample and the remaining pixels of its class, which
identifies the samples that are more central and commonplace,
as opposed to those that present more abnormal behaviour. To do
so, robust measures of each class’s centre and covariance matrix
are computed, respectively by calculating the median vector of
the sample attributes and by computing the minimum covariance
determinant (MCD). Mahalanobis distances are then computed for
each sample in relation to its class centre. The higher the
distance, the further the pixel is from the class centre.
The Mahalanobis distance is defined by the equation:

$$M_i = \sqrt{(x_i - \bar{x}_n)^{T}\, V_n^{-1}\, (x_i - \bar{x}_n)}$$

for $i = 1, \ldots, n$, where $n$ is the sample size, $x_i$ is the
attribute vector of sample $i$, $\bar{x}_n$ is the sample mean
vector and $V_n$ is the sample covariance matrix.
This Mahalanobis distance was applied to the field data
collected in 2005-2006. The distances were calculated for each
class from the 23 MODIS EVI images and from the PCA
components.
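
The distance of each sample to its class centre can be sketched as below. The function name `mahalanobis_distances` is hypothetical. By default the sketch uses the sample mean and sample covariance; a robust centre (such as the median vector) and a robust covariance (such as an MCD estimate) can be passed through the `center` and `cov` arguments, but computing the MCD itself is outside this example.

```python
import numpy as np

def mahalanobis_distances(samples, center=None, cov=None):
    """Mahalanobis distance of every row of `samples` to the class centre."""
    X = np.asarray(samples, dtype=float)
    if center is None:
        center = X.mean(axis=0)            # sample mean vector
    if cov is None:
        cov = np.cov(X, rowvar=False)      # sample covariance matrix
    diff = X - center
    # Solve cov @ z = diff^T instead of forming the inverse explicitly.
    # A singular covariance (e.g. fewer samples than attributes) raises LinAlgError.
    solved = np.linalg.solve(cov, diff.T).T
    squared = np.einsum("ij,ij->i", diff, solved)
    return np.sqrt(np.maximum(squared, 0.0))

# Four samples of one class described by two attributes.
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
print(mahalanobis_distances(X))
# Same samples, but measured from the median vector.
print(mahalanobis_distances(X, center=np.median(X, axis=0)))
```

With 23 dates per profile, a class needs clearly more than 23 samples for the covariance matrix to be invertible, which is one practical reason to also work on a few PCA components.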
A threshold is then estimated to separate acceptable samples
from those considered as outliers. Different thresholds are
tested, assuming that 0% to 20% of the data set consists of
outliers.
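
One simple way to turn such a percentage into a threshold is to flag the samples with the largest distances within each class. The sketch below assumes this rank-based rule; the function name `flag_outliers` is hypothetical and the authors' exact thresholding rule is not given in this section.

```python
import numpy as np

def flag_outliers(distances, fraction):
    """Return a boolean mask marking the `fraction` of samples with the largest distances."""
    d = np.asarray(distances, dtype=float)
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    k = int(round(fraction * d.size))      # number of samples to flag
    mask = np.zeros(d.size, dtype=bool)
    if k > 0:
        mask[np.argsort(d)[-k:]] = True    # indices of the k largest distances
    return mask

distances = np.array([1, 5, 2, 9, 3, 8, 4, 7, 6, 10], dtype=float)
for fraction in (0.0, 0.1, 0.2):
    print(fraction, np.flatnonzero(flag_outliers(distances, fraction)))
```

The retained training sample for a given class is then `samples[~mask]`.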
The training sample without the outliers is then used to classify
the pre-defined classes. Different classifiers are tested in order
to evaluate whether the impact of outlier detection on
classification depends on the algorithm used. The tested
classifiers are Maximum Likelihood, Spectral Angle Mapper
(Rembold and Maselli, 2006) and Decision Tree C4.5 (Quinlan,
1996). The classifiers are trained on year 2005-2006 and applied
to year 2006-2007 to determine whether the selected data can be
used to classify other years.
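
Of the three classifiers, the Spectral Angle Mapper is the simplest to illustrate: each pixel is assigned to the class whose reference profile forms the smallest angle with the pixel's profile. The sketch below assumes that the reference profile of a class is the mean of its training profiles; the function names are hypothetical and the toy profiles have two dates instead of 23.

```python
import numpy as np

def class_mean_profiles(profiles, labels):
    """Mean training profile per class (assumed reference for the angle computation)."""
    profiles = np.asarray(profiles, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    means = np.stack([profiles[labels == c].mean(axis=0) for c in classes])
    return classes, means

def spectral_angle_classify(pixels, references):
    """Assign each pixel to the reference with the smallest spectral angle."""
    P = np.asarray(pixels, dtype=float)
    R = np.asarray(references, dtype=float)
    norms = np.linalg.norm(P, axis=1, keepdims=True) * np.linalg.norm(R, axis=1)
    cosines = (P @ R.T) / norms
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))   # radians, shape (pixels, classes)
    return np.argmin(angles, axis=1), angles

# Train on one year (outliers already removed), apply to another year.
train = [[1.0, 0.0], [2.0, 0.2], [0.0, 1.0], [0.1, 3.0]]
classes, refs = class_mean_profiles(train, labels=[0, 0, 1, 1])
pred, _ = spectral_angle_classify([[3.0, 0.1], [0.2, 2.0]], refs)
print(classes[pred])
```

Because the angle ignores the magnitude of the profile, a class reference is little affected by a few atypical training samples, as long as they do not change the shape of the mean profile.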
4. RESULTS
Results showed that low distance measures were observed for the
majority of samples in each class (fig. 2), which indicates that
there are few outliers in each class. Visual inspection of the
samples with larger distances confirmed that these MODIS
pixels generally corresponded to cases with abnormal
phenological responses. Analyses of the coefficient of variation
(fig. 3) show that the variability in the detected outlier samples
is always higher than in the more reliable samples. This
confirms that the detected outliers correspond to atypical
pixels, which can potentially deteriorate classification quality.
Moreover, restricting the analysis to the more reliable pixels
yields profiles for each class that can be considered as those of
“correct” or nearly “pure” pixels. Figure 4 presents the MODIS
EVI profiles obtained with the samples corresponding to the
lowest and highest Mahalanobis distances. The outlier pixels do
present different profiles, which can potentially affect the
classification quality.
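
The comparison of variability between retained and outlier samples can be sketched with a per-date coefficient of variation (standard deviation divided by mean across the samples of a class). This is an assumption about how the coefficient was computed; the function name `coefficient_of_variation` is hypothetical and the numbers are synthetic.

```python
import numpy as np

def coefficient_of_variation(profiles):
    """Per-date coefficient of variation (std / mean) across the rows of `profiles`."""
    X = np.asarray(profiles, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Dates with a zero mean are returned as NaN instead of dividing by zero.
    return std / np.where(mean == 0, np.nan, mean)

# Synthetic example: tightly grouped retained samples versus scattered outliers.
retained = np.array([[0.50, 0.80], [0.52, 0.78], [0.48, 0.82]])
outliers = np.array([[0.20, 0.90], [0.70, 0.40]])
print(coefficient_of_variation(retained))
print(coefficient_of_variation(outliers))
```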
Three classifiers were tested with different training data. First,
the Mahalanobis distances are computed either on EVI profiles
or on PCA components. Then, progressively higher thresholds
are considered to detect outliers (0% to 20% of outliers per
sample).
Results differ considerably depending on the classifier used
(fig. 5). The Spectral Angle Mapper classifier was the most
robust: it maintains good Kappa indices (Kappa > 0.8) even when
the training sample size is reduced. Outliers, whether detected
from the entire EVI profiles or from the PCA components, do not
deteriorate the classification quality. This is