For hyperspectral image classification, effective features are those
which are most capable of preserving class separability.
The most commonly used method of feature extraction is
Principal Components Transformation (PCT) (Fukunaga, 1990;
Jain et al., 2000; Landgrebe, 2001). PCT is an orthogonal
transformation that produces a new sequence of uncorrelated
images called principal components. Only the first M
components are used as the features for the image
representation or classification. The transformation matrix of
PCT consists of a Karhunen-Loève basis whose vectors are
ordered by decreasing eigenvalues of the covariance matrix of
the whole hyperspectral data set. This yields the best
approximation in the minimum mean-square error sense
(Mallat, 1999). However, PCT is sensitive to noise and has to
be performed on the whole data set.
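For illustration, a minimal PCT sketch in Python with NumPy, assuming an image cube of shape (rows, cols, bands) and a user-chosen number of retained components M (the function name pct_features is ours, not from the cited works):

import numpy as np

def pct_features(cube, M):
    # Project an (rows, cols, bands) cube onto its first M
    # principal components (Karhunen-Loeve basis vectors).
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands).astype(np.float64)
    X -= X.mean(axis=0)                   # center each band
    C = np.cov(X, rowvar=False)           # bands-by-bands covariance
    eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]     # sort by decreasing eigenvalue
    A = eigvecs[:, order[:M]]             # PCT transformation matrix
    return (X @ A).reshape(rows, cols, M)

Because the covariance is estimated from every pixel, the transform indeed has to see the whole data set, as noted above.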
In contrast to PCT, which takes the global covariance matrix into account, Linear Discriminant Analysis, also called
Canonical Analysis (Richards, 1993), generates a transformed
set of feature axes, in which class separation is optimized (Lee
and Landgrebe, 1993; Jimenez and Landgrebe, 1995). This
approach, called Discriminant Analysis Feature Extraction
(DAFE), uses the ratio of the between-class covariance matrix
to the within-class covariance matrix as a criterion function. A
transformation matrix is then determined to maximize this ratio,
so that the separability of the classes is maximized after the
transformation.
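A minimal sketch of this criterion, assuming labeled training spectra X (n_samples x bands) with labels y; dafe_features is an illustrative name, not the exact algorithm of the cited works:

import numpy as np

def dafe_features(X, y, M):
    # Maximize the ratio of between-class to within-class scatter
    # by solving the generalized eigenproblem Sb v = lambda Sw v.
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((d, d))                 # within-class scatter
    Sb = np.zeros((d, d))                 # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(eigvals.real)[::-1]
    A = eigvecs[:, order[:M]].real        # only (classes - 1) axes are useful
    return X @ A

Note that Sb has rank at most the number of classes minus one, which is the first drawback discussed below.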
Although discriminant analysis is an effective and practical
algorithm for deriving features in many circumstances, the
method has several drawbacks. First, it delivers features only up to the
number of classes minus one. Second, when the mean values of
different classes are similar or the same, the extracted feature
vectors are not reliable. Furthermore, if a class has a mean
vector very different from the other classes, the between-class
covariance matrix will be biased toward this class and will
result in ineffective features (Tadjudin and Landgrebe, 1998).
Finally, in order to estimate the between-class and within-class
scatter matrices reliably, the number of training samples should
be large enough. However, this is rarely the case for
hyperspectral images. Lee and Landgrebe
(1993) showed that useful features could be separated from
redundant features by decision boundaries. The algorithm is
called Decision Boundary Feature Extraction (DBFE) because
it takes full advantage of the characteristics of a classifier by
selecting features directly from its decision boundary. Since the
method depends on how well the training samples approximate
the decision boundaries, the number of training samples
required can be much larger for high-dimensional data because
it computes the class statistical parameters at full
dimensionality. For hyperspectral images, the number of
training samples is usually not enough to prevent singularity or
to yield a good covariance estimate. In addition, DBFE for
more than two classes is sub-optimal (Tadjudin and Landgrebe,
1998). The DBFE method is also computationally more
intensive than the other methods.
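To make the mechanism concrete, a simplified two-class DBFE sketch under a Gaussian maximum-likelihood classifier follows; the pair sampling, bisection search, and function names are our assumptions, not the exact algorithm of Lee and Landgrebe (1993):

import numpy as np

def dbfe_two_class(X1, X2, M, n_pairs=500, seed=0):
    # Estimate the decision boundary feature matrix from unit
    # normals of the Gaussian maximum-likelihood boundary, then
    # keep the M eigenvectors with the largest eigenvalues.
    rng = np.random.default_rng(seed)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    C1i = np.linalg.inv(np.cov(X1, rowvar=False))
    C2i = np.linalg.inv(np.cov(X2, rowvar=False))
    ld1 = -np.linalg.slogdet(C1i)[1]      # log det of class-1 covariance
    ld2 = -np.linalg.slogdet(C2i)[1]      # log det of class-2 covariance

    def h(x):                             # h(x) = 0 on the boundary
        return ((x - m1) @ C1i @ (x - m1) + ld1
                - (x - m2) @ C2i @ (x - m2) - ld2)

    def normal(x):                        # gradient of h: boundary normal
        n = 2 * C1i @ (x - m1) - 2 * C2i @ (x - m2)
        return n / np.linalg.norm(n)

    Sdb = np.zeros((X1.shape[1], X1.shape[1]))
    found = 0
    for _ in range(n_pairs):
        a = X1[rng.integers(len(X1))].copy()
        b = X2[rng.integers(len(X2))].copy()
        if h(a) * h(b) >= 0:              # pair must straddle the boundary
            continue
        for _ in range(40):               # bisect to a boundary point
            mid = 0.5 * (a + b)
            if h(a) * h(mid) <= 0:
                b = mid
            else:
                a = mid
        nvec = normal(0.5 * (a + b))
        Sdb += np.outer(nvec, nvec)
        found += 1
    eigvals, eigvecs = np.linalg.eigh(Sdb / max(found, 1))
    return eigvecs[:, np.argsort(eigvals)[::-1][:M]]

Because the boundary points are located numerically at full dimensionality, the sketch also illustrates why DBFE needs many training samples and substantial computation.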
2. WAVELET-BASED FEATURE EXTRACTION
In the past two decades, the wavelet transform (WT) has been
developed as a powerful analysis tool for signal processing and
has been successfully applied to image processing, data
compression, and pattern recognition (Mallat, 1999). Owing to
their time-frequency localization properties, discrete wavelet
and wavelet packet transforms have proven to be an appropriate
starting point for the classification of
measured signals (Pittner and Kamarthi, 1999). The WT
decomposes a signal into a series of shifted and scaled versions
of the mother wavelet function. The local energy variation of a
hyperspectral signal in different spectral bands at each scale (or
frequency) can be detected automatically and provides useful
information for hyperspectral image classification. Several
feature extraction methods based on the WT have been
proposed for hyperspectral images (Hsu and Tseng, 2000; Hsu,
2003). The general process of the wavelet-based feature
extraction methods is illustrated in Figure 1. First, a wavelet or
wavelet packet transform is applied to the hyperspectral images,
producing a sequence of wavelet coefficients. Then, a simple
feature selection procedure associated
with a criterion is used to select the effective features for
classification. The criterion of feature selection can be designed
for signal representation or classification. In the stage of feature
selection shown in Figure 1, some training data may be needed
as samples to find the effective features for classification.
Unlike the existing feature extraction methods such as DAFE
and DBFE, which need to estimate the statistical parameters at full
dimensionality, the wavelet-based feature extraction optimizes
the criterion in a lower dimensional space. Thus the problem of
limited training sample size can be avoided.
[Figure 1 shows the general flow: N-dimensional hyperspectral
images are decomposed by a wavelet transform (CWT, DWT, or
wavelet packets); feature selection, guided by training data,
picks the effective features; the selected features feed a
supervised classification, which is also trained on samples.]

Figure 1. The general flow chart of wavelet-based feature
extraction.
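A minimal sketch of this pipeline using NumPy and PyWavelets, with an energy criterion for feature selection; the function name and the criterion choice are illustrative (a classification-oriented criterion would instead score features on training samples):

import numpy as np
import pywt

def wavelet_features(cube, wavelet="db4", level=3, M=10):
    # DWT of each pixel's spectrum; keep the M wavelet coefficients
    # with the highest mean energy over the image.
    rows, cols, bands = cube.shape
    spectra = cube.reshape(-1, bands).astype(np.float64)
    C = np.asarray([np.concatenate(pywt.wavedec(s, wavelet, level=level))
                    for s in spectra])    # (pixels, n_coefficients)
    energy = (C ** 2).mean(axis=0)        # representation criterion
    keep = np.argsort(energy)[::-1][:M]   # indices of effective features
    return C[:, keep].reshape(rows, cols, M)

Here the criterion is optimized over individual wavelet coefficients, i.e. in a lower-dimensional space than the full band set, which is the advantage noted above.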
2.1 Orthogonal Wavelet Decomposition
The orthogonal wavelet transform in terms of multi-resolution
analysis (MRA) can decompose a signal x into the low-
frequency components that represent the optimal approximation,
and the high-frequency components that represent the detailed
information (Mallat, 1989). The coefficients of x in a
wavelet orthogonal basis can be computed with a fast algorithm
that cascades discrete convolutions with Conjugate Mirror
Filters (CMF) h and g, and sub-samples the output. The
decomposition formulas are as follows:

$$a_{j+1}[p] = \sum_{n=-\infty}^{+\infty} h[n-2p]\,a_j[n] = a_j \star \bar{h}[2p] \qquad (1)$$

$$d_{j+1}[p] = \sum_{n=-\infty}^{+\infty} g[n-2p]\,a_j[n] = a_j \star \bar{g}[2p] \qquad (2)$$

where $\bar{h}[n] = h[-n]$ and $\bar{g}[n] = g[-n]$.
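A minimal sketch of one cascade step of Eqs. (1)-(2) with NumPy, assuming real-valued filters and ignoring boundary extension for clarity:

import numpy as np

def dwt_step(a, h, g):
    # One level of the fast orthogonal wavelet transform: correlate
    # the approximation a_j with the CMFs h and g (the sum over
    # h[n - 2p] a_j[n]), then keep every second output sample.
    approx = np.correlate(a, h, mode="valid")[::2]   # a_{j+1}
    detail = np.correlate(a, g, mode="valid")[::2]   # d_{j+1}
    return approx, detail

# Example with the Haar filters h = [1, 1]/sqrt(2), g = [1, -1]/sqrt(2):
h = np.array([1.0, 1.0]) / np.sqrt(2)
g = np.array([1.0, -1.0]) / np.sqrt(2)
a1, d1 = dwt_step(np.array([4.0, 6.0, 10.0, 12.0]), h, g)
# a1 ~ [7.07, 15.56] (local averages), d1 ~ [-1.41, -1.41] (details)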