Full text: Proceedings, XXth congress (Part 7)

International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol XXXV, Part B7. Istanbul 2004 
hyperspectral image classification, effective features are those 
which are most capable of preserving class separability. 
The most commonly used method of feature extraction is 
Principal Components Transformation (PCT) (Fukunaga, 1990; 
Jain et al., 2000; Landgrebe, 2001). PCT is an orthogonal 
transformation that produces a new sequence of uncorrelated 
images called principal components. Only the first M 
components are used as the features for the image 
representation or classification. The transformation matrix of 
PCT consists of a Karhunen-Loève basis whose vectors are 
ordered by the decreasing sequence of the eigenvalues of 
the covariance matrix of the total hyperspectral data set. This would 
result in the best fit of the approximation which has the 
minimum mean-square error (Mallat, 1999). However, it is 
sensitive to noise and has to be performed with the whole data 
set. In contrast to the PCT which takes the global covariance 
matrix into account, Linear Discriminant Analysis, also called 
Canonical Analysis (Richards, 1993), generates a transformed 
set of feature axes, in which class separation is optimized (Lee 
and Landgrebe, 1993; Jimenez and Landgrebe, 1995). This 
approach, called Discriminant Analysis Feature Extraction 
(DAFE), uses the ratio of between-class covariance matrices to 
within-class covariance matrices as a criterion function. A 
transformation matrix is then determined to maximize the ratio, 
that is, the separability of classes will be maximized after the 
transformation. Although discriminant analysis is an 
effective and practical algorithm for deriving features 
in many circumstances, the method has several 
drawbacks. First, the approach delivers features only up to the 
number of classes minus one. Second, when the mean values of 
different classes are similar or the same, the extracted feature 
vectors are not reliable. Furthermore, if a class has a mean 
vector very different from the other classes, the between-class 
covariance matrix will be biased toward this class and will 
result in ineffective features (Tadjudin and Landgrebe, 1998). 
Finally, reliable estimation of the between-class and within-class 
scatter matrices requires a sufficiently large number of training 
samples, which is rarely available 
for hyperspectral images. Lee and Landgrebe 
(1993) showed that useful features could be separated from 
redundant features by decision boundaries. The algorithm is 
called Decision Boundary Feature Extraction (DBFE) because 
it takes full advantage of the characteristics of a classifier by 
selecting features directly from its decision boundary. Since the 
method depends on how well the training samples approximate 
the decision boundaries, the number of training samples 
required could be much larger for high-dimensional data because 
it computes the class statistical parameters at full 
dimensionality. For hyperspectral images, the number of 
training samples is usually not enough to prevent singularity or 
to yield a good covariance estimate. In addition, DBFE for 
more than two classes is sub-optimal (Tadjudin and Landgrebe, 
1998). The DBFE method is also computationally more 
intensive than the other methods. 
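
The contrast between PCT and DAFE described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the cited works: it assumes NumPy, a sample matrix X with one row per pixel and one column per spectral band, and integer class labels y. The function names pct and dafe are hypothetical, and the criterion is written with scatter matrices.

```python
import numpy as np

def pct(X, M):
    """PCT sketch: project onto the M leading eigenvectors of the
    global covariance matrix (class labels are not used)."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:M]         # decreasing eigenvalues
    return Xc @ eigvecs[:, order], eigvals[order]

def dafe(X, y, M):
    """Discriminant-analysis sketch: find axes that maximize
    between-class scatter relative to within-class scatter."""
    n_bands = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((n_bands, n_bands))             # within-class scatter
    Sb = np.zeros((n_bands, n_bands))             # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Eigenvectors of Sw^-1 Sb; assumes Sw is non-singular,
    # i.e. enough training samples per class.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    eigvals, eigvecs = eigvals.real, eigvecs.real
    order = np.argsort(eigvals)[::-1][:M]
    return X @ eigvecs[:, order], eigvals[order]
```

With C classes the between-class scatter matrix has rank at most C - 1, so only the first C - 1 eigenvalues returned by dafe are meaningfully non-zero, which mirrors the first drawback listed above. When there are fewer training samples than bands, Sw becomes singular or ill-conditioned and the solve step fails or becomes unreliable.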
In the past two decades, the wavelet transform (WT) has been 
developed as a powerful analysis tool for signal processing and 
has been successfully applied to image 
processing, data compression and pattern recognition (Mallat, 
1999). Owing to their time-frequency localization properties, 
discrete wavelet and wavelet packet transforms have proven to 
be an appropriate starting point for the classification of 
measured signals (Pittner and Kamarthi, 1999). The WT 
decomposes a signal into a series of shifted and scaled versions 
of the mother wavelet function. The local energy variation of a 
hyperspectral signal in different spectral bands at each scale (or 
frequency) can be detected automatically and provides useful 
information for hyperspectral image classification. Several 
feature extraction methods based on the WT have been 
proposed for hyperspectral images (Hsu and Tseng, 2000; Hsu, 
2003). The general process of the wavelet-based feature 
extraction methods is illustrated in Figure 1. Firstly, wavelet or 
wavelet packet transforms are implemented on the 
hyperspectral images and a sequence of wavelet coefficients is 
produced. Then, a simple feature selection procedure associated 
with a criterion is used to select the effective features for 
classification. The criterion of feature selection can be designed 
for signal representation or classification. In the stage of feature 
selection shown in Figure 1, some training data may be needed 
as samples to find the effective features for classification. 
Unlike existing feature extraction methods such as DAFE 
and DBFE, which need to estimate the statistical parameters at full 
dimensionality, the wavelet-based feature extraction optimizes 
the criterion in a lower dimensional space. Thus the problem of 
limited training sample size can be avoided. 
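
The two-stage process described above (transform, then select) can be sketched as follows. This is only an illustration under assumptions that are not taken from the cited methods: a Haar wavelet is used, each spectrum has a power-of-two number of bands, and the selection criterion is a one-dimensional ratio of between-class to within-class variance computed separately for each coefficient. The names haar_dwt, fisher_score and select_features are hypothetical.

```python
import math

def haar_dwt(spectrum):
    """Full Haar decomposition of one pixel's spectrum.
    Assumes len(spectrum) is a power of two. Returns coefficients as
    [coarsest approximation, coarsest details, ..., finest details]."""
    s = 1.0 / math.sqrt(2.0)
    approx = list(spectrum)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details = [s * (a - b) for a, b in pairs] + details
        approx = [s * (a + b) for a, b in pairs]
    return approx + details

def fisher_score(values, labels):
    """1-D separability of a single coefficient across training samples:
    between-class variation divided by within-class variation."""
    grand = sum(values) / len(values)
    between = within = 0.0
    for c in sorted(set(labels)):
        vc = [v for v, l in zip(values, labels) if l == c]
        mc = sum(vc) / len(vc)
        between += len(vc) * (mc - grand) ** 2
        within += sum((v - mc) ** 2 for v in vc)
    return between / (within + 1e-12)   # small constant avoids division by zero

def select_features(spectra, labels, M):
    """Transform every training spectrum, score each wavelet coefficient
    independently, and keep the M highest-scoring coefficients."""
    coeffs = [haar_dwt(s) for s in spectra]
    n = len(coeffs[0])
    scores = [fisher_score([c[k] for c in coeffs], labels) for k in range(n)]
    keep = sorted(range(n), key=lambda k: scores[k], reverse=True)[:M]
    return keep, [[c[k] for k in keep] for c in coeffs]
```

Because each coefficient is scored on its own, no full-dimensional covariance matrix has to be estimated or inverted, which is the sense in which the criterion is optimized in a lower dimensional space.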
[Figure 1: flow chart. Recoverable labels: Wavelet Transform (CWT, DWT, or Wavelet Packets); Training; Feature Selection.]
Figure 1. The general flow chart of wavelet-based feature extraction 
2.1 Orthogonal Wavelet Decomposition 
The orthogonal wavelet transform in terms of multi-resolution 
analysis (MRA) can decompose a signal x into the low- 
frequency components that represent the optimal approximation, 
and the high-frequency components that represent the detailed 
information (Mallat, 1989). The inner-product coefficients of x in a 
wavelet orthogonal basis can be computed with a fast algorithm 
that cascades discrete convolutions with Conjugate Mirror 
Filters (CMF) h and g, and sub-samples the output. The 
decomposition formulas are as follows: 
$$a_{j+1}[p] = \sum_{n} h[n-2p]\, a_j[n] = a_j \star \bar{h}[2p] \qquad (1)$$
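
One level of this filter-and-subsample step can be sketched as below. The sketch assumes the approximation update a_{j+1}[p] = sum_n h[n-2p] a_j[n] of Equation (1), the standard counterpart for the detail coefficients with g in place of h, periodic extension at the signal boundary, and an even-length input. The Haar filter pair and the name decompose_level are illustrative choices, not taken from the paper, and sign conventions for g vary between texts.

```python
import math

S = 1.0 / math.sqrt(2.0)
HAAR_H = [S, S]      # low-pass conjugate mirror filter (Haar, assumed here)
HAAR_G = [-S, S]     # high-pass filter; sign convention differs between texts

def decompose_level(a, h, g):
    """One decomposition level:
        a_next[p] = sum_n h[n - 2p] * a[n]
        d_next[p] = sum_n g[n - 2p] * a[n]
    Filters are indexed from 0, so n = k + 2p for k in range(len(filter)).
    Indices wrap around (periodic extension); len(a) is assumed even."""
    N = len(a)
    a_next, d_next = [], []
    for p in range(N // 2):
        a_next.append(sum(h[k] * a[(k + 2 * p) % N] for k in range(len(h))))
        d_next.append(sum(g[k] * a[(k + 2 * p) % N] for k in range(len(g))))
    return a_next, d_next
```

Calling decompose_level repeatedly on the returned approximation gives the cascade described in the text: each level halves the number of approximation coefficients and emits the detail coefficients for that scale.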

