International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol XXXV, Part B7. Istanbul 2004
Figure 3. Fast wavelet packet decomposition
2.4.2 Best Basis Feature Extraction: For a given
orthogonal wavelet function, one may generate a large family of
orthogonal bases that include different types of time-frequency
atoms. The basis family is always interpreted as a dictionary ©
that is a union of orthonormal bases in a signal space of finite
dimension N:
D= U8 (7)
AeA
Wavelet packet is an example of dictionary where the bases
share some common vectors. Each orthonormal basis in the
dictionary is a family of N wavelet functions: @* = {yy _
and offers a particular way of coding signals, preserving global
energy, and reconstructing exact features. For discrete signals
of size N, the number of wavelet packet bases is more than 2
(Mallat, 1999). In order to optimize the non-linear feature
extraction of a given hyperspectral signal x, one may adaptively
choose the “best” basis in the dictionary D depending on the
spectral structures. Then the features are selected from the M
largest wavelet coefficients calculated by this best basis. This
can be done by the “fast best basis algorithm" proposed by
Coifman and Wickerhauser (1992). This algorithm first expands
a given signal x into a family of orthonormal bases such as the
wavelet packets. Then a complete basis called a best basis
which minimizes a certain cost functional C(x,q^) is searched
among the binary tree with a bottom-up progression. The best
basis 4» at each subspace wy is determined by minimizing
/ ]
the cost function C:
a if CG,8^) s C( 47) COS A54). (8)
a = J ? * ? i - ;
VA U A i£CG 87)» Cosa) Cosa)
The cost function C should be defined by the Schur concave
sum and with the additive property for efficient computation.
The cost function used in this study is entropy of the energy
distribution of the hyperspectral curve x for each pixel:
> 2
à (x...)
N Ko) og, 2
[=
(9)
C(x,B)- 23 3
e
Because of the advantage of the tree structure of wavelet
packets, the fast dynamic programming algorithm finds the best
basis with O(N log, N) operations (Mallat, 1999).
2.5 Local Discriminant Basis Feature Extraction
Entropy used in the best-basis algorithm is an index that
measures the flatness of the energy distribution of a signal.
Minimizing entropy will lead to an efficient representation for
the signal. Therefore, the best-basis algorithm is good for signal
compression but may not be good for classification problems.
The Local Discriminant Bases (LDB) method was proposed by
Saito and Coifman (1994) to search for a best basis for
classification. In this method, the discriminating function D
between the nodes of the tree is calculated from a known
training data set. The discriminating function D can be a
certain distance function between different classes. Then a
complete orthonormal basis, called LDB, that can distinguish
signal features among different classes is selected form the
library tree. To make this algorithm fast, the discriminant
functional D needs to be additive. In this study, J-divergence is
used as the discriminant function. Once the discriminant
function D is specified, the goodness of each node in the
wavelet packet tree can be compared with the two children
nodes for a classification problem. According the discriminant
measure, we can determine whether we should keep the
children nodes or not. This manner is the same as the best basis
search algorithm. Because the discriminant measures are
calculated within the subspace of wavelet packets, we don't
need too much training samples to estimate the discriminant
measures.
3. MATCHING PURSUIT FEATURE EXTRACTION
Both the best basis algorithm and LDB method are based on the
wavelet packets which divide the frequency axis into intervals
of varying sizes. Thus a best wavelet packet basis can be
interpreted as a "best" frequency segmentation. If the signal
includes different types of high energy structures at different
times but in the same frequency interval, such as the case of
spectral mixture of hyperspectral data, the wavelet packet basis
could not well adapt to the signal. Furthermore, the set of
orthogonal bases in the wavelet packet is much smaller then the
set of non-orthogonal bases which can be used to improve the
approximation of complex signals. The pursuit algorithms
generalize the adaptive approximation by selecting the vectors
from redundant dictionaries of time-frequency atoms, with no
orthogonal constraints.
The Matching Pursuit (MP) introduced by Mallat and Zhung
(1993) uses a greedy strategy to find the best basis for signal
approximation. Vectors are selected from the dictionary one by
one in order to best match the signal structures. It is closely
related to projection pursuit algorithm developed by Friedman
and Stuetzle (1981) for statistical parameter estimation. In this
study, we attempt to use the matching pursuit algorithm to
extract the features for hyperspectral image classification. Let
Date be a redundant dictionary with P> N vectors,
where le | — |. A matching pursuit begins by projecting x on à
y
vector ge and computing the residue Rx:
2 yo
Xm (s. y + Rx (10)
886