There are two basic approaches to reducing the dimensionality of the feature space: feature selection and feature extraction. In feature selection the best subset of the original features is chosen and redundant features are discarded. In feature extraction the original feature space is transformed into a lower-dimensional space; the transformed feature vector x is computed from the original feature vector y as

x = Ay,    (1)

where A is the transformation matrix. The quality of a feature subset or of a transformation is measured with a criterion function J, which is usually based on the probabilistic separability of the classes. The best feature subset could in principle be found by
evaluating the value of the criterion function for all possible
combinations, but this is usually too time consuming.
For example, if the dimension of the original feature space
is 20 and the dimension of the reduced feature space is
10, 184756 different subsets would have to be evaluated. The only
optimal search algorithm that implicitly evaluates all
possible subsets is the branch-and-bound algorithm.
Other, nonoptimal algorithms are, for example, sequential
forward selection and sequential backward selection.
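As an illustration of the combinatorial cost and of one nonoptimal search strategy, the following sketch (in Python, not part of the original paper) counts the subsets of the 20-to-10 example and outlines sequential forward selection; the criterion function J is assumed to be supplied by the user.

```python
import math

print(math.comb(20, 10))  # 184756 subsets in the exhaustive 20 -> 10 example

def sequential_forward_selection(J, d, m):
    """Greedy sketch: start from an empty set and at each step add the
    feature that maximizes the criterion function J on the current subset.
    J is assumed to map a list of feature indices to a separability value."""
    selected, remaining = [], list(range(d))
    while len(selected) < m:
        best = max(remaining, key=lambda f: J(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected
```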
The problem in feature extraction is to choose a good
transformation matrix A. Usually the transformation is
limited to a linear form. The matrix A can be determined
using the same criterion functions as in feature selection: the
optimal matrix A is chosen so that the criterion
function J(Ay) is maximized. Usually the maximization is
only possible by numerical optimization, and it is
quite time consuming. Another alternative in feature
extraction is to use the Karhunen-Loève transformation, which
is discussed in the next chapter (Devijver, 1982).
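A minimal sketch of criterion-based linear feature extraction is given below (not from the paper). A transformation A is found by numerical optimization of a simple two-class separability measure, here the ratio of between-class to within-class scatter in the transformed space; the toy data and the criterion are assumptions used purely for illustration.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
y1 = rng.normal(0.0, 1.0, size=(100, 5))   # class 1 feature vectors (toy data)
y2 = rng.normal(1.5, 1.0, size=(100, 5))   # class 2 feature vectors (toy data)
d, m = 5, 2                                 # original and reduced dimensions

def neg_criterion(a_flat):
    """Negative of an illustrative separability criterion J(Ay)."""
    A = a_flat.reshape(m, d)
    x1, x2 = y1 @ A.T, y2 @ A.T
    between = np.sum((x1.mean(axis=0) - x2.mean(axis=0)) ** 2)
    within = x1.var(axis=0).sum() + x2.var(axis=0).sum()
    return -between / within

result = minimize(neg_criterion, rng.normal(size=m * d))
A = result.x.reshape(m, d)                  # transformation matrix, x = A y
```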
3. KARHUNEN-LOÈVE TRANSFORMATION
The Karhunen-Loève transformation is based on the discrete
Karhunen-Loève expansion. In this transformation the
original information is preserved as well as possible by
approximating the original feature vector with a limited number of
terms of the expansion.
3.1 Karhunen-Loève expansion
We have d-dimensional random vectors y, and we can
represent these vectors without error by the summation
of d linearly independent vectors as

y = \sum_{i=1}^{d} x_i \phi_i,    (2)

where the x_i are the coefficients of the basis vectors \phi_i. The basis
vectors are orthonormal:

\phi_i^T \phi_j = 1 for i = j, and 0 for i \neq j.    (3)

In this case, the components of the vector x can be computed
by

x_i = \phi_i^T y,  i = 1, ..., d.    (4)
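Equations (2)-(4) can be illustrated with a small numerical check (not part of the paper; the basis below is simply the orthonormal eigenvector set of an arbitrary symmetric matrix): keeping all d terms reconstructs y exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
M = rng.normal(size=(d, d))
_, Phi = np.linalg.eigh(M @ M.T)   # columns of Phi: an orthonormal basis
y = rng.normal(size=d)

x = Phi.T @ y                      # equation (4): x_i = phi_i^T y
y_reconstructed = Phi @ x          # equation (2): y = sum_i x_i phi_i
assert np.allclose(y, y_reconstructed)
```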
The vector x is thus an orthonormal transformation of y. If we want to
decrease the dimensionality of the feature space to m < d,
we simply do not use all basis vectors \phi_i, but select the best
basis vectors. The best basis vectors minimize the mean
squared error between the original vector y and its
approximation \hat{y}. The mean squared error can be written
as

\bar{\varepsilon}^2(m) = \sum_{i=m+1}^{d} \phi_i^T X \phi_i.    (5)
We notice that only the discarded basis vectors contribute to the error.
The matrix X is the autocorrelation matrix of y; the covariance
matrix can also be used. The optimum choices for the basis vectors are
those which satisfy

X \phi_i = \lambda_i \phi_i,    (6)

i.e. the eigenvectors of X. Combining equations (5) and (6), the
mean squared error becomes

\bar{\varepsilon}^2(m) = \sum_{i=m+1}^{d} \lambda_i.    (7)

The mean squared error is minimized when the discarded
eigenvectors correspond to the smallest eigenvalues
(Devijver, 1982; Fukunaga, 1990).
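The result in equation (7) can be checked numerically; the short sketch below (not part of the paper, toy data assumed) compares the mean squared error of an m-term approximation with the sum of the discarded eigenvalues of the sample autocorrelation matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.normal(size=(10000, 5)) @ np.diag([3.0, 2.0, 1.5, 1.0, 0.5])
X = (Y.T @ Y) / len(Y)                 # autocorrelation matrix of y
lam, Phi = np.linalg.eigh(X)           # eigenvalues in ascending order
m = 3
Phi_m = Phi[:, -m:]                    # keep the m largest eigenvalues
Y_hat = Y @ Phi_m @ Phi_m.T            # approximation from m expansion terms
mse = np.mean(np.sum((Y - Y_hat) ** 2, axis=1))
print(mse, lam[:-m].sum())             # the two values agree, as in eq. (7)
```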
3.2 Summary of the Karhunen-Loève transformation
The presented results can be put into algorithmic form (see the sketch after this list):
1. Compute the correlation or covariance matrix X of y.
2. Compute the eigenvalues and corresponding
eigenvectors of X. Normalize the eigenvectors.
3. Form the transformation matrix A from the m
eigenvectors corresponding to the largest eigenvalues
of X.
4. Compute the transformed feature vectors using equation
(1) (Tou, 1974).
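A compact sketch of these four steps (not from the paper; the function name and the numpy-based implementation are assumptions) could look as follows:

```python
import numpy as np

def karhunen_loeve_transform(Y, m, use_covariance=True):
    """Y holds one d-dimensional feature vector per row; m is the reduced
    dimension. Returns the transformation matrix A and the vectors x = A y."""
    # Step 1: correlation or covariance matrix X of y.
    X = np.cov(Y, rowvar=False) if use_covariance else (Y.T @ Y) / len(Y)
    # Step 2: eigenvalues and (already normalized) eigenvectors of X.
    eigenvalues, eigenvectors = np.linalg.eigh(X)
    # Step 3: transformation matrix A from the m eigenvectors with the
    # largest eigenvalues.
    order = np.argsort(eigenvalues)[::-1][:m]
    A = eigenvectors[:, order].T
    # Step 4: transformed feature vectors, equation (1): x = A y.
    return A, Y @ A.T
```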
4. SELF-ORGANIZING NEURAL NETWORKS
An artificial neural network (referred to as a neural network
from here on) is a parallel, distributed signal or information
processing system consisting of simple processing
elements, also called nodes. The processing elements can
possess a local memory and carry out localized
information processing operations. In the simplest case
a processing element sums its weighted inputs and passes the
result through a nonlinear transfer function. The processing
elements are connected via unidirectional signal
channels called connections. The connections are usually
weighted, and those weights are adapted during training
of the network. Learning of the network is based on the
adaptation of the weights.
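For illustration, a minimal processing element of the kind described above could be sketched as follows (the logistic sigmoid transfer function and the bias term are assumptions, not taken from the paper):

```python
import numpy as np

def processing_element(inputs, weights, bias=0.0):
    """Weighted sum of the inputs passed through a nonlinear transfer
    function (here a logistic sigmoid, chosen for illustration)."""
    s = np.dot(weights, inputs) + bias
    return 1.0 / (1.0 + np.exp(-s))
```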
Neural network models can be characterized by their
properties, such as connection topology, processing
element capabilities, learning algorithm and problem
solving capabilities, and the models can differ greatly.
Neural networks try to imitate a biological nervous
system and its properties, such as memory and learning
(Lippmann, 1987).