Projection pursuit is based on the numerical optimization of a projection index to obtain a low-dimensional linear projection of the data. The projection from high-dimensional space is found by maximizing a function (the projection index) chosen to maximize the separability of the classes in the lower-dimensional space, measured either in a Euclidean or another manner. Originally, the data are arranged such that the projected samples maximize the projection index I of equation (3) [7], [9].
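A minimal sketch of the idea in Python, assuming a Fisher-style separability ratio as the projection index (equation (3) may define a different index) and a one-dimensional projection; the names projection_index and pursue_projection are illustrative:

    import numpy as np
    from scipy.optimize import minimize

    def projection_index(w, X, y):
        # Assumed index: between-class over within-class scatter of the
        # samples projected onto the (normalized) direction w.
        w = w / np.linalg.norm(w)
        z = X @ w                                   # projected samples
        overall = z.mean()
        classes = np.unique(y)
        between = sum((z[y == c].mean() - overall) ** 2 for c in classes)
        within = sum(z[y == c].var() for c in classes) + 1e-12
        return between / within

    def pursue_projection(X, y, seed=0):
        # Numerical optimization of the index: minimize its negative.
        rng = np.random.default_rng(seed)
        w0 = rng.normal(size=X.shape[1])
        res = minimize(lambda w: -projection_index(w, X, y), w0,
                       method="Nelder-Mead")
        return res.x / np.linalg.norm(res.x)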
Application of the feature extraction methods described previously is not always feasible. Two approaches can be taken to handle the high dimensionality: one approach is to reduce the data to a manageably small feature set; the other approach is the construction of a decision tree. As the features are not very intuitive, reducing the dimensionality manually is difficult, and the alternative requires domain knowledge.
Modular learning is one such approach, in which a number of classifiers, each addressing a specific issue of the problem, are learnt instead of a single classifier. The hierarchical multiclassifier system proposed by [5] is based on the concept of modular learning.
A set of C classes is divided into two subclasses referred to as meta classes. The linear feature extractor that best discriminates the two meta classes, and the class division itself, are learnt automatically. Meta classes are subdivided recursively until the resulting meta classes contain only one of the C original classes. The resultant binary tree has C leaf nodes, one for each class, and C-1 internal nodes. Each internal node is associated with a Bayesian classifier and a linear feature extractor, as sketched below.
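A minimal sketch of such a tree, assuming a simple distance-based grouping of class means in place of the automatically learnt division, a one-dimensional Fisher direction as the linear feature extractor, and a Gaussian Bayesian classifier on the projection at each internal node (all names are illustrative, not the exact method of [5]; each class is assumed to have several training samples):

    import numpy as np

    class Node:
        def __init__(self, classes):
            self.classes = list(classes)    # original classes under this node
            self.w = None                   # linear feature extractor
            self.stats = None               # (mean, var, prior) per meta class
            self.left = self.right = None

    def split_meta(X, y, classes):
        # Illustrative stand-in for the learnt class division: seed the two
        # meta classes with the most separated class means, then assign
        # every class to the nearer seed.
        means = {c: X[y == c].mean(axis=0) for c in classes}
        a, b = max(((p, q) for p in classes for q in classes if p < q),
                   key=lambda pq: np.linalg.norm(means[pq[0]] - means[pq[1]]))
        left = [c for c in classes if np.linalg.norm(means[c] - means[a])
                <= np.linalg.norm(means[c] - means[b])]
        return left, [c for c in classes if c not in left]

    def build_tree(X, y, classes):
        node = Node(classes)
        if len(classes) == 1:
            return node                     # leaf: one of the C original classes
        left, right = split_meta(X, y, classes)
        mask_l, mask_r = np.isin(y, left), np.isin(y, right)
        # Fisher direction discriminating the two meta classes.
        m0, m1 = X[mask_l].mean(axis=0), X[mask_r].mean(axis=0)
        Sw = np.cov(X[mask_l], rowvar=False) + np.cov(X[mask_r], rowvar=False)
        node.w = np.linalg.solve(Sw + 1e-6 * np.eye(X.shape[1]), m1 - m0)
        z_l, z_r = X[mask_l] @ node.w, X[mask_r] @ node.w
        n = len(z_l) + len(z_r)
        node.stats = [(z.mean(), z.var() + 1e-12, len(z) / n) for z in (z_l, z_r)]
        node.left = build_tree(X[mask_l], y[mask_l], left)
        node.right = build_tree(X[mask_r], y[mask_r], right)
        return node

    def classify(node, x):
        # Descend the binary tree using the Bayesian decision at each node.
        while node.left is not None:
            z = x @ node.w
            score = [p * np.exp(-(z - m) ** 2 / (2 * v)) / np.sqrt(v)
                     for m, v, p in node.stats]
            node = node.left if score[0] >= score[1] else node.right
        return node.classes[0]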
13.2 Adaptive Classification [8]: This method attempts to overcome the dimensionality curse by iterative application of the MLC. The technique is a self-learning and self-improving adaptive classifier intended to mitigate the small training sample problem. Iterative application of the classifier improves statistics estimation, leading to increased classification accuracy: already classified samples from the output, together with the original training samples, are used for subsequent estimation of statistics. The classified samples are termed semi-labeled samples.
Semi-labeled sample: "Samples whose class labels are decided by a decision rule. These samples are unlabeled before classification is performed. A semi-labeled sample's label can be either right or wrong."
Advantages of the technique are:
Use of a large number of semi-labeled samples can improve statistics estimation, thereby decreasing the estimation error and in turn reducing the effect of the small sample size, as the semi-labeled samples increase the effective training sample size.
As semi-labeled samples are used to estimate statistics, the estimated statistics are more representative of the true class distribution.
The classifier uses information extracted from its own output (hence adaptive) through proper positive feedback, resulting in better statistics estimation and in turn higher classification accuracy.
The adaptive nature of the classifier enables initialization with a small number of training samples, greatly reducing analyst effort.
The partial information conveyed by the semi-labeled samples is used in such a way that each semi-labeled sample affects only the statistics of the class into which it has been placed, and semi-labeled samples are given reduced weight to minimize the undesired influence of misclassified samples. A sketch of this weighting scheme follows.
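A minimal sketch of the iterative scheme, assuming Gaussian class statistics and a fixed reduced weight for semi-labeled samples (the weight value, iteration count and all names are illustrative):

    import numpy as np

    def log_likelihood(x, mu, cov):
        d = x - mu
        _, logdet = np.linalg.slogdet(cov)
        return -0.5 * (logdet + d @ np.linalg.solve(cov, d))

    def weighted_stats(X, w):
        # Weighted mean and covariance; semi-labeled samples carry w < 1.
        mu = np.average(X, axis=0, weights=w)
        d = X - mu
        return mu, (w[:, None] * d).T @ d / w.sum()

    def adaptive_mlc(X_train, y_train, X_unlab, classes,
                     iters=5, semi_weight=0.3):
        labels = None
        for _ in range(iters):
            params = {}
            for c in classes:
                Xc = X_train[y_train == c]
                wc = np.ones(len(Xc))
                if labels is not None:
                    Xs = X_unlab[labels == c]     # semi-labeled samples of c
                    if len(Xs):
                        Xc = np.vstack([Xc, Xs])
                        wc = np.concatenate([wc, np.full(len(Xs), semi_weight)])
                params[c] = weighted_stats(Xc, wc)
            # Classify the unlabeled samples; their labels feed the next pass.
            labels = np.array([max(classes,
                                   key=lambda c: log_likelihood(x, *params[c]))
                               for x in X_unlab])
        return labels, params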
13.3 Novel Pattern Detection using Neural Networks [2]:
Any supervised classifier (MLC, NN) assigns a pixel to the class it resembles most, even if the pixel belongs to none of the trained classes. This situation does not arise often with multispectral data, as the classes of interest are decided a priori and the classifier is trained to extract those classes. With hyperspectral data it is always possible that certain classes, usually the small ones, are not included in the training stage due to lack of sufficient training data. A novel pattern can be described as a class not included in the training data.
The commonly used back propagation NN does not automatically flag novel patterns as unknown; rather, it tries to classify each pattern into the closest matching category. Novelty detection can be included in the back propagation architecture in the form of a threshold on the output of the unit showing the highest activation: if that activation is less than the threshold, the pattern can be considered novel.
Another way is to compute the difference between the output pattern and each of the target patterns. If the minimum distance is more than a threshold, the input pattern can be considered novel.
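Both checks can be sketched as follows, assuming outputs is the network's output activation vector and targets holds the coded target patterns, one row per class; the threshold values are illustrative:

    import numpy as np

    def is_novel(outputs, targets, act_thresh=0.5, dist_thresh=0.5):
        # Check 1: the unit with the highest activation is below the threshold.
        if outputs.max() < act_thresh:
            return True
        # Check 2: the minimum distance to any target pattern exceeds the threshold.
        return np.linalg.norm(targets - outputs, axis=1).min() > dist_thresh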
It is to be noted that the back propagation network is a global classifier: all the output values (positive and negative) are learned simultaneously, giving equal weightage to all. In the PNN architecture, on the other hand, the controlling parameter is the smoothing parameter, which is optimized for maximum separation of all classes. If the maximum output value calculated by the summation layer is less than a threshold, the pattern can be considered novel.
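A minimal sketch of a PNN summation layer with such a novelty threshold, assuming Gaussian kernels with smoothing parameter sigma (the parameter values are illustrative):

    import numpy as np

    def pnn_classify(x, train_X, train_y, classes, sigma=0.1, thresh=1e-3):
        # Pattern layer: a Gaussian kernel on every training sample; the
        # summation layer averages the kernels per class.
        scores = []
        for c in classes:
            Xc = train_X[train_y == c]
            k = np.exp(-np.sum((Xc - x) ** 2, axis=1) / (2 * sigma ** 2))
            scores.append(k.mean())
        best = int(np.argmax(scores))
        if scores[best] < thresh:
            return None                 # maximum summation output too low: novel
        return classes[best]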
The probabilistic NN (PNN) is observed to perform better [2] at novel pattern flagging; the author reports that the PNN is able to detect a higher percentage of novel patterns than the back propagation model.
13.4 Covariance Matrix Estimation Techniques [5]: When applying the MLC, the mean vector and covariance matrix of each class are estimated from training samples. For higher dimensional data, if the sample size is small, the covariance matrix can become singular and unusable. Even if the matrix does not become singular, it may be a poor estimate.
If the number of training samples is limited, using the mean vector of each class together with a common covariance matrix estimated over all classes can sometimes lead to higher classification accuracies. Cortijo et al. [12] have used a common covariance matrix and reported higher classification accuracy.
A covariance matrix estimator that examines mixtures of the sample covariance matrix, the common covariance matrix, the diagonal sample covariance matrix and the diagonal common covariance matrix, and selects the combination that maximizes the likelihood of training samples not included in the covariance estimation, would be useful in mitigating the dimensionality curse.
Such an estimator can be of the form
Cᵢ(αᵢ) = αᵢ₁ diag(Σᵢ) + αᵢ₂ Σᵢ + αᵢ₃ S + αᵢ₄ diag(S)    (4)

where Σᵢ is the sample covariance matrix of class i,
S is the common covariance matrix, calculated as the average of the class covariance matrices, S = (Σ₁ + … + Σ_L)/L, with L the number of classes, and
αᵢ = [αᵢ₁ αᵢ₂ αᵢ₃ αᵢ₄]ᵀ is the vector of mixing parameters.
The value of the mixing parameter αᵢ is to be selected such that the best fit to the training samples is achieved.
The mean and covariance matrix are estimated after removing one sample. The estimated parameters are then used to compute the likelihood of the left-out sample. Each sample is removed in turn, and the average log likelihood is computed over all the left-out samples.
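A minimal sketch of this leave-one-out selection for a single class, assuming a coarse grid of candidate mixing vectors summing to one (names, grid and regularization are illustrative; S is the precomputed common covariance matrix):

    import numpy as np
    from itertools import product

    def mixture_cov(alpha, Si, S):
        # Equation (4): mix the four covariance forms with weights alpha.
        a1, a2, a3, a4 = alpha
        return (a1 * np.diag(np.diag(Si)) + a2 * Si
                + a3 * S + a4 * np.diag(np.diag(S)))

    def log_density(x, mu, C):
        d = x - mu
        _, logdet = np.linalg.slogdet(C)
        return -0.5 * (logdet + d @ np.linalg.solve(C, d))

    def select_alpha(Xi, S, candidates):
        # For each candidate alpha: leave one sample out, estimate the mean
        # and covariance from the rest, score the left-out sample, and keep
        # the alpha with the highest average log likelihood.
        best, best_ll = None, -np.inf
        for alpha in candidates:
            ll = 0.0
            for j in range(len(Xi)):
                rest = np.delete(Xi, j, axis=0)
                mu, Si = rest.mean(axis=0), np.cov(rest, rowvar=False)
                ll += log_density(Xi[j], mu, mixture_cov(alpha, Si, S))
            ll /= len(Xi)
            if ll > best_ll:
                best, best_ll = alpha, ll
        return best

    # Coarse illustrative grid of mixing vectors that sum to one.
    grid = [a for a in product(np.linspace(0, 1, 5), repeat=4)
            if abs(sum(a) - 1.0) < 1e-9]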