Projection pursuit is based on numerical optimization of a low dimensional linear projection. It finds a projection from the high dimensional space by optimizing a projection index I (equation 3) chosen to maximize the separability of the projected samples in the low dimensional space [7], [9]. Application of the methods discussed previously is not practical for data of such high dimensionality. Two approaches are used to handle the high dimensionality: one approach is to extract a manageable small feature set, the other approach is based on the construction of a decision tree. With high dimensionality, manual feature selection is difficult; the alternative is to use domain knowledge.
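As an illustration of the idea only, the sketch below searches for a linear projection by numerically maximizing a separability index of the projected samples. Since the projection index of equation (3) is not reproduced here, a simple Fisher-style ratio of between-class to within-class scatter is assumed, and the function names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def fisher_index(A, X, y):
    """Separability of samples projected by A (d x k): between/within scatter ratio.
    This index is an assumption; the projection index I of equation (3) may differ."""
    Z = X @ A                                     # project samples to k dimensions
    overall_mean = Z.mean(axis=0)
    within, between = 0.0, 0.0
    for c in np.unique(y):
        Zc = Z[y == c]
        mc = Zc.mean(axis=0)
        within += ((Zc - mc) ** 2).sum()
        between += len(Zc) * ((mc - overall_mean) ** 2).sum()
    return between / (within + 1e-12)

def projection_pursuit(X, y, k=3, seed=0):
    """Numerically optimize a d x k linear projection that maximizes the index."""
    d = X.shape[1]
    a0 = np.random.default_rng(seed).normal(size=d * k)
    # minimize the negative index, i.e. maximize separability of the projected samples
    res = minimize(lambda a: -fisher_index(a.reshape(d, k), X, y), a0, method="Nelder-Mead")
    A = res.x.reshape(d, k)
    return A / np.linalg.norm(A, axis=0)          # normalize the projection directions
```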
Modular learning is one such approach, in which a number of classifiers, each addressing a specific aspect of the problem, are learnt instead of a single classifier. The hierarchical multiclassifier system proposed in [5] is based on the concept of modular learning.
The set of C classes is divided into two subsets referred to as meta-classes. The linear feature extractor that best discriminates the two meta-classes, and the class division itself, are learnt automatically. Meta-classes are subdivided recursively until each resulting meta-class contains only one of the C original classes. The resultant binary tree has C leaf nodes, one for each class, and C-1 internal nodes. Each internal node is associated with a Bayesian classifier and a linear feature extractor.
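A minimal sketch of such a hierarchical multiclassifier is given below. The split heuristic (ordering class means along their leading direction), the Fisher direction used as the linear feature extractor, and the one dimensional Gaussian models used as the Bayesian classifier at each internal node are illustrative assumptions, not the learning procedure of [5].

```python
import numpy as np

class Node:
    def __init__(self, classes):
        self.classes = list(classes)   # meta-class: set of original classes at this node
        self.left = self.right = None
        self.w = None                  # linear feature extractor (Fisher direction)
        self.stats = None              # 1-D Gaussian (mean, var) of each child meta-class

def build_tree(classes, X, y):
    """Recursively split the meta-class until each node holds a single original class."""
    node = Node(classes)
    if len(classes) == 1:
        return node                                        # leaf: one of the C classes
    # illustrative split: order the classes along the leading direction of their means
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    d = np.linalg.svd(means - means.mean(axis=0))[2][0]
    order = [c for _, c in sorted(zip(means @ d, classes))]
    left, right = order[: len(order) // 2], order[len(order) // 2:]
    Xl, Xr = X[np.isin(y, left)], X[np.isin(y, right)]
    # linear feature extractor that best discriminates the two meta-classes (Fisher)
    Sw = np.cov(Xl.T) + np.cov(Xr.T) + 1e-6 * np.eye(X.shape[1])
    node.w = np.linalg.solve(Sw, Xl.mean(axis=0) - Xr.mean(axis=0))
    # Bayesian (Gaussian) model of each meta-class on the extracted feature
    node.stats = [((Xs @ node.w).mean(), (Xs @ node.w).var()) for Xs in (Xl, Xr)]
    node.left, node.right = build_tree(left, X, y), build_tree(right, X, y)
    return node

def classify(node, x):
    """Descend the binary tree: C leaf nodes, C-1 internal nodes with a classifier each."""
    if len(node.classes) == 1:
        return node.classes[0]
    z = x @ node.w
    ll = [-0.5 * ((z - m) ** 2 / v + np.log(v)) for m, v in node.stats]
    return classify(node.left if ll[0] > ll[1] else node.right, x)

# usage: tree = build_tree(sorted(set(y)), X, y); label = classify(tree, x)
```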
13.2 Adaptive Classification [8]: This method attempts to overcome the curse of dimensionality by iterative application of the MLC. The technique is a self-learning and self-improving adaptive classifier designed to mitigate the small training sample problem. Iterative application of the classifier improves the statistics estimation, and thereby the classification accuracy, by using already classified samples from the output together with the original training samples for the subsequent estimation of statistics. The classified samples are termed semi-labeled samples.
Semi-labeled sample: "Samples whose class labels are decided by a decision rule. These samples are unlabeled before classification is performed. A semi-labeled sample's label can be either right or wrong."
Advantages of the technique are:
The use of a large number of semi-labeled samples can improve statistics estimation, thereby decreasing the estimation error and reducing the effect of the small sample size, since the semi-labeled samples increase the effective training sample size.
As semi-labeled samples are used to estimate the statistics, the estimated statistics are more representative of the true class distributions.
The classifier uses the information extracted from its own output, and is therefore adaptive; through a proper positive feedback this results in better statistics estimation and higher classification accuracy.
The adaptive nature of the classifier enables initialization with a small number of training samples, greatly reducing analyst effort.
The partial information conveyed by the semi-labeled samples is used in such a way that each semi-labeled sample affects only the statistics of the class into which it has been put, and semi-labeled samples are given reduced weight to minimize the undesired influence of misclassified samples.
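The following sketch illustrates the iterative scheme under the assumption of Gaussian maximum likelihood classification and a fixed reduced weight for semi-labeled samples; the weighting in [8] is more elaborate, and the function and parameter names here are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def estimate_stats(X_train, y_train, X_semi, y_semi, classes, w=0.3):
    """Weighted mean/covariance per class: labeled samples weigh 1.0, semi-labeled
    samples weigh w (< 1) to limit the influence of misclassified samples."""
    stats = {}
    for c in classes:
        Xc = np.vstack([X_train[y_train == c], X_semi[y_semi == c]])
        wc = np.concatenate([np.ones((y_train == c).sum()), np.full((y_semi == c).sum(), w)])
        mu = np.average(Xc, axis=0, weights=wc)
        diff = Xc - mu
        cov = (wc[:, None] * diff).T @ diff / wc.sum() + 1e-6 * np.eye(Xc.shape[1])
        stats[c] = (mu, cov)
    return stats

def adaptive_mlc(X_train, y_train, X_unlabeled, classes, n_iter=5):
    """Iterative MLC: classify, then feed the semi-labeled samples back into the statistics."""
    X_semi = np.empty((0, X_train.shape[1]))
    y_semi = np.empty(0, dtype=y_train.dtype)
    for _ in range(n_iter):
        stats = estimate_stats(X_train, y_train, X_semi, y_semi, classes)
        ll = np.column_stack([multivariate_normal.logpdf(X_unlabeled, *stats[c]) for c in classes])
        y_semi = np.asarray(classes)[ll.argmax(axis=1)]    # labels decided by the decision rule
        X_semi = X_unlabeled                               # semi-labeled samples for next pass
    return stats, y_semi
```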
13.3 Novel Pattern Detection using Neural Networks [2]: Any supervised classifier (MLC, NN) assigns a pixel to the class it resembles most. This situation is rarely a problem with multispectral data, as the classes of interest are decided a priori and the classifier is trained to extract those features. With hyperspectral data it is always possible that certain classes, usually small ones, are not included in the training stage due to lack of sufficient training data. A novel pattern can be described as a pattern from a class not included in the training data.
The commonly used backpropagation NN does not automatically flag novel patterns as unknown; rather, it tries to classify each pattern into the closest matching category. Novelty detection can be included in the backpropagation architecture in the form of a threshold on the output of the unit showing the highest activation: if that activation is less than the threshold, the pattern can be considered novel.
Another way is to compute the difference between the output pattern and each of the target patterns; if the minimum distance exceeds a threshold, the input pattern can be considered novel.
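Both thresholding schemes can be expressed compactly on the trained network's output vector, as in the sketch below; the threshold values are illustrative assumptions.

```python
import numpy as np

def is_novel(output, targets, act_thresh=0.5, dist_thresh=0.7):
    """Flag a pattern as novel from the network's output vector.
    output : output-layer activations for one pattern
    targets: one target (e.g. one-hot) vector per trained class
    Thresholds are illustrative, not values from [2]."""
    # Test 1: highest activation below a threshold -> no class claimed the pattern strongly
    weak_activation = output.max() < act_thresh
    # Test 2: minimum distance to any target pattern above a threshold
    far_from_targets = min(np.linalg.norm(output - t) for t in targets) > dist_thresh
    return weak_activation or far_from_targets
```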
It is to be noted that the backpropagation network is a global classifier and all the output values (positive and negative) are learned simultaneously with equal weightage. In the PNN architecture, on the other hand, the controlling parameter is the smoothing parameter, which is optimized for maximum separation of all classes. If the maximum output value calculated by the summation layer is less than a threshold, the pattern can be considered novel.
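A minimal PNN-style sketch of this test is given below, assuming Gaussian kernels with a single smoothing parameter and an illustrative threshold.

```python
import numpy as np

def pnn_summation(x, train_X, train_y, classes, sigma=0.1):
    """Summation-layer outputs of a PNN: average Gaussian kernel (smoothing parameter
    sigma) between the input pattern and the training patterns of each class."""
    outputs = []
    for c in classes:
        Xc = train_X[train_y == c]
        k = np.exp(-np.sum((Xc - x) ** 2, axis=1) / (2 * sigma ** 2))
        outputs.append(k.mean())
    return np.array(outputs)

def pnn_classify_or_novel(x, train_X, train_y, classes, sigma=0.1, thresh=1e-3):
    """If the maximum summation-layer output falls below the threshold, flag as novel."""
    out = pnn_summation(x, train_X, train_y, classes, sigma)
    return "novel" if out.max() < thresh else classes[int(out.argmax())]
```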
The probabilistic NN (PNN) is observed to perform better [2] in novel pattern flagging; the author reports that the PNN is able to detect a higher percentage of novel patterns than the backpropagation model.
13.4 Covariance Matrix Estimation Techniques [5]: When applying the MLC, the mean vector and covariance matrix of each class are estimated from training samples. For higher dimensional data, if the sample size is small, the covariance matrix can become singular and unusable. Even if the matrix does not become singular, it may be a poor estimate.
If the number of training samples is limited, estimating the mean vector of each class and a common covariance matrix for all classes can sometimes lead to higher classification accuracies. Cortijo et al. [12] used a common covariance matrix and reported higher classification accuracy.
A covariance matrix estimator that examines mixtures of the sample covariance matrix, the common covariance matrix, the diagonal sample covariance matrix and the diagonal common covariance matrix, and selects the combination that maximizes the likelihood of training samples not included in the covariance estimation, is useful for mitigating the curse of dimensionality. Such an estimator can be of the form
C_i(α_i) = α_i1 diag(Σ_i) + α_i2 Σ_i + α_i3 S + α_i4 diag(S)        (4)

where Σ_i is the sample covariance matrix of class i, S is the common covariance matrix calculated as the average covariance matrix S = (1/L) Σ_{i=1..L} Σ_i, with L the number of classes, and α_i = [α_i1 α_i2 α_i3 α_i4]^T is the mixing parameter vector.
The value of the mixing parameter α_i is selected such that the best fit to the training samples is achieved. The mean and covariance matrix are estimated after removing one sample, and the estimated parameters are used to compute the likelihood of the left-out sample. Each sample is removed in turn, and the average log likelihood is computed over all the left-out samples.
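A minimal sketch of this leave-one-out selection is given below, assuming a small candidate grid of mixing vectors and Gaussian class models; the grid, the helper names and the use of scipy are illustrative choices, not the estimator's published implementation. The common covariance S would be computed beforehand as the average of the per-class sample covariance matrices, as defined above.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mix(a, Sig, S):
    """Equation (4): C(a) = a1*diag(Sig) + a2*Sig + a3*S + a4*diag(S)."""
    return a[0] * np.diag(np.diag(Sig)) + a[1] * Sig + a[2] * S + a[3] * np.diag(np.diag(S))

def looc_estimate(Xi, S, candidates):
    """Select the mixing parameter of eq. (4) by maximizing the average leave-one-out
    log likelihood over the class training samples Xi (n x d).
    'candidates' is a list of [a1, a2, a3, a4] vectors; the grid is an assumption."""
    n = Xi.shape[0]
    best_a, best_ll = None, -np.inf
    for a in candidates:
        ll = 0.0
        for j in range(n):                                   # remove each sample in turn
            keep = np.delete(np.arange(n), j)
            mu = Xi[keep].mean(axis=0)                       # statistics without sample j
            C = mix(a, np.cov(Xi[keep].T, bias=True), S)
            ll += multivariate_normal.logpdf(Xi[j], mu, C, allow_singular=True)
        if ll / n > best_ll:                                 # best average log likelihood
            best_a, best_ll = a, ll / n
    # refit on all samples with the selected mixing parameter
    return best_a, mix(best_a, np.cov(Xi.T, bias=True), S)
```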