2.1 Statistical Pattern Recognition

Statistical pattern recognition techniques are characterized by treating patterns as high-dimensional random variables; decomposition into parts or structural properties is not taken into account. If such classifiers are to be applied to complex scenes, segmentation is required. Each resulting area can then be classified as an entire unit, i.e. it is mapped onto a class as one entity. The architecture of such a classification system is outlined in Fig. 1. Given the measurement of an entity to be classified, a feature vector c is calculated. This point in the N-dimensional feature space is the argument of a decision function.
  
[Figure: block diagram with input f(x), preprocessing, segmentation, feature extraction, and classification stages]
Figure 1: Architecture of a Classification System
We assume that the feature vectors are already available and concentrate on the decision function. The task is to construct a mapping from the feature space into the set of indices characterizing the classes \Omega_k. The decision function is denoted by

D(c) = k , \quad k \in \{1, \ldots, K\} .   (2)
In order to optimize this function with respect to 
a given learning sample it is convenient to use a 
decision vector in the following way 
d(c) = (d_1(c), \ldots, d_K(c))^T \quad \text{and} \quad \sum_{k=1}^{K} d_k(c) = 1 .   (3)
The choice of an optimization criterion determines the functions d_k. Both classical approaches, i.e. minimizing a cost function and approximating the perfect decision function, will be outlined.
To minimize the costs of decisions it is required that the density functions p(c \,|\, \Omega_k), the a priori probabilities p_k, and the pairwise misclassification costs r_{kl}, 0 \le r_{kk} \le r_{kl} \le 1, are known. r_{kl} denotes the cost of classifying a pattern belonging to class l into class k. The average costs evoked by the decision function are therefore given by
V(d) = \sum_{k=1}^{K} \sum_{l=1}^{K} r_{kl} \, p_l \int p(c \,|\, \Omega_l) \, d_k(c) \, dc .   (4)
The costs are minimal if the decision function is chosen as

d_k(c) = \begin{cases} 1 & \text{if } k = \arg\min_i \sum_{j=1}^{K} r_{ij} \, p_j \, p(c \,|\, \Omega_j) \\ 0 & \text{otherwise} \end{cases} , \qquad D(c) = \arg\max_k \big( d_k(c) \big) .   (5)
  
Based on this general decision optimization, special variants of classifiers can be derived. Restricting the costs to r_{kk} = 0 and r_{kl} = 1, l \neq k, yields the Bayes classification rule of maximum a posteriori probability. Fixing p(c \,|\, \Omega_k) to be Gaussian results in the normal distribution decision rule.
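For illustration, the following is a minimal sketch of the minimum-cost rule of Eq. (5), assuming Gaussian class-conditional densities as in the normal distribution decision rule; the parameter names (means, covs, priors, R) and the use of SciPy are assumptions of this sketch, not part of the original formulation.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Sketch of the minimum-cost decision rule of Eq. (5).
# The class-conditional densities p(c|Omega_k) are modelled as Gaussians; means,
# covariances, priors and the cost matrix R are assumed to be known or estimated.

def min_cost_classify(c, means, covs, priors, R):
    """Return the class index k minimizing the expected cost for feature vector c.

    means, covs : per-class Gaussian parameters (lists of length K)
    priors      : a priori probabilities p_k, shape (K,)
    R           : cost matrix, R[k, l] = cost of deciding class k for a pattern of class l
    """
    K = len(priors)
    # p_j * p(c | Omega_j) for every class j
    weighted = np.array([priors[j] * multivariate_normal.pdf(c, means[j], covs[j])
                         for j in range(K)])
    # expected cost of deciding in favour of class i: sum_j R[i, j] * p_j * p(c|Omega_j)
    expected_cost = R @ weighted
    return int(np.argmin(expected_cost))

# With R[k, k] = 0 and R[k, l] = 1 for l != k this reduces to the Bayes rule of
# maximum a posteriori probability: argmax_j priors[j] * p(c | Omega_j).
```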
The perfect decision function 
\delta_k(c) = \begin{cases} 1 & \text{if } c \in \Omega_k \\ 0 & \text{otherwise} \end{cases}   (6)
is approximated by a polynomial decision rule. This 
function is to be defined according to the learning 
sample. The decision functions d_k approximating the perfect ones make use of polynomial expansions
of the feature vectors c. Given an arbitrary but fixed 
polynomial expression x(c) over the coefficients of c 
the decision functions are expressed by 
d_k(c) = a_k^T x(c) .   (7)
Rewriting in vectorial form leads to 
d(c) = A^T x(c) .   (8)
Learning or adjusting the classification rule is equivalent to the estimation of the parameter matrix A.
According to the Weierstrass Theorem, arbitrary functions can be approximated, with an accuracy that depends only on the degree of the polynomial x(c). The optimal matrix A^* is the one minimizing the error between the perfect and the estimated decision rule. Therefore, it has to fulfill the criterion
\varepsilon(A^*) = \min_A E\{ (\delta(c) - A^T x(c))^2 \} .   (9)
A closed-form solution can be derived, resulting in the simple scheme
A^* = \left( \frac{1}{J} \sum_{j=1}^{J} x(c_j) \, x(c_j)^T \right)^{-1} \left( \frac{1}{J} \sum_{j=1}^{J} x(c_j) \, \delta(c_j)^T \right) ,   (10)

where c_j, j = 1, \ldots, J, are the feature vectors of the learning sample and \delta(c_j) their perfect decision vectors.
The only assumption is that the matrix to be inverted is not singular. But this is not a serious problem if a representative learning sample, and therefore a sufficient number of feature vectors and their corresponding classes, are available.
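The following sketch illustrates Eqs. (7) to (10) under the assumption of a second-degree polynomial expansion x(c); the function names and the one-hot encoding of the perfect decision vectors \delta(c_j) are choices of this sketch, not prescribed by the text.

```python
import numpy as np

# Sketch of the polynomial classifier of Eqs. (7)-(10): a quadratic expansion x(c)
# and the closed-form least-squares estimate of A.

def quadratic_expansion(c):
    """x(c): constant, linear and pairwise quadratic terms of the feature vector c."""
    c = np.asarray(c, dtype=float)
    quad = [c[i] * c[j] for i in range(len(c)) for j in range(i, len(c))]
    return np.concatenate(([1.0], c, quad))

def fit_polynomial_classifier(features, labels, K):
    """Estimate A* from a learning sample (Eq. 10).

    features : list of J feature vectors c_j
    labels   : class index (0..K-1) of each c_j, defining the perfect decisions delta(c_j)
    """
    X = np.array([quadratic_expansion(c) for c in features])   # J x M matrix of x(c_j)
    Delta = np.eye(K)[labels]                                   # J x K, rows = delta(c_j)
    J = len(features)
    # A* = ( (1/J) sum x x^T )^{-1} ( (1/J) sum x delta^T )
    A = np.linalg.solve(X.T @ X / J, X.T @ Delta / J)           # M x K
    return A

def classify(A, c):
    """D(c) = argmax_k d_k(c) with d(c) = A^T x(c), Eqs. (5) and (8)."""
    return int(np.argmax(A.T @ quadratic_expansion(c)))
```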
Both classification rules depend on the learning sample. The parameters of the decision rule result from an optimization process. Above, an off-line estimation has been presented. Nevertheless, recursive estimation procedures exist for both approaches. They can be applied in supervised or unsupervised training. In the latter case, a sufficient initial estimate is required. An incoming feature vector is classified according to the present parameters. The result is used to update the parameters of a certain class. This class can be the result of the classification or can be obtained by a randomized decision which must take into account the values of the decision vector d(c).
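A minimal sketch of such an unsupervised recursive update is given below, assuming a simple nearest-mean classifier whose class means are adapted on-line; the construction of the decision vector d(c) from distances and the 1/n learning rate are assumptions of this sketch, not part of the original formulation.

```python
import numpy as np

# Each incoming feature vector is classified with the current parameters, and the
# parameters of one class (here: its mean) are then updated. The class is chosen
# either from the classification result or by a randomized decision based on d(c).

def recursive_update(means, counts, c, randomized=False, rng=None):
    """Classify c with the present class means and update the mean of the chosen class.

    means  : list of per-class mean vectors (float ndarrays), updated in place
    counts : list of per-class sample counts, updated in place
    """
    c = np.asarray(c, dtype=float)
    # decision vector d(c): softmax of negative distances to the class means
    dist = np.array([np.linalg.norm(c - m) for m in means])
    d = np.exp(-dist) / np.sum(np.exp(-dist))
    if randomized and rng is not None:
        k = int(rng.choice(len(means), p=d))   # randomized decision based on d(c)
    else:
        k = int(np.argmax(d))                  # deterministic classification result
    counts[k] += 1
    means[k] += (c - means[k]) / counts[k]     # running-mean update of class k
    return k
```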
It should be pointed out that both approaches are optimal with respect to the chosen criteria. The semantics of a domain is
     
	        