2.1 Statistical Pattern Recognition
Statistical pattern recognition techniques are characterized by treating patterns as high-dimensional random variables. Decomposition into parts or structural properties is not taken into account. If such classifiers are to be applied to complex scenes, segmentation is required. Each resulting area can then be classified as an entire unit, i.e. it is mapped onto a class as one entity. The architecture of such a classification system is outlined in Fig. 1. Given the measurement of an entity to be classified, a feature vector $c$ is calculated. This point in the $N$-dimensional feature space is the argument of a decision function.
Figure 1: Architecture of a Classification System (preprocessing, segmentation, feature extraction, classification)
We assume that the feature vectors are already available and concentrate on the decision function. The task is to construct a mapping from the feature space into the set of indices characterizing the classes $\Omega_k$. The decision function is denoted by

$D(c) = k, \quad k \in \{1, \dots, K\}$ .   (2)
In order to optimize this function with respect to a given learning sample it is convenient to use a decision vector in the following way:

$d(c) = (d_1(c), \dots, d_K(c))^T$ with $\sum_{k=1}^{K} d_k(c) = 1$ .   (3)
The choice of an optimization criterion determines the functions $d_k$. Both classical approaches, i.e. minimizing a cost function and approximating the perfect decision function, will be outlined.
To minimize the costs of decisions it is required that the density functions $p(c \mid \Omega_k)$, the a priori probabilities $p_k$, and the pairwise misclassification costs $r_{kl}$, $0 \le r_{kk} \le r_{kl} \le 1$, are known. $r_{kl}$ denotes the cost of classifying a pattern belonging to class $l$ into class $k$. The average costs evoked by the decision function are therefore given by

$V(d) = \sum_{k=1}^{K} \sum_{l=1}^{K} r_{kl} \, p_l \int p(c \mid \Omega_l) \, d_k(c) \, dc$ .   (4)
The costs are minimal if the decision function is chosen as

$d_k(c) = 1$ if $k = \operatorname{argmin}_i \sum_{j=1}^{K} r_{ij} \, p_j \, p(c \mid \Omega_j)$, and $d_k(c) = 0$ otherwise,

$D(c) = \operatorname{argmax}_k \, d_k(c)$ .   (5)
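The decision rule of Eq. (5) can be implemented directly once the class-conditional densities, the priors and the cost matrix are at hand. The following Python sketch illustrates this; the function and variable names are illustrative assumptions and not part of the original formulation.

```python
# Minimal sketch of the minimum-cost decision rule of Eq. (5).
# Assumed inputs (illustrative names): densities[j](c) = p(c | Omega_j),
# priors[j] = p_j, costs[i, j] = r_ij (cost of assigning a class-j pattern
# to class i).
import numpy as np

def decide(c, densities, priors, costs):
    """Return the class index minimizing the expected cost at c."""
    K = len(priors)
    likelihoods = np.array([densities[j](c) for j in range(K)])
    # expected cost of assigning c to class i: sum_j r_ij * p_j * p(c | Omega_j)
    expected_cost = costs @ (priors * likelihoods)
    return int(np.argmin(expected_cost))
```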
Based on this general decision optimization, special variants of classifiers can be derived. Restricting the costs to $r_{kk} = 0$ and $r_{kl} = 1$, $l \ne k$, yields the Bayes classification rule of maximum a posteriori probability. Fixing $p(c \mid \Omega_k)$ to be Gaussian results in the normal distribution decision rule.
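A small sketch of these two special cases, combining zero-one costs with Gaussian class-conditional densities, is given below; the means, covariances and priors are assumed to be given (e.g. estimated from the learning sample), and all names are illustrative.

```python
# Sketch of the maximum a posteriori rule with Gaussian p(c | Omega_k).
# means[k], covs[k] and priors[k] are assumed to be known class parameters.
import numpy as np
from scipy.stats import multivariate_normal

def map_decide_gaussian(c, means, covs, priors):
    """Assign c to the class maximizing p_k * N(c; mu_k, Sigma_k)."""
    posteriors = np.array([
        p_k * multivariate_normal.pdf(c, mean=mu_k, cov=cov_k)
        for mu_k, cov_k, p_k in zip(means, covs, priors)
    ])
    return int(np.argmax(posteriors))
```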
The perfect decision function

$\delta_k(c) = 1$ if $c \in \Omega_k$, and $\delta_k(c) = 0$ otherwise,   (6)
is approximated by a polynomial decision rule. This function is to be defined according to the learning sample. The decision functions $d_k$ approximating the perfect ones make use of polynomial expansions of the feature vector $c$. Given an arbitrary but fixed polynomial expansion $x(c)$ over the components of $c$, the decision functions are expressed by

$d_k(c) = a_k^T \, x(c)$ .   (7)

Rewriting in vector form leads to

$d(c) = A^T \, x(c)$ .   (8)
Learning or adjusting the classification rule is equivalent to estimating the parameter matrix $A$. According to the Weierstrass theorem, arbitrary functions can be approximated, where the accuracy only depends on the degree of the polynomial $x(c)$. The optimal matrix $A^*$ is the one minimizing the error between the perfect and the estimated decision rule. Therefore, it has to fulfill the criterion

$\epsilon(A^*) = \min_A E\{(\delta(c) - A^T x(c))^2\}$ .   (9)

A closed-form solution can be achieved, resulting in the simple scheme

$A^* = \left( \frac{1}{N} \sum_j x(c_j) \, x(c_j)^T \right)^{-1} \left( \frac{1}{N} \sum_j x(c_j) \, \delta(c_j)^T \right)$ .   (10)
The only assumption is that the matrix to be inverted is not singular. But this is not a serious problem if a representative learning sample, and therefore a sufficient number of feature vectors and their corresponding classes, is available.
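As an illustration, the scheme of Eq. (10) can be sketched in a few lines of Python. The quadratic expansion used for $x(c)$ is only one possible choice of polynomial basis, and the sample and label containers are assumptions made for the example.

```python
# Minimal sketch of the closed-form estimate of Eq. (10), assuming a
# learning sample of feature vectors c_j with known integer class labels.
import numpy as np

def expand(c):
    """Polynomial expansion x(c): constant, linear and quadratic terms."""
    c = np.asarray(c, dtype=float)
    quad = np.outer(c, c)[np.triu_indices(c.size)]    # c_i * c_j, i <= j
    return np.concatenate(([1.0], c, quad))

def estimate_A(samples, labels, num_classes):
    """A* = (1/N sum x x^T)^(-1) (1/N sum x delta^T), cf. Eq. (10)."""
    N = len(samples)
    X = np.stack([expand(c) for c in samples])         # N x M expanded features
    Delta = np.zeros((N, num_classes))                 # perfect decisions delta(c_j)
    Delta[np.arange(N), labels] = 1.0
    moment = X.T @ X / N                               # 1/N sum x(c_j) x(c_j)^T
    cross = X.T @ Delta / N                            # 1/N sum x(c_j) delta(c_j)^T
    return np.linalg.solve(moment, cross)              # avoids explicit inversion

def classify(c, A):
    """D(c) = argmax_k d_k(c) with d(c) = A^T x(c), cf. Eqs. (5) and (8)."""
    return int(np.argmax(A.T @ expand(c)))
```

Solving the linear system instead of forming the explicit inverse is a standard numerical choice; the rule itself is unchanged.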
Both classification rules depend on the learning sample; the parameters of the decision rule result from an optimization process. Above, an off-line estimation has been presented. Nevertheless, recursive estimation procedures exist for both approaches. They can be applied in supervised or unsupervised training. In the latter case, a sufficiently good initial estimate is required. An incoming feature vector is classified according to the current parameters, and the result is used to update the parameters of a certain class. This class can be the result of the classification or can be obtained by a randomized decision which must take into account the values of the decision vector $d(c)$. It should be pointed out that both approaches are optimal with respect to the chosen criteria.
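The recursive update itself is not spelled out above, so the following is only a rough sketch under simple assumptions: Gaussian class models whose means are adapted by a running average, and a randomized class choice driven by the decision vector $d(c)$; all names and the concrete update rule are illustrative.

```python
# Rough sketch of an on-line (recursive) adaptation step, as an assumption:
# the incoming feature vector is assigned to a class (deterministically or
# by a randomized decision based on d(c)), and that class's running mean is
# updated. The concrete update rule is not taken from the original text.
import numpy as np

def update_step(c, means, counts, d, rng, randomized=True):
    """Assign c to a class and update that class's running mean in place."""
    if randomized:
        probs = np.clip(d, 0.0, None)
        probs = probs / probs.sum()           # selection weights from d(c)
        k = int(rng.choice(len(d), p=probs))  # randomized decision
    else:
        k = int(np.argmax(d))                 # take the classification result
    counts[k] += 1
    means[k] += (c - means[k]) / counts[k]    # recursive mean update
    return k
```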