mostly theoretical results achieved so far in statistical
pattern recognition. Chapter three discusses in detail the
methods of empirical error estimation; altogether ten
methods are covered. The chapter can be considered a
review of empirical error estimation and draws on both
theoretical and empirical results from the literature. In
chapter four, the simulation results are described. The
simulations are based on extensive work: in total, more
than 80,000 different cases have been studied (although
the test setup is limited). The most important trends
observed in these simulations are listed in chapter four.
Finally, chapter five draws some conclusions.
2. ERROR ESTIMATION AND CLASSIFIER
DESIGN
In this chapter we review the effect of finite sample
sizes on empirical error estimators and on classifier
design. In the analysis below, as in the simulations of
chapter 4, we restrict ourselves to two-class cases.
2.1 Effect of finite sample sizes on empirical error
estimation
The expected performance of a classifier degrades for
two reasons: the finite number of samples used to
design the classifier and the finite number of samples
used to test it. A theoretical analysis of both of these
effects can be found in (Fukunaga 1990).
The effect of the finite number of test samples in the
error counting approach can be directly derived from
the binomial distribution
   E{ε̂} = ε

   Var{ε̂} = P₁²ε₁(1 − ε₁)/N₁ + P₂²ε₂(1 − ε₂)/N₂ ,        (4)
where E{ε̂} is the expected value and Var{ε̂} the
variance of the error estimate, ε is the true error rate,
ε₁ and ε₂ are the true error rates of the two classes,
P₁ and P₂ are the prior probabilities, and N₁ and N₂
are the test sample sizes of the two classes. The
finiteness of the test set does not affect the bias of the
estimate, but it produces a variance which, relative to
the error rate itself, grows as the expected error rate
decreases.
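As a quick numerical check (ours, not part of the paper), the binomial variance in eq. (4) can be compared against a direct Monte Carlo simulation of error counting. The sketch below assumes Python with NumPy; the helper name holdout_error_variance and all parameter values are illustrative:

```python
import numpy as np

def holdout_error_variance(eps1, eps2, p1, n1, n2):
    """Variance of the error-count estimate from eq. (4):
    Var{e} = P1^2*e1*(1-e1)/N1 + P2^2*e2*(1-e2)/N2."""
    p2 = 1.0 - p1
    return p1**2 * eps1 * (1 - eps1) / n1 + p2**2 * eps2 * (1 - eps2) / n2

# Monte Carlo check: per-class error counts are binomial, so the estimate
# eps_hat = P1*k1/N1 + P2*k2/N2 should be unbiased with the variance above.
rng = np.random.default_rng(0)
eps1, eps2, p1, n1, n2 = 0.10, 0.20, 0.5, 200, 200
trials = 20000
e1 = rng.binomial(n1, eps1, trials) / n1
e2 = rng.binomial(n2, eps2, trials) / n2
est = p1 * e1 + (1 - p1) * e2

print(est.mean())     # close to the true error 0.15: no bias from the test set
print(est.var(), holdout_error_variance(eps1, eps2, p1, n1, n2))   # agree
```

Note that the empirical mean matches the true error rate while the spread matches eq. (4), illustrating that a finite test set contributes variance but no bias.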
The effect of a finite design set is much more difficult
to analyze, and the derivation goes far beyond the scope
of this paper. The interested reader can find a detailed
derivation in (Fukunaga 1990, pp. 201-214). It is
shown that the bias produced by a finite design set is
always positive and that, in a second-order approximation,
the variance for a Bayesian classifier (assuming the
correct probability model is used) is zero. If the
classifier is not Bayesian, or if higher-order terms are
included in the analysis, the variance is no longer zero;
it then depends on the underlying density structures and
is proportional to 1/N².
When considering the effect of independent test and
design sets, the following may thus be concluded: The
bias comes mainly from the finite design set, and the
variance from the finite test set.
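This conclusion can be illustrated with a small simulation. The sketch below is ours, not from the paper's simulation study: it assumes two spherical Gaussian classes (Σ = I), a plug-in linear classifier built from sample means, and arbitrary values for the dimensionality and class separation:

```python
import numpy as np
from math import erf, sqrt

def phi(z):                       # standard normal CDF
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

rng = np.random.default_rng(1)
n, M = 8, 2.0                     # dimensionality and class separation (assumed)
m2 = np.zeros(n); m2[0] = M       # class 1 at the origin, class 2 at m2, Sigma = I
bayes = phi(-M / 2)               # Bayes error of this setup

def plug_in_linear(nd):
    """Linear classifier from the sample means of nd design samples per class."""
    x1 = rng.normal(size=(nd, n))
    x2 = rng.normal(size=(nd, n)) + m2
    mu1, mu2 = x1.mean(0), x2.mean(0)
    w = mu2 - mu1
    b = -0.5 * (mu1 + mu2) @ w
    return w, b

def true_error(w, b):
    """Exact error rate of deciding class 2 when w.x + b > 0."""
    s = np.linalg.norm(w)
    e1 = phi(b / s)               # class 1 misclassified as class 2
    e2 = phi(-(w @ m2 + b) / s)   # class 2 misclassified as class 1
    return 0.5 * (e1 + e2)

# Design-set effect: the bias (mean excess over the Bayes error) shrinks with nd.
for nd in (10, 200):
    errs = [true_error(*plug_in_linear(nd)) for _ in range(300)]
    print(nd, np.mean(errs) - bayes)          # positive, smaller for nd = 200

# Test-set effect: the variance of error counting shrinks as the test set grows.
for nt in (50, 2000):
    est = rng.binomial(nt, bayes, 5000) / nt
    print(nt, est.var())                      # roughly bayes*(1-bayes)/nt
```

The design-set loop shows a systematic (always positive) excess over the Bayes error that shrinks as the design set grows, while the test-set loop shows the variance shrinking with the test-set size, matching the conclusion above.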
2.2 Effect of finite sample sizes in classifier
design
There is a large variety of classification rules. We
consider here only those that we have used in our
simulations. The primary concern is the bias produced
by the finite design set, because the variance of the
error estimate comes primarily from the test set.
Robustness against outliers is also considered.
2.2.1 Parametric Classifiers If the density functions
can be expressed in parametric form, the corresponding
classifiers are called parametric. Most often the
density functions are described with the help of first-
and second-order moments. Depending on the assumptions
made, the decision boundaries are either linear (linear
classifier, equal covariance matrices) or quadratic
(quadratic classifier, different covariance matrices).
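As a minimal illustration of the two cases, the Gaussian plug-in discriminant can be written out directly. The sketch below is ours (Python/NumPy assumed); it shows that with equal covariance matrices the discriminant function becomes affine, i.e. the decision boundary is linear:

```python
import numpy as np

def quadratic_discriminant(m1, m2, S1, S2, p1=0.5):
    """Plug-in Gaussian discriminant: g(x) > 0 decides class 1.
    When S1 == S2 the quadratic terms cancel and the boundary is linear."""
    S1i, S2i = np.linalg.inv(S1), np.linalg.inv(S2)
    const = (-0.5 * np.log(np.linalg.det(S1) / np.linalg.det(S2))
             + np.log(p1 / (1.0 - p1)))
    def g(x):
        return (-0.5 * (x - m1) @ S1i @ (x - m1)
                + 0.5 * (x - m2) @ S2i @ (x - m2)
                + const)
    return g

# Equal covariance matrices: the discriminant is affine (linear boundary).
m1, m2 = np.zeros(2), np.array([2.0, 0.0])
g_lin = quadratic_discriminant(m1, m2, np.eye(2), np.eye(2))
print(g_lin(m1), g_lin(m2))   # positive at m1, negative at m2

# Different covariance matrices: the boundary is genuinely quadratic.
g_quad = quadratic_discriminant(m1, m2, np.eye(2), 4.0 * np.eye(2))
```

With equal covariances the x-quadratic terms of the two log-densities cancel exactly, leaving a linear function of x; with unequal covariances the difference of inverse covariance matrices survives and the boundary is a quadric.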
In the simulations carried out, we have used classifiers
based on the assumption of a multivariate normal
distribution. These classifiers are known to be
asymptotically Bayesian if the normality assumption is
valid. In this case, the effect of the finite design set
can be analyzed theoretically (Fukunaga 1990, chapter 5).
Departure from the normality assumption (modelling
error) is harder to analyze. The effect of such a
departure was analyzed by simulation during this
project, but that part is not reported here.
If the covariance matrices of both classes are equal to
the identity matrix, an explicit formula for the bias
caused by the finite design set can be derived. This is
of interest for getting some feeling for the
dependencies. For the linear classifier the bias is
(Fukunaga 1990, p. 211)
   E{ε̂} − ε = c/N ,        (5)

where the constant c is proportional to
e^(−M²/8) / (2√(2π)·M), with M the distance between the
class means, and depends further on the dimensionality
and on the prior probabilities P₁ and P₂.
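The 1/N dependence of this design-set bias can be checked numerically. The following sketch is ours (identity covariances, a plug-in linear classifier from sample means, and arbitrary dimensionality and separation, as assumptions); it estimates the excess error over the Bayes error at two design-set sizes, where first-order theory predicts the bias to scale like 1/N:

```python
import numpy as np
from math import erf, sqrt

def phi(z):                       # standard normal CDF
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

rng = np.random.default_rng(2)
n, M = 8, 2.0                     # dimensionality and class separation (assumed)
m2 = np.zeros(n); m2[0] = M       # class 1 at the origin, class 2 at m2, Sigma = I
bayes = phi(-M / 2)               # Bayes error of this setup

def bias(nd, reps=2000):
    """Mean excess of the plug-in linear classifier's true error over Bayes."""
    total = 0.0
    for _ in range(reps):
        x1 = rng.normal(size=(nd, n))
        x2 = rng.normal(size=(nd, n)) + m2
        mu1, mu2 = x1.mean(0), x2.mean(0)
        w = mu2 - mu1
        b = -0.5 * (mu1 + mu2) @ w
        s = np.linalg.norm(w)
        # exact error of sign(w.x + b) under the two Gaussians
        total += 0.5 * (phi(b / s) + phi(-(w @ m2 + b) / s)) - bayes
    return total / reps

b_small, b_large = bias(25), bias(100)
print(b_small, b_large, b_small / b_large)   # a ratio near 4 indicates ~1/N decay
```

Quadrupling the design set should reduce the bias by roughly a factor of four under the first-order 1/N law; higher-order terms make the measured ratio somewhat larger at small design sets.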
A corresponding expression can be derived for the effect
of the finite design set on the quadratic classifier in
this case.
2.2.2 Nonparametric Classifiers