3.2.1 Direct ("resubstitution") estimate A direct estimate is formed by evaluating formula (22) directly for each test sample. Because the same samples were used in the design phase, the bias and variance of the estimate $\varepsilon$ must be taken into account. This causes the final estimate to be optimistically biased, and the direct estimate thus gives a lower bound for the error.
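Formula (22) is not reproduced in this excerpt; assuming it has the usual risk-averaging form $\hat{\varepsilon} = \frac{1}{N}\sum_i \hat{r}(x_i)$, with $\hat{r}(x_i)$ the estimated conditional risk of the decision at sample $x_i$, a minimal sketch of the direct estimate could look as follows. The kNN posterior estimator and all function names are illustrative assumptions, not the paper's procedure.

```python
import numpy as np
from collections import Counter

def knn_posteriors(x, design_X, design_y, k):
    """Estimate the class posteriors at x from the k nearest
    design samples (an illustrative choice of risk estimator)."""
    dists = np.linalg.norm(design_X - x, axis=1)
    neighbours = design_y[np.argsort(dists)[:k]]
    counts = Counter(neighbours)
    classes = np.unique(design_y)
    return np.array([counts.get(c, 0) / k for c in classes])

def resubstitution_risk_estimate(design_X, design_y, k=5):
    """Direct ('resubstitution') risk-averaging estimate: average the
    conditional risk 1 - max_j P(w_j | x) over the very samples that
    were used to design the classifier.  Optimistically biased, and
    hence a lower bound for the true error rate."""
    risks = [1.0 - knn_posteriors(x, design_X, design_y, k).max()
             for x in design_X]
    return float(np.mean(risks))
```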
3.2.2 Reference set ("holdout") estimate To get an upper bound (and thus an estimate as the average of the two bounds) we must have another labelled set, which is now called the reference set. When this estimate is computed, the classification is made with the design set, but the risk is computed with the help of the reference set. It is shown in (Devijver & Kittler 1982) that this estimate is asymptotically unbiased and has a variance of the order of (23).
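Under the same assumed risk-averaging form, a sketch of the reference-set version follows: the decision at each reference sample comes from the design-set classifier, while the conditional risk of that decision is estimated from the labelled reference set. Again, every name and the kNN risk estimator are assumptions for illustration.

```python
import numpy as np
from collections import Counter

def knn_votes(x, X, y, k):
    """Labels of the k nearest neighbours of x in (X, y)."""
    dists = np.linalg.norm(X - x, axis=1)
    return y[np.argsort(dists)[:k]]

def holdout_risk_estimate(design_X, design_y, ref_X, ref_y, k=5):
    """Reference-set ('holdout') risk-averaging estimate: the decision
    at each reference sample is made by the design-set classifier, but
    its conditional risk is estimated from the independent reference
    set.  Pessimistically biased, hence an upper bound for the error."""
    risks = []
    for x in ref_X:
        # decision from the design set
        decided = Counter(knn_votes(x, design_X, design_y, k)).most_common(1)[0][0]
        # risk of that decision estimated from the reference set
        ref_labels = knn_votes(x, ref_X, ref_y, k)
        risks.append(1.0 - np.mean(ref_labels == decided))
    return float(np.mean(risks))
```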
3.3 Methods based on error-reject tradeoff
If a reject option is included in the classification system, the error rate of the classifier is reduced, because the reject option discards those pattern vectors whose classification carries a high risk. The dependence between the reject and error rates is (see e.g. Devijver & Kittler 1982)
$$\varepsilon(\lambda_t) \;=\; -\int_0^{\lambda_t} \lambda \, dR(\lambda), \qquad (24)$$
where $R$ is the reject rate and $\lambda_t$ is the rejection threshold (if the risk is greater than $\lambda_t$, a reject decision is made). Thus, by observing the rejection rate while varying $\lambda_t$, an estimate of $\varepsilon$ can be computed.
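In practice the integral in (24) can be approximated from the empirical reject-rate curve of a test set. With the empirical step function $R(\lambda)$ the integral collapses to the summed risks of the accepted samples, as the short sketch below shows (all names are illustrative). Note that only the estimated risks, not the true labels, enter the computation.

```python
import numpy as np

def error_from_reject_curve(risks, lam_t):
    """Estimate eps(lam_t) = -integral_0^{lam_t} lam dR(lam), cf. (24).
    With the empirical reject rate R(lam) = #{ r_i > lam } / N, the
    curve drops by 1/N at every observed risk r_i, so the integral
    reduces to the summed risks of the accepted samples over N."""
    risks = np.asarray(risks, dtype=float)
    accepted = risks <= lam_t          # reject when the risk exceeds lam_t
    return float(np.sum(risks[accepted]) / risks.size)

# sweep the threshold to trace the empirical error-reject tradeoff
rng = np.random.default_rng(0)
risks = rng.uniform(0.0, 0.5, size=1000)   # stand-in conditional risks
for lam in (0.1, 0.2, 0.3, 0.5):
    print(f"lam_t = {lam:.1f}  eps = {error_from_reject_curve(risks, lam):.3f}")
```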
These methods have all the advantages of the methods of the previous chapter, especially that an unlabelled test set can be used.
3.3.1 A method utilizing ordered sets We have used a special version of the error-reject tradeoff, which is specially designed for the voting kNN procedures. It is shown in (Devijver & Kittler 1982, chapter 3) by asymptotic analysis that the following bounds can be formed for the error rate
$$\sum_{i=1}^{m}\frac{A_{2i}}{2(2i-1)} \;\le\; \varepsilon \;\le\; \sum_{i=1}^{m}\frac{A_{2i}}{2(2i-1)} + \frac{A_{2m}}{2}, \qquad (25)$$
where $A_{2i}$ denotes the acceptance rate of the kNN classifier with exactly $s = i$ votes. (25) inherently measures the asymptotic probability of the ties which occur in 2iNN classifications and takes a series expansion of all of them.
Unfortunately, the bias produced by finite design sets (see (20)) heavily deteriorates the asymptotically tight bounds of (25). However, it is hoped that the average of the upper and lower bounds will still produce good results.
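A sketch of the ordered-sets estimate under the reconstruction of (25) above: each $A_{2i}$ is estimated as the observed fraction of test samples whose $2i$ nearest design neighbours split their votes exactly $i:i$, the weighted sums give the two bounds, and their average is taken as the final estimate. The function below is an illustrative two-class implementation, not the authors' code; note that the test samples need no labels.

```python
import numpy as np

def ordered_sets_bounds(test_X, design_X, design_y, m=3):
    """Error bounds from kNN vote ties, cf. the reconstructed (25).

    A_2i is estimated as the fraction of test samples whose 2i nearest
    design neighbours split exactly i votes per class (two-class case).
    Returns (lower bound, upper bound, their average)."""
    ties = np.zeros(m)                        # tie counts for k = 2, 4, ..., 2m
    for x in test_X:
        dists = np.linalg.norm(design_X - x, axis=1)
        ordered = design_y[np.argsort(dists)]
        for i in range(1, m + 1):
            votes = ordered[:2 * i]
            if np.sum(votes == votes[0]) == i:    # exact i:i split
                ties[i - 1] += 1
    A = ties / len(test_X)
    lower = sum(A[i - 1] / (2 * (2 * i - 1)) for i in range(1, m + 1))
    upper = lower + A[m - 1] / 2.0
    return lower, upper, 0.5 * (lower + upper)
```

The truncation depth m trades off the tightness of the asymptotic bounds against the growing finite-sample bias of the higher-order tie rates.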
4. SIMULATION RESULTS
Extensive simulation work has been carried out. The primary goals of these simulations were: a) to compare risk averaging methods with error counting methods, b) to compare nonparametric and parametric classifiers, c) to test the robustness of the different error estimation methods, and d) to get at least an idea of a classification system which is general and gives error estimates as reliable as possible in all circumstances. Although the simulation study is extensive, it is still limited in many respects. This is unavoidable because of the many parameters to be tested.
The generated datasets are listed in table 1.
Data ID   Optimal classifier   Bayes error rate
I1        Linear               0.251
I4        Quadratic            0.324
NN        Nonparametric        0.128

Table 1. The three types of data used in the simulations.
The datasets I1 and I4 are generated from normal distributions and the dataset NN from a mixture of normal distributions. Unless otherwise stated, the Bayes error rates are the ones listed in table 1. The Bayes error rate of 0.128 for dataset NN represents the case in which the generated dataset differs maximally from a normal distribution. All simulations carried out are two-class cases.
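The exact distribution parameters are not given in this excerpt. Purely as an illustration of the three dataset types, two-class normal data with equal covariances (optimal classifier linear, as for I1), unequal covariances (optimal classifier quadratic, as for I4), or a normal-mixture class (nonparametric optimum, as for NN) could be generated along these lines; all parameter values below are invented and do not reproduce the Bayes error rates of table 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def two_class_normal(n, mean0, mean1, cov0, cov1):
    """n samples per class from two multivariate normals."""
    X0 = rng.multivariate_normal(mean0, cov0, size=n)
    X1 = rng.multivariate_normal(mean1, cov1, size=n)
    y = np.concatenate([np.zeros(n, int), np.ones(n, int)])
    return np.vstack([X0, X1]), y

def mixture_class(n, means, cov):
    """n samples from an equal-weight mixture of normals."""
    comps = rng.integers(len(means), size=n)
    return np.array([rng.multivariate_normal(means[c], cov) for c in comps])

# I1-like: equal covariances -> the optimal classifier is linear
X_lin, y_lin = two_class_normal(100, [0, 0], [1, 1], np.eye(2), np.eye(2))

# I4-like: unequal covariances -> the optimal classifier is quadratic
X_quad, y_quad = two_class_normal(100, [0, 0], [0, 0], np.eye(2), 4 * np.eye(2))

# NN-like: one class is a normal mixture -> nonparametric optimum
X_mix = np.vstack([rng.multivariate_normal([0, 0], np.eye(2), 100),
                   mixture_class(100, [[-2, 0], [2, 0]], np.eye(2))])
y_mix = np.concatenate([np.zeros(100, int), np.ones(100, int)])
```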
Risk averaging methods vs. error counting
As an example, the estimated error rates and standard deviations of the different error estimators for datasets I1 and NN in the case of a two-dimensional feature space are presented in Figures 3-6.
[Figure: estimated error rate $\varepsilon$ versus design set size; curves for Resubstitution, Leave-one-out, Ordered sets, Holdout, Risk averaging R, Bootstrap-632, and Risk averaging H.]

Figure 3. The estimated error rates of different estimators. Data I1; the Bayes error rate equals the dashed line.