Fukunaga and Hummels (1987a) have derived corresponding bounds on the kNN error estimate for finite sized design sets. The second order approximation of the NN bias is

$$E\{\Delta\hat{\varepsilon}_{NN}\} = B_1 \left(\frac{1}{N}\right)^{2/d}, \qquad (16)$$

where N is the size of the design set, d is the dimension of the feature space, and B_1 is a function depending on the probability structures and the metric used.
Correspondingly, the second order approximation of the 2NN bias is

$$E\{\Delta\hat{\varepsilon}_{2NN}\} = B_2 \left(\frac{1}{N}\right)^{2/d}, \qquad (18)$$

where B_2 is again dependent on the probability structures of the underlying problem.
The effect of the sample size on the bias depends only on the constants B_1 and B_2. The bias seems to be very difficult to compensate for by increasing the sample size. Fortunately, the bias in the 2NN case is much smaller than in the NN case, as can be seen from figure 2. This agrees with the fact that the NN method seems to give reasonable results in practice only in lower dimensional cases.
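To make the scaling concrete, the minimal sketch below evaluates the factor (1/N)^{2/d} of equations (16) and (18); setting B_1 = B_2 = 1 is an assumption made purely for illustration, so the numbers show only the effect of the design set size.

```python
# The (1/N)^(2/d) factor of equations (16) and (18), with the problem
# dependent constants B1 and B2 set to 1 so that only the effect of the
# design set size N is visible.
for d in (2, 8, 32, 128):
    factors = [f"N={N}: {(1.0 / N) ** (2.0 / d):.3g}"
               for N in (100, 10_000, 1_000_000)]
    print(f"d={d:3d}  " + "   ".join(factors))
```

In two dimensions a ten-thousandfold increase of N shrinks the factor by four orders of magnitude, whereas at d = 128 it barely moves, which is exactly the behaviour seen in figure 2.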
[Figure 2 plots: (a) the NN classifier and (b) the 2NN classifier, bias shown against design set sizes from 100 to 1,000,000 on a logarithmic scale.]
Figure 2. The bias terms of the NN and 2NN classifiers. The feature space dimension varies from 2 to 128, from bottom to top.
For larger values of k, the kNN procedure closely resembles the Parzen procedure. Correspondingly, it gives similar results in all aspects of the error estimation problem (Fukunaga & Hummels, 1987b). For large values of k, the bias can be expressed by

$$E\{\Delta\hat{\varepsilon}\} = \frac{b_1}{k} + b_2 \left(\frac{k}{N}\right)^{2/d} + b_3 \left(\frac{k}{N}\right)^{4/d}. \qquad (20)$$

Equation (20) has a remarkable similarity with equation (10). The a_1 term is missing, because (20) is only valid for large values of k and the a_1 term vanishes as k increases. The constant term b_1/k is the
only clear exception to (10). It shows that the kNN procedure converges to the Bayes error only for large values of k, even in the asymptotic case. A serious disadvantage of the voting kNN procedure is that there is no way to adjust the decision threshold to reduce the bias.
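The trade-off in k can be previewed with the reconstructed form of (20). The constants below are arbitrary assumptions chosen only to expose how the b_1/k term and the kernel size terms pull in opposite directions; they are not values from the paper.

```python
# Bias model of equation (20); b1, b2 and b3 are problem dependent, so
# the values here are illustrative assumptions only.
b1, b2, b3 = 0.5, 0.2, 0.1
N, d = 100_000, 8

def knn_bias(k):
    r2 = (k / N) ** (2.0 / d)      # squared effective kernel radius
    return b1 / k + b2 * r2 + b3 * r2 ** 2

for k in (1, 10, 100, 1000, 10000):
    print(f"k={k:6d}  bias={knn_bias(k):.4f}")
```

With these assumed constants the bias is smallest at an intermediate k: the b_1/k term demands a large k while the kernel size terms demand a small one, and no finite k drives the bias to zero.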
2.2.4 Robustness of the classifiers The unbiasedness of the estimated parameters of parametric density functions is shown in every statistical textbook. The effect of outlying observations has been a primary concern during the last decade (see e.g. Huber 1981). For parametric classifiers the traditional estimation techniques have been shown to be extremely sensitive to outliers, the breakdown point being asymptotically zero (a single outlier can cause the system to break down). However, the concept of the breakdown point is somewhat misleading, because it analyzes only the worst case. Robustness is naturally a function of both the size and the number of the outliers. A theoretical analysis of these effects is hard to carry out; to find out how the estimated error rate depends on these parameters, a simulation study is a feasible choice. We will return briefly to this subject in chapter 4.
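The zero breakdown point can be illustrated with a toy sketch; the standard normal data and the outlier value are assumptions made for demonstration only.

```python
import random

# A single gross outlier displaces the sample mean, the parameter
# estimate a parametric plug-in classifier would rely on; hence the
# asymptotic breakdown point of the classical estimate is zero.
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(999)]
print("clean mean       :", sum(sample) / len(sample))   # close to 0
sample.append(1e6)                                        # one outlier
print("contaminated mean:", sum(sample) / len(sample))   # about 1000
```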
In the case of the nonparametric methods, outliers are not as serious, because only a local neighbourhood is used for density estimation. If none of the outliers belongs to this neighbourhood, no damage is done. The situation is worse in the case of labelling errors in the design set, because then considerable distortions occur also in the nonparametric case. We will simulate both of these cases in chapter 4.
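The contrast between the two contamination types can be previewed with a small sketch; the one-dimensional two-class Gaussian data, the 5% contamination rate, and the 1NN rule are assumptions made for illustration, not the setting of chapter 4.

```python
import random

# 1NN error on two unit-variance Gaussian classes (means 0 and 2),
# comparing a clean design set, one with gross outliers, and one with
# the same fraction of label errors.
random.seed(1)

def make_set(n):
    data = [(random.gauss(0.0, 1.0), 0) for _ in range(n)]
    data += [(random.gauss(2.0, 1.0), 1) for _ in range(n)]
    return data

def nn_error(design, test):
    errors = 0
    for x, y in test:
        label = min(design, key=lambda p: abs(p[0] - x))[1]
        errors += label != y
    return errors / len(test)

design, test = make_set(500), make_set(1000)
print("clean design set  :", nn_error(design, test))

# Outliers far from the data are almost never anyone's nearest neighbour.
contaminated = design + [(50.0, 0)] * 25 + [(-50.0, 1)] * 25
print("5% gross outliers :", nn_error(contaminated, test))

# Flipped labels sit inside the data and are selected directly.
flipped = [(x, 1 - y) for x, y in design[:50]] + design[50:]
print("5% label errors   :", nn_error(flipped, test))
```

Outliers at a distance leave the estimate essentially unchanged, while the same fraction of labelling errors distorts it visibly.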
Because we are more interested in nonparametric classifiers due to their generality, we will not concentrate in this context on parametric robust estimation techniques.
3. EMPIRICAL ERROR ESTIMATION
In this context we classify the empirical error estimation methods into three categories: error counting methods, methods based on risk averaging, and methods based on the error reject tradeoff. In this chapter we briefly review the methods that we have used in the simulations of chapter 4.
3.1 Error counting methods
Error counting methods have traditionally been used in pattern recognition for estimating the error rate. Many variants exist, of which we have used four in this context. The error counting estimate is given by