further reduction of the classification error can be achieved by choosing a proper threshold t, and from (9) we see that the metric Q and the kernel K also affect the final bias.
The decision threshold can be optimized analytically if the density functions of the underlying problem are normal (Fukunaga 1990, p. 329). In more general cases an optimal t cannot be obtained analytically. However, the following experimental procedure based on empirical error estimation can be carried out: for each value of k, find the threshold that produces the smallest experimental upper bound for the error. The value of t corresponding to the smallest upper bound can then be chosen, together with the corresponding k, for the final classification.
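As an illustration, a minimal sketch of this grid search is given below; the function name select_k_and_threshold, the estimator passed in as empirical_upper_bound, and the candidate grids are hypothetical placeholders, since the text does not specify how the empirical upper bound is computed.

```python
import numpy as np

def select_k_and_threshold(empirical_upper_bound, k_values, t_values):
    """Grid-search sketch: for each kernel size k, find the threshold t
    that gives the smallest empirical upper bound of the error, then
    keep the (k, t) pair with the overall smallest bound."""
    best_k, best_t, best_bound = None, None, np.inf
    for k in k_values:
        for t in t_values:
            bound = empirical_upper_bound(k, t)   # user-supplied error-bound estimate
            if bound < best_bound:
                best_k, best_t, best_bound = k, t, bound
    return best_k, best_t, best_bound

# Hypothetical usage with a dummy bound estimator:
dummy_bound = lambda k, t: 0.2 + 0.01 * (k - 3) ** 2 + (t - 0.1) ** 2
print(select_k_and_threshold(dummy_bound, range(1, 8), np.linspace(-0.5, 0.5, 21)))
```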
Another way to compensate for the effect of the bias terms is to consider the metric Q and the kernel shape K. From (9) and (10) we can see that the terms independent of $N_j$ can be compensated if $\alpha_1(x) = \alpha_2(x)$.
Hence, for each x, one should find a solution to
$$\frac{\mathrm{tr}\{Q_1 \nabla^2 p_1(x)\}}{p_1(x)} = \frac{\mathrm{tr}\{Q_2 \nabla^2 p_2(x)\}}{p_2(x)}. \qquad (11)$$
This is extremely hard to obtain. Only in the case of normal distributions can a solution be achieved, and if additionally the covariance structures are similar, the choice $Q_i = \Sigma_i$ is reasonable. In the general case, if Q is of second order, an approximate solution can be obtained by choosing (Fukunaga 1990, p. 337)
$$Q_i = \Sigma_i + \gamma_i (x - \mu_i)(x - \mu_i)^T, \qquad (12)$$

where $\gamma_i$ should be chosen to be a little larger than

$$\frac{1}{\mathrm{tr}\{\Sigma_i^{-1}(x - \mu_i)(x - \mu_i)^T\}}. \qquad (13)$$
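A rough sketch of this metric choice, assuming the class mean and covariance are estimated from the design samples and assuming the reconstructed forms of (12) and (13) above, could look as follows; the margin factor gamma_margin is an arbitrary placeholder for "a little larger than".

```python
import numpy as np

def local_metric(x, X_class, gamma_margin=1.1):
    """Per-class, per-point metric of (12): Q_i = Sigma_i + gamma_i (x - mu_i)(x - mu_i)^T,
    with gamma_i set a little above the lower limit (13)."""
    mu = X_class.mean(axis=0)                 # class mean estimated from design samples
    sigma = np.cov(X_class, rowvar=False)     # class covariance (assumed non-singular here)
    d = x - mu
    # Lower limit (13): 1 / tr{Sigma^{-1}(x-mu)(x-mu)^T} = 1 / ((x-mu)^T Sigma^{-1} (x-mu))
    gamma_min = 1.0 / float(d @ np.linalg.solve(sigma, d))
    gamma = gamma_margin * gamma_min          # "a little larger than" the limit
    return sigma + gamma * np.outer(d, d)
```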
The effect of the metric is demonstrated in figure 1 (adapted from Fukunaga 1990), which shows how dramatic the effect can be for larger kernel sizes.
[Figure 1: estimated error rate as a function of the kernel size k, comparing the identity metric Q = I with the metric of (12).]
Figure 1. The effect of the metric on the estimated error rate; the data are Gaussian with unequal covariance matrices.
The kernel shape K is again a parameter to be chosen.
It was experimentally shown in (Fukunaga 1990) that the more uniform the kernel is, the more it affects the lower bound of the error, while having only little effect on the upper bound. If a Gaussian kernel is used, too much emphasis may be put on the centre of the kernel. A uniform kernel, such as a hyperball or hypercube, is another choice; it puts equal emphasis throughout the neighbourhood. It is quite evident that this causes many of the estimated a posteriori probabilities in the border regions to be equal.
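The following sketch contrasts the two kernel shapes in an unnormalised Parzen value; the function names and the unit-radius convention are assumptions made for illustration, not part of the original analysis.

```python
import numpy as np

def gaussian_kernel(u):
    """Smooth kernel: weight decays away from the centre."""
    return np.exp(-0.5 * np.sum(u ** 2, axis=-1))

def hyperball_kernel(u):
    """Uniform kernel: equal weight everywhere inside the unit radius."""
    return (np.sum(u ** 2, axis=-1) <= 1.0).astype(float)

def parzen_value(x, X_design, r, kernel):
    """Unnormalised Parzen value at x with kernel size r; the normalising
    constant cancels when the class-wise values are compared."""
    return kernel((x - X_design) / r).sum() / len(X_design)
```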
A computationally intensive estimate of the Bayes error can be obtained with the Parzen method from (10),
$$E\{\hat{\varepsilon}\} \approx \varepsilon^* + a_1 r^2 + a_2 r^4 + a_3 N^{-1} r^{-d}. \qquad (14)$$
After varying the kernel size r, we can finally perform a least-squares fit using model (14) and from it obtain an estimate, $\hat{\varepsilon}^*$, of the Bayes error $\varepsilon^*$. Because it is reasonable to believe that all the constants in (14) are positive, this can be used as a constraint in the estimation procedure. The fit should be repeated for each value of t, because for (14) to be valid, a Bayesian decision should be made in each case. The value of t which gives the lowest estimate of the error should finally be chosen.
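Assuming the model form shown in (14), the constrained fit could be sketched as below with a non-negative least-squares solver; the helper name fit_bayes_error and the use of scipy.optimize.lsq_linear are choices made for this sketch. In practice the fit would be repeated for each threshold t, as described above.

```python
import numpy as np
from scipy.optimize import lsq_linear

def fit_bayes_error(r_values, error_estimates, d, N):
    """Fit model (14), eps* + a1 r^2 + a2 r^4 + a3 N^{-1} r^{-d}, to the
    empirical error estimates with all coefficients constrained to be
    non-negative; the fitted intercept is the Bayes-error estimate."""
    r = np.asarray(r_values, dtype=float)
    A = np.column_stack([np.ones_like(r), r ** 2, r ** 4, r ** (-d) / N])
    fit = lsq_linear(A, np.asarray(error_estimates, dtype=float), bounds=(0.0, np.inf))
    eps_star, a1, a2, a3 = fit.x
    return eps_star
```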
2.2.3 Voting kNN classifiers In the well-known voting kNN procedure, the distances to the k nearest design samples are computed and the pattern vector is assigned to the class which gets the majority of the votes. If ties occur, a reject option can be used, as in the following analysis. This simple, though computationally intensive, method has become appealing because of the following result, obtained via asymptotic analysis (as $N \to \infty$, $P(\omega_i \mid X_{kNN}) \to P(\omega_i \mid X)$; see Devijver & Kittler 1982)
$$\varepsilon^* \le \cdots \le \varepsilon_{kNN} \le \cdots \le \varepsilon_{3NN} \le \varepsilon_{NN} \le 2\varepsilon^*(1 - \varepsilon^*), \qquad (15)$$
where $\varepsilon_{kNN}$ stands for the asymptotic error based on k nearest neighbours and $\varepsilon^*$ stands for the Bayes error. According to this asymptotic result, very tight bounds for the Bayes error can be achieved by applying a kNN classifier if k is large enough.
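A minimal sketch of the voting rule with a reject option on ties might look as follows; the reject label and the Euclidean distance are illustrative assumptions.

```python
import numpy as np

def voting_knn(x, X_design, y_design, k, reject_label=-1):
    """Voting kNN: the k nearest design samples vote and x is assigned to
    the majority class; ties are rejected, mirroring the reject option."""
    dist = np.linalg.norm(X_design - x, axis=1)   # Euclidean distance, assumed here
    nearest = y_design[np.argsort(dist)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    winners = labels[counts == counts.max()]
    return winners[0] if len(winners) == 1 else reject_label
```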
However, in practice we have to cope with finite-sized design sets, and (15) will be heavily affected by the corresponding bias. The simplifying assumption $P(\omega_i \mid X_{kNN}) = P(\omega_i \mid X)$ does not hold, and the corresponding risks will be biased. A detailed study of this bias in the NN and 2NN cases can be found in (Fukunaga & Hummels 1987a). They showed that a second-order approximation of the bias of the voting NN procedure is
$$E\{\Delta\hat{\varepsilon}_{NN}\} \approx \beta_1 \, E_x\!\left\{ |Q|^{-1/d} \, \mathrm{tr}\!\left( Q \, B_1(x) \right) \right\}.$$