methods work quite nicely if the sample size is large enough. However, the variance term is always about two times larger than in the risk averaging methods.
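As a rough illustration of this variance claim, the following sketch (not from the paper; two univariate Gaussians with known posteriors are assumed) compares the Monte Carlo standard deviation of plain error counting with that of risk averaging, which averages continuous conditional risks in [0, 0.5] instead of 0/1 error indicators:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
N, runs = 200, 1000                      # test-set size, Monte Carlo runs

ec, ra = [], []
for _ in range(runs):
    y = rng.integers(0, 2, N)            # true labels
    x = rng.normal(2.0 * y, 1.0)         # class means 0 and 2, unit variance
    p0 = norm.pdf(x, 0.0, 1.0)           # class-conditional densities
    p1 = norm.pdf(x, 2.0, 1.0)
    ec.append(np.mean((p1 > p0) != y.astype(bool)))   # error counting
    post = p1 / (p0 + p1)                             # posterior of class 1
    ra.append(np.mean(np.minimum(post, 1.0 - post)))  # risk averaging

print(f"error counting: mean {np.mean(ec):.3f}, std {np.std(ec):.4f}")
print(f"risk averaging: mean {np.mean(ra):.3f}, std {np.std(ra):.4f}")

With these settings the error-counting standard deviation comes out roughly twice the risk-averaging one, in line with the factor quoted above.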
Parametric vs. nonparametric methods
Table 3 shows that the nonparametric method, if the parameter tuning is properly performed (see chapter 2), has potential also in the cases where the optimal classifier is simpler. The opposite is of course not true: a linear classifier will never do proper work in data case NN. Especially in higher-dimensional spaces, successful application of a nonparametric classifier presumes that experimental parameter tuning is performed.
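The tuning step can be made concrete with a small sketch (assumed Gaussian data, not the paper's datasets): a k-NN classifier whose neighbourhood size is chosen by cross-validation on the design set is compared with a linear discriminant on a problem whose optimal classifier is linear.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
d, m = 8, 10
n = m * d                                # design samples per class, N_i = m*d

def make_data(n):
    X = rng.normal(0.0, 1.0, (2 * n, d))
    X[n:, 0] += 2.0                      # class 1 mean shift -> linear Bayes rule
    y = np.r_[np.zeros(n), np.ones(n)]
    return X, y

X, y = make_data(n)
Xt, yt = make_data(5000)                 # large independent test set

lda = LinearDiscriminantAnalysis().fit(X, y)
print("linear:", np.mean(lda.predict(Xt) != yt))

ks = range(1, 26, 2)                     # tune k by 5-fold cross-validation
scores = [cross_val_score(KNeighborsClassifier(k), X, y, cv=5).mean() for k in ks]
best_k = list(ks)[int(np.argmax(scores))]
knn = KNeighborsClassifier(best_k).fit(X, y)
print(f"k-NN (k={best_k}):", np.mean(knn.predict(Xt) != yt))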
case        m=3        m=5        m=10
            ε̂    σ     ε̂    σ     ε̂    σ
Linear      .25  .13   .25  .10   .25  .07
Nonparam.   .27  .13   .40  .10   .25  .08
Quadratic   .33  .12   .32  .09   .32  .08
Nonparam.   .33  .11   .31  .08   .32  .08

Table 3. Comparison of a nonparametric classifier to the optimal ones in cases where the asymptotically optimal classifier is either linear (II) or quadratic (IV); d=8, N_i = m·d (ε̂ = error estimate, σ = its standard deviation).
In Table 4 the robustness of the different types of classifiers is compared. The dataset is contaminated with outlying design samples, and the bias produced by the contaminated data is shown. As predicted, the nonparametric methods are much more robust against the outliers. A closer look at the results (not shown in the table) reveals that the upper bound (leave-one-out estimate) of the nonparametric method grows when the percentage of outliers becomes high, but the lower bound grows only moderately (the lower bound of the given example is 0.28 (ε = 0.25) when 50% of outliers are present). In the parametric case both bounds grow equally fast.
p    A    Δε̂ param.   Δε̂ nonparam.
5    30   0.107       0.003
25   30   0.194       0.036
50   30   0.206       0.084
25   15   0.060       0.038

Table 4. Robustness of classification methods against outliers, dataset II, d=4, m=20, p = percentage of outliers, A = size of outliers (multiple of the feature standard deviation).
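The contamination experiment can be sketched as follows (assumed Gaussian data and a k-NN stand-in for the nonparametric method; dataset II itself is not reproduced): a fraction p of the design samples is displaced by A feature standard deviations, and the increase in independent test error is taken as the bias.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
d, m = 4, 20
n = m * d                                 # design samples per class

def make_data(n):
    X = rng.normal(0.0, 1.0, (2 * n, d))
    X[n:, 0] += 2.0                       # class 1 mean shift
    y = np.r_[np.zeros(n), np.ones(n)]
    return X, y

def contaminate(X, p, A):
    X = X.copy()
    k = int(p / 100 * len(X))
    idx = rng.choice(len(X), k, replace=False)
    X[idx] += A * rng.choice([-1.0, 1.0], (k, d))   # outliers at distance A*sigma
    return X

Xt, yt = make_data(5000)                  # clean test set

for clf in (LinearDiscriminantAnalysis(), KNeighborsClassifier(5)):
    X, y = make_data(n)
    clean = np.mean(clf.fit(X, y).predict(Xt) != yt)
    dirty = np.mean(clf.fit(contaminate(X, 25, 30), y).predict(Xt) != yt)
    print(type(clf).__name__, "bias:", round(dirty - clean, 3))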
However, the type of outliers affects the robustness. If the errors are labelling errors, the nonparametric methods are also more strongly affected. An example is given in Table 5. Again the lower bounds (resubstitution estimate in this case) are only moderately biased, but the upper bounds are strongly distorted.
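A sketch of the labelling-error case (assumed data; resubstitution and leave-one-out stand in for the lower and upper bounds of a k-NN classifier):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score, LeaveOneOut

rng = np.random.default_rng(0)
d, n = 4, 80
X = rng.normal(0.0, 1.0, (2 * n, d))
X[n:, 0] += 2.0
y = np.r_[np.zeros(n), np.ones(n)]

for p in (0, 10, 25):
    yn = y.copy()
    idx = rng.choice(len(y), int(p / 100 * len(y)), replace=False)
    yn[idx] = 1 - yn[idx]                         # flip p% of the design labels
    knn = KNeighborsClassifier(5).fit(X, yn)
    resub = np.mean(knn.predict(X) != yn)         # resubstitution (lower bound)
    loo = 1 - cross_val_score(KNeighborsClassifier(5), X, yn,
                              cv=LeaveOneOut()).mean()  # leave-one-out (upper bound)
    print(f"p={p:2d}%  resub={resub:.3f}  leave-one-out={loo:.3f}")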
p    Δε̂ param.   Δε̂ nonparam.
10   0.053       0.034
25   0.152       0.111
50   0.219       0.157

Table 5. Robustness of classification methods against outliers, dataset II, d=4, m=20, p = percentage of labelling errors.

Robustness of the error estimators

In Table 6 the error estimators are compared with
respect to robustness. The numbers are again averages of the upper and lower bounds. For risk averaging and error reject, the lower bounds are also listed, because they are very robust against outliers. The risk averaging method is extremely insensitive to labelling errors in the design set of a parametric classifier: as can be seen, the bias is between 1 and 2 percent (true Bayes error 25%). On the contrary, if the errors are outliers lying far away from the true distribution, risk averaging tends to be strongly optimistically biased. This is because the estimated scatter spreads out, but the unlabelled test set does not obey that spread distribution. Both the upper and lower bounds are similarly biased.
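The mechanism can be made explicit with a minimal sketch of a parametric risk-averaging estimator (assumed Gaussian model and data): plug-in class densities are estimated from the design set, and the conditional risk min_i P(ω_i|x) is averaged over an unlabelled test set. Outliers inflate the estimated scatter while the test set stays on the true distribution.

import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
d, n = 4, 80                                     # dimension, design samples per class

def risk_averaging(X0, X1, X_test):
    # plug-in Gaussian densities, equal priors assumed
    g0 = multivariate_normal(X0.mean(0), np.cov(X0.T))
    g1 = multivariate_normal(X1.mean(0), np.cov(X1.T))
    p0, p1 = g0.pdf(X_test), g1.pdf(X_test)
    post = p1 / (p0 + p1)
    return np.mean(np.minimum(post, 1.0 - post)) # average conditional risk

X0 = rng.normal(0.0, 1.0, (n, d))                # class 0 design samples
X1 = rng.normal(0.0, 1.0, (n, d))
X1[:, 0] += 2.0                                  # class 1: shifted mean
Xt = rng.normal(0.0, 1.0, (4000, d))             # unlabelled test set, half per class
Xt[2000:, 0] += 2.0

print("clean design set: ", risk_averaging(X0, X1, Xt))
X0c = X0.copy()
X0c[:n // 4] += 30.0 * rng.choice([-1.0, 1.0], (n // 4, d))  # 25% outliers, A=30
print("contaminated set:", risk_averaging(X0c, X1, Xt))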
In the nonparametric case, in contrast to the parametric one, risk averaging is robust against outliers but pessimistically biased in the case of labelling errors. The same applies to the method utilizing the error-reject tradeoff. This is because of the upper bound; the lower bound is extremely robust in the nonparametric case. This phenomenon can be turned to advantage: if the difference between the upper and lower bounds is large compared to the difference between the traditional error counting estimate and the lower bound, the design set contains labelling errors, and the lower bound can be used for prediction.
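This decision rule can be sketched as follows (the threshold ratio is a hypothetical tuning constant, not from the paper):

def pick_estimate(err_count, lower, upper, ratio=3.0):
    """ratio is an assumed tuning constant, not from the paper."""
    if upper - lower > ratio * (err_count - lower):
        return lower                  # labelling errors suspected: trust the lower bound
    return 0.5 * (upper + lower)      # otherwise the usual average of the bounds

print(pick_estimate(err_count=0.27, lower=0.25, upper=0.45))   # -> 0.25
print(pick_estimate(err_count=0.26, lower=0.24, upper=0.28))   # -> 0.26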
Case           Δε̂ EC    Δε̂ RA    Δε̂ Er
Parametric
p=10           0.05      0.01
p=25           0.15      0.02
p=50           0.22      0.01
p=5  A=30      0.11     -0.11
p=25 A=30      0.20     -0.13
p=50 A=30      0.21     -0.18
Nonparam.
p=10           0.04      0.03     0.06
  lower bound           -0.04     0.01
p=25           0.12      0.10     0.15
  lower bound           -0.01     0.08
p=50           0.16      0.12     0.18
  lower bound            0.00     0.10
p=5  A=30      0.01      0.01     0.02
  lower bound           -0.03    -0.02
p=25 A=30      0.04      0.03     0.03
  lower bound            0.00    -0.01
p=50 A=30      0.09      0.04     0.04
  lower bound            0.00     0.00

Table 6. Robustness of different error estimators, p = % of outliers, A = size of outliers (labelling errors if A is not given), EC = error counting, RA = risk averaging, Er = error reject. For RA and Er the second row of each nonparametric case gives the lower bound.