Full text: XVIIth ISPRS Congress (Part B3)

  
  
  
  
  
[Figure 4. Standard deviations of the different error estimators, data II, N = m·d, sample sizes m = 5, 10, 20, 50. Curves: resubstitution, leave-one-out, ordered sets, holdout, risk averaging R, bootstrap-b632, risk averaging H.]
The results mostly agree with what was expected. The variances of the risk averaging methods are smaller than those of the error counting methods, especially in small sample size situations. This clearly favours risk averaging in small sample situations, where the variance term mostly dominates. As expected, the upper bounds in both cases have a larger variance than the lower bounds. The use of the error-reject tradeoff is comparable with risk averaging, but the difference between the upper and lower bounds (not shown in the figures) is very large in small sample size situations: e.g. in dataset NN, for the two smallest sample sizes, the upper and lower bounds are 0.17 vs. 0.35 and 0.17 vs. 0.28, respectively. Convergence to the asymptotic case is slower than might be expected, which may stem from the constant bias term of (20). However, the mean of these bounds predicts the error rate quite well, and its variance is nearly comparable to that of risk averaging.
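The variance advantage of risk averaging over error counting can be illustrated with a small Monte Carlo sketch. For two unit-variance Gaussian classes with known posteriors, the error counting estimate (fraction of misclassified test points) has a larger standard deviation than the risk averaging estimate (mean of the conditional error min(P(w1|x), P(w2|x))). The class means, sample size m and number of runs below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
mu0, mu1, m, runs = -1.0, 1.0, 20, 2000   # illustrative parameters

def posterior1(x):
    # P(w1 | x) for equal priors and unit class variances
    a = np.exp(-0.5 * (x - mu1) ** 2)
    b = np.exp(-0.5 * (x - mu0) ** 2)
    return a / (a + b)

count_est, risk_est = [], []
for _ in range(runs):
    y = rng.integers(0, 2, m)                          # true labels
    x = rng.normal(np.where(y == 1, mu1, mu0), 1.0)    # class-conditional draws
    p1 = posterior1(x)
    count_est.append(np.mean((p1 > 0.5) != (y == 1)))  # error counting
    risk_est.append(np.mean(np.minimum(p1, 1.0 - p1))) # risk averaging

print(np.std(count_est), np.std(risk_est))             # risk averaging is smaller
```

Both estimators are unbiased for the Bayes error here (about 0.159 for this separation), but the risk averaging estimate averages a continuous quantity per point instead of a 0/1 count, which is exactly the variance reduction observed in Figure 4.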
  
  
  
  
  
[Figure 5. Estimated error rates of the different estimators, dataset NN, dashed line = Bayes error, N = m·d. Curves: resubstitution, leave-one-out, ordered sets, holdout, risk averaging R, bootstrap-b632, risk averaging H.]
One comment concerns the nonparametric results. The kernel size of the nonparametric classifier was tuned on too coarse a quantization grid in the small sample size cases, which is the probable reason why some of the curves (e.g. the resubstitution curve) do not behave smoothly. The bias term still dominates in some cases.
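The kernel-size effect can be sketched with a Gaussian Parzen-window classifier whose smoothing parameter h is only available on a deliberately coarse grid: the test error jumps between grid points instead of varying smoothly. The data and the grid values below are illustrative assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20                                             # small training sample per class
tr0 = rng.normal(-1.0, 1.0, n)                     # class 0 training data
tr1 = rng.normal(1.0, 1.0, n)                      # class 1 training data
te = rng.normal(np.repeat([-1.0, 1.0], 500), 1.0)  # test points
lab = np.repeat([0, 1], 500)                       # test labels

def parzen(x, data, h):
    # Gaussian kernel (Parzen window) density estimate at points x
    u = (x[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u ** 2).mean(axis=1) / (h * np.sqrt(2.0 * np.pi))

errs = {}
for h in [0.25, 0.5, 1.0, 2.0]:                    # deliberately coarse grid
    pred = (parzen(te, tr1, h) > parzen(te, tr0, h)).astype(int)
    errs[h] = float(np.mean(pred != lab))
print(errs)
```

With only a handful of candidate h values, the selected kernel size (and hence the estimated error) can change abruptly as the sample changes, producing the non-smooth curves noted above.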
  
  
  
  
  
[Figure 6. Standard deviations of the different estimators, N = m·d, sample sizes m = 5, 10, 20, 50. Curves: resubstitution, leave-one-out, ordered sets, holdout, risk averaging R, bootstrap-b632, risk averaging H.]
In Table 2 the different types of error estimation methods are compared with each other as a function of the Bayes error. All results are averages of the lower and upper bounds of the estimated Bayes error (e.g. error counting = ½[leave-one-out + resubstitution]). The following can be observed from the table. a) The risk averaging methods have a clearly smaller variance than the traditional methods; the difference grows as the Bayes error and the sample size decrease. b) As has been illustrated in many simulations, the bootstrapping method does not perform well in low error rate situations. c) The risk averaging methods and the method using the error-reject tradeoff are pessimistically biased in low error rate situations. These methods (and also bootstrapping) work better under such circumstances if only the lower bound is used (e.g. the lower bound for risk averaging in the 0.01 error rate case with 5·d samples equals the correct value 0.01, while the upper bound claims 0.02). The effect could be corrected by using a leave-one-out type procedure for the upper bound. d) The traditional
  
  
Method            Bayes error    m=5            m=10
                                 ε̂      σ       ε̂      σ
Error counting       25.1       25.6   10.6    25.4    6.6
Bootstrapping                   27.3    9.9    26.2    6.5
Risk averaging                  23.8    6.0    23.1    4.3
Error reject                    26.1    7.8    24.7    6.6
Error counting       10.0        9.7    6.3    10.4    4.8
Bootstrapping                   11.8    6.7    10.6    5.2
Risk averaging                   9.9    3.0     9.8    2.2
Error reject                    14.5    5.9    11.8    4.2
Error counting        5.0        4.9    5.0     5.0    3.3
Bootstrapping                    6.4    4.8     5.5    2.9
Risk averaging                   5.5    2.2     4.9    1.5
Error reject                     7.5    3.3     5.8    2.6
Error counting        1.0        0.9    2.3     1.0    1.6
Bootstrapping                    1.6    2.3     1.0    1.4
Risk averaging                   1.5    1.4     1.2    0.6
Error reject                     1.7    1.2     1.4    0.8

Table 2. Comparison of error estimation methods as a function of the separability between classes; all numbers in percentages (ε̂ = mean estimate, σ = standard deviation), m stands for the sample size (m·d = N), d = 8, linear classifier.
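The quantities compared in Table 2 can be reproduced in outline: resubstitution as the optimistic lower bound, leave-one-out as the pessimistic upper bound, their average as the "error counting" entry, and Efron's .632 bootstrap. The nearest-mean classifier and the synthetic Gaussian data below are illustrative assumptions, not the paper's actual linear classifier or datasets.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_per, B = 8, 40, 100        # dimension 8 as in Table 2; rest illustrative

# two Gaussian classes separated along the first coordinate (assumed data)
X = np.vstack([rng.normal(0.0, 1.0, (n_per, d)),
               rng.normal(0.0, 1.0, (n_per, d)) + 2.0 * np.eye(d)[0]])
y = np.repeat([0, 1], n_per)

def nm_error(Xtr, ytr, Xte, yte):
    # nearest-mean linear classifier: assign to the closer class mean
    m0, m1 = Xtr[ytr == 0].mean(0), Xtr[ytr == 1].mean(0)
    pred = (np.linalg.norm(Xte - m1, axis=1) <
            np.linalg.norm(Xte - m0, axis=1)).astype(int)
    return float(np.mean(pred != yte))

e_resub = nm_error(X, y, X, y)                         # lower bound (optimistic)
e_loo = np.mean([nm_error(np.delete(X, i, 0), np.delete(y, i),
                          X[i:i+1], y[i:i+1])          # upper bound (pessimistic)
                 for i in range(len(y))])
e_count = 0.5 * (e_resub + e_loo)                      # "error counting" entry

oob_errs = []
for _ in range(B):                                     # .632 bootstrap
    idx = rng.integers(0, len(y), len(y))
    oob = np.setdiff1d(np.arange(len(y)), idx)         # out-of-bag samples
    if len(oob) and len(np.unique(y[idx])) == 2:
        oob_errs.append(nm_error(X[idx], y[idx], X[oob], y[oob]))
e632 = 0.368 * e_resub + 0.632 * np.mean(oob_errs)     # Efron's .632 rule
print(e_resub, e_loo, e_count, e632)
```

The .632 weighting pulls the pessimistic out-of-bag error toward the optimistic resubstitution error; as observation b) notes, this trade-off breaks down when the true error rate is very low and resubstitution is near zero.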
[…]
	        