methods work quite nicely if the sample size is large enough. However, the variance term is always about two times larger than in the risk averaging methods.
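As a rough illustration of this variance claim, the following sketch (not from the paper; two univariate Gaussians with known posteriors are assumed) compares the Monte Carlo standard deviation of plain error counting with that of risk averaging, which averages continuous conditional risks in [0, 0.5] instead of 0/1 error indicators:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
N, runs = 200, 1000                      # test-set size, Monte Carlo runs

ec, ra = [], []
for _ in range(runs):
    y = rng.integers(0, 2, N)            # true labels
    x = rng.normal(2.0 * y, 1.0)         # class means 0 and 2, unit variance
    p0 = norm.pdf(x, 0.0, 1.0)           # class-conditional densities
    p1 = norm.pdf(x, 2.0, 1.0)
    ec.append(np.mean((p1 > p0) != y.astype(bool)))   # error counting
    post = p1 / (p0 + p1)                             # posterior of class 1
    ra.append(np.mean(np.minimum(post, 1.0 - post)))  # risk averaging

print(f"error counting: mean {np.mean(ec):.3f}, std {np.std(ec):.4f}")
print(f"risk averaging: mean {np.mean(ra):.3f}, std {np.std(ra):.4f}")

With these settings the error-counting standard deviation comes out roughly twice the risk-averaging one, in line with the factor quoted above.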
Parametric vs. nonparametric methods
Table 3 shows that the nonparametric method, if the parameter tuning is properly performed (see chapter 2), has potential also in the cases where the optimal classifier is simpler. The opposite is of course not true: a linear classifier will never do proper work in data case NN. Especially in higher-dimensional spaces, successful application of a nonparametric classifier presumes that experimental parameter tuning is performed.
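The tuning step can be made concrete with a small sketch (assumed Gaussian data, not the paper's datasets): a k-NN classifier whose neighbourhood size is chosen by cross-validation on the design set is compared with a linear discriminant on a problem whose optimal classifier is linear.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
d, m = 8, 10
n = m * d                                # design samples per class, N_i = m*d

def make_data(n):
    X = rng.normal(0.0, 1.0, (2 * n, d))
    X[n:, 0] += 2.0                      # class 1 mean shift -> linear Bayes rule
    y = np.r_[np.zeros(n), np.ones(n)]
    return X, y

X, y = make_data(n)
Xt, yt = make_data(5000)                 # large independent test set

lda = LinearDiscriminantAnalysis().fit(X, y)
print("linear:", np.mean(lda.predict(Xt) != yt))

ks = range(1, 26, 2)                     # tune k by 5-fold cross-validation
scores = [cross_val_score(KNeighborsClassifier(k), X, y, cv=5).mean() for k in ks]
best_k = list(ks)[int(np.argmax(scores))]
knn = KNeighborsClassifier(best_k).fit(X, y)
print(f"k-NN (k={best_k}):", np.mean(knn.predict(Xt) != yt))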
case        m=3        m=5        m=10
            ε̂    σ     ε̂    σ     ε̂    σ
Linear      .25  .13   .25  .10   .25  .07
Nonparam.   .27  .13   .40  .10   .25  .08
Quadratic   .33  .12   .32  .09   .32  .08
Nonparam.   .33  .11   .31  .08   .32  .08

Table 3. Comparison of a nonparametric classifier to the optimal ones in cases where the asymptotically optimal classifier is either linear (II) or quadratic (IV); d=8, N_i = m·d (ε̂ = error estimate, σ = its standard deviation).
In Table 4 the robustness of the different types of classifiers is compared. The dataset is contaminated with outlying design samples, and the bias produced by the contaminated data is shown. As predicted, the nonparametric methods are much more robust against the outliers. A closer look at the results (not shown in the table) reveals that the upper bound (leave-one-out estimate) of the nonparametric method grows when the percentage of outliers becomes high, but the lower bound grows only moderately (the lower bound of the given example is 0.28 (ε = 0.25) when 50% of outliers are present). In the parametric case both bounds grow equally fast.
p    A    Δε̂ param.   Δε̂ nonparam.
5    30   0.107       0.003
25   30   0.194       0.036
50   30   0.206       0.084
25   15   0.060       0.038

Table 4. Robustness of classification methods against outliers, dataset II, d=4, m=20, p = percentage of outliers, A = size of outliers (multiple of the feature standard deviation).
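The contamination experiment can be sketched as follows (assumed Gaussian data and a k-NN stand-in for the nonparametric method; dataset II itself is not reproduced): a fraction p of the design samples is displaced by A feature standard deviations, and the increase in independent test error is taken as the bias.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
d, m = 4, 20
n = m * d                                 # design samples per class

def make_data(n):
    X = rng.normal(0.0, 1.0, (2 * n, d))
    X[n:, 0] += 2.0                       # class 1 mean shift
    y = np.r_[np.zeros(n), np.ones(n)]
    return X, y

def contaminate(X, p, A):
    X = X.copy()
    k = int(p / 100 * len(X))
    idx = rng.choice(len(X), k, replace=False)
    X[idx] += A * rng.choice([-1.0, 1.0], (k, d))   # outliers at distance A*sigma
    return X

Xt, yt = make_data(5000)                  # clean test set

for clf in (LinearDiscriminantAnalysis(), KNeighborsClassifier(5)):
    X, y = make_data(n)
    clean = np.mean(clf.fit(X, y).predict(Xt) != yt)
    dirty = np.mean(clf.fit(contaminate(X, 25, 30), y).predict(Xt) != yt)
    print(type(clf).__name__, "bias:", round(dirty - clean, 3))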
However, the type of outliers affects the robustness. If the errors are labelling errors, the nonparametric methods are also more strongly affected. An example is given in Table 5. Again the lower bounds (resubstitution estimate in this case) are only moderately biased, but the upper bounds are strongly distorted.
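A sketch of the labelling-error case (assumed data; resubstitution and leave-one-out stand in for the lower and upper bounds of a k-NN classifier):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score, LeaveOneOut

rng = np.random.default_rng(0)
d, n = 4, 80
X = rng.normal(0.0, 1.0, (2 * n, d))
X[n:, 0] += 2.0
y = np.r_[np.zeros(n), np.ones(n)]

for p in (0, 10, 25):
    yn = y.copy()
    idx = rng.choice(len(y), int(p / 100 * len(y)), replace=False)
    yn[idx] = 1 - yn[idx]                         # flip p% of the design labels
    knn = KNeighborsClassifier(5).fit(X, yn)
    resub = np.mean(knn.predict(X) != yn)         # resubstitution (lower bound)
    loo = 1 - cross_val_score(KNeighborsClassifier(5), X, yn,
                              cv=LeaveOneOut()).mean()  # leave-one-out (upper bound)
    print(f"p={p:2d}%  resub={resub:.3f}  leave-one-out={loo:.3f}")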
p    Δε̂ param.   Δε̂ nonparam.
10   0.053       0.034
25   0.152       0.111
50   0.219       0.157

Table 5. Robustness of classification methods against outliers, dataset II, d=4, m=20, p = percentage of labelling errors.

Robustness of the error estimators

In Table 6 the error estimators are compared with
respect to robustness. The numbers are again averages of the upper and lower bounds. For risk averaging and error reject, the lower bounds are also listed, because they are very robust against outliers. The risk averaging method is extremely insensitive to labelling errors in the design set of a parametric classifier: as can be seen, the bias is between 1 and 2 percent (true Bayes error 25%). On the contrary, if the errors are outliers lying far away from the true distribution, risk averaging tends to be strongly optimistically biased. This is because the estimated scatter spreads out, but the unlabelled test set does not obey that spread distribution. Both the upper and lower bounds are similarly biased.
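The mechanism can be made explicit with a minimal sketch of a parametric risk-averaging estimator (assumed Gaussian model and data): plug-in class densities are estimated from the design set, and the conditional risk min_i P(ω_i|x) is averaged over an unlabelled test set. Outliers inflate the estimated scatter while the test set stays on the true distribution.

import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
d, n = 4, 80                                     # dimension, design samples per class

def risk_averaging(X0, X1, X_test):
    # plug-in Gaussian densities, equal priors assumed
    g0 = multivariate_normal(X0.mean(0), np.cov(X0.T))
    g1 = multivariate_normal(X1.mean(0), np.cov(X1.T))
    p0, p1 = g0.pdf(X_test), g1.pdf(X_test)
    post = p1 / (p0 + p1)
    return np.mean(np.minimum(post, 1.0 - post)) # average conditional risk

X0 = rng.normal(0.0, 1.0, (n, d))                # class 0 design samples
X1 = rng.normal(0.0, 1.0, (n, d))
X1[:, 0] += 2.0                                  # class 1: shifted mean
Xt = rng.normal(0.0, 1.0, (4000, d))             # unlabelled test set, half per class
Xt[2000:, 0] += 2.0

print("clean design set: ", risk_averaging(X0, X1, Xt))
X0c = X0.copy()
X0c[:n // 4] += 30.0 * rng.choice([-1.0, 1.0], (n // 4, d))  # 25% outliers, A=30
print("contaminated set:", risk_averaging(X0c, X1, Xt))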
In the nonparametric case, in contrast to the parametric one, risk averaging is robust against outliers but pessimistically biased in the case of labelling errors. The same applies to the method utilizing the error-reject tradeoff. This is because of the upper bound; the lower bound is extremely robust in the nonparametric case. This phenomenon can be turned to advantage: if the difference between the upper and lower bounds is large compared to the difference between the traditional error counting estimate and the lower bound, the design set contains labelling errors, and the lower bound can be used for prediction.
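This decision rule can be sketched as follows (the threshold ratio is a hypothetical tuning constant, not from the paper):

def pick_estimate(err_count, lower, upper, ratio=3.0):
    """ratio is an assumed tuning constant, not from the paper."""
    if upper - lower > ratio * (err_count - lower):
        return lower                  # labelling errors suspected: trust the lower bound
    return 0.5 * (upper + lower)      # otherwise the usual average of the bounds

print(pick_estimate(err_count=0.27, lower=0.25, upper=0.45))   # -> 0.25
print(pick_estimate(err_count=0.26, lower=0.24, upper=0.28))   # -> 0.26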
Case           Δε̂ EC    Δε̂ RA    Δε̂ Er
Parametric
p=10           0.05      0.01
p=25           0.15      0.02
p=50           0.22      0.01
p=5  A=30      0.11     -0.11
p=25 A=30      0.20     -0.13
p=50 A=30      0.21     -0.18
Nonparam.
p=10           0.04      0.03     0.06
  lower bound           -0.04     0.01
p=25           0.12      0.10     0.15
  lower bound           -0.01     0.08
p=50           0.16      0.12     0.18
  lower bound            0.00     0.10
p=5  A=30      0.01      0.01     0.02
  lower bound           -0.03    -0.02
p=25 A=30      0.04      0.03     0.03
  lower bound            0.00    -0.01
p=50 A=30      0.09      0.04     0.04
  lower bound            0.00     0.00

Table 6. Robustness of different error estimators, p = % of outliers, A = size of outliers (labelling errors if A is not given), EC = error counting, RA = risk averaging, Er = error reject. For RA and Er the second row of each nonparametric case gives the lower bound.