Fukunaga and Hummels (1987a) have derived corresponding bounds on the kNN error estimate for finite sized design sets. The second order approximation of the NN bias is

$$E\{\Delta\hat{\varepsilon}_{NN}\} = B_1 \left(\frac{1}{N}\right)^{2/d}, \qquad (16)$$

where N is the size of the design set, d is the dimension of the feature space, and B_1 is a function depending on the probability structures and the metric used.
Correspondingly, the second order approximation of the 2NN bias is

$$E\{\Delta\hat{\varepsilon}_{2NN}\} = B_2 \left(\frac{1}{N}\right)^{2/d}, \qquad (18)$$

where B_2 is again dependent on the probability structures of the underlying problem.
The effect of the sample size on the bias depends only on the constants B_1 and B_2. The bias seems to be very difficult to compensate for by increasing the sample size. Fortunately, the bias in the 2NN case is much smaller than in the NN case, as can be seen from figure 2. This agrees with the fact that the NN method seems to give reasonable results in practice only in lower dimensional cases.
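To make the scaling concrete, the minimal sketch below evaluates the factor (1/N)^{2/d} of equations (16) and (18); setting B_1 = B_2 = 1 is an assumption made purely for illustration, so the numbers show only the effect of the design set size.

```python
# The (1/N)^(2/d) factor of equations (16) and (18), with the problem
# dependent constants B1 and B2 set to 1 so that only the effect of the
# design set size N is visible.
for d in (2, 8, 32, 128):
    factors = [f"N={N}: {(1.0 / N) ** (2.0 / d):.3g}"
               for N in (100, 10_000, 1_000_000)]
    print(f"d={d:3d}  " + "   ".join(factors))
```

In two dimensions a ten-thousandfold increase of N shrinks the factor by four orders of magnitude, whereas at d = 128 it barely moves, which is exactly the behaviour seen in figure 2.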
[Figure 2 plots: (a) the NN classifier and (b) the 2NN classifier, bias shown against design set sizes from 100 to 1,000,000 on a logarithmic scale.]
Figure 2. The bias terms of the NN and 2NN classifiers. The feature space dimension varies from 2 to 128, from bottom to top.
For larger values of k, the kNN procedure closely resembles the Parzen procedure. Correspondingly, it gives similar results in all aspects of the error estimation problem (Fukunaga & Hummels, 1987b). For large values of k, the bias can be expressed by

$$E\{\Delta\hat{\varepsilon}\} = \frac{b_1}{k} + b_2 \left(\frac{k}{N}\right)^{2/d} + b_3 \left(\frac{k}{N}\right)^{4/d}. \qquad (20)$$

Equation (20) has a remarkable similarity with equation (10). The a_1 term is missing, because (20) is only valid for large values of k and the a_1 term vanishes as k increases. The constant term b_1/k is the
only clear exception to (10). It shows that the kNN procedure converges to the Bayes error only for large values of k, even in the asymptotic case. A serious disadvantage of the voting kNN procedure is that there is no way to adjust the decision threshold to reduce the bias.
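The trade-off in k can be previewed with the reconstructed form of (20). The constants below are arbitrary assumptions chosen only to expose how the b_1/k term and the kernel size terms pull in opposite directions; they are not values from the paper.

```python
# Bias model of equation (20); b1, b2 and b3 are problem dependent, so
# the values here are illustrative assumptions only.
b1, b2, b3 = 0.5, 0.2, 0.1
N, d = 100_000, 8

def knn_bias(k):
    r2 = (k / N) ** (2.0 / d)      # squared effective kernel radius
    return b1 / k + b2 * r2 + b3 * r2 ** 2

for k in (1, 10, 100, 1000, 10000):
    print(f"k={k:6d}  bias={knn_bias(k):.4f}")
```

With these assumed constants the bias is smallest at an intermediate k: the b_1/k term demands a large k while the kernel size terms demand a small one, and no finite k drives the bias to zero.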
2.2.4 Robustness of the classifiers The unbiasedness of the estimated parameters of parametric density functions is shown in every statistical textbook. The effect of outlying observations has been a primary concern during the last decade (see e.g. Huber 1981). For parametric classifiers the traditional estimation techniques have been shown to be extremely sensitive to outliers, the breakdown point being asymptotically zero (a single outlier can cause the system to break down). However, the concept of the breakdown point is somewhat misleading, because it analyzes only the worst case. Robustness is naturally a function of both the size and the number of the outliers. A theoretical analysis of these effects is hard to carry out; to find out how the estimated error rate depends on these parameters, a simulation study is a feasible choice. We will return briefly to this subject in chapter 4.
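The zero breakdown point can be illustrated with a toy sketch; the standard normal data and the outlier value are assumptions made for demonstration only.

```python
import random

# A single gross outlier displaces the sample mean, the parameter
# estimate a parametric plug-in classifier would rely on; hence the
# asymptotic breakdown point of the classical estimate is zero.
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(999)]
print("clean mean       :", sum(sample) / len(sample))   # close to 0
sample.append(1e6)                                        # one outlier
print("contaminated mean:", sum(sample) / len(sample))   # about 1000
```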
In the case of the nonparametric methods, outliers are not as serious, because only a local neighbourhood is used for density estimation. If none of the outliers belongs to this neighbourhood, no damage is done. The situation is worse in the case of labelling errors in the design set, because then considerable distortions occur also in the nonparametric case. We will simulate both of these cases in chapter 4.
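The contrast between the two contamination types can be previewed with a small sketch; the one-dimensional two-class Gaussian data, the 5% contamination rate, and the 1NN rule are assumptions made for illustration, not the setting of chapter 4.

```python
import random

# 1NN error on two unit-variance Gaussian classes (means 0 and 2),
# comparing a clean design set, one with gross outliers, and one with
# the same fraction of label errors.
random.seed(1)

def make_set(n):
    data = [(random.gauss(0.0, 1.0), 0) for _ in range(n)]
    data += [(random.gauss(2.0, 1.0), 1) for _ in range(n)]
    return data

def nn_error(design, test):
    errors = 0
    for x, y in test:
        label = min(design, key=lambda p: abs(p[0] - x))[1]
        errors += label != y
    return errors / len(test)

design, test = make_set(500), make_set(1000)
print("clean design set  :", nn_error(design, test))

# Outliers far from the data are almost never anyone's nearest neighbour.
contaminated = design + [(50.0, 0)] * 25 + [(-50.0, 1)] * 25
print("5% gross outliers :", nn_error(contaminated, test))

# Flipped labels sit inside the data and are selected directly.
flipped = [(x, 1 - y) for x, y in design[:50]] + design[50:]
print("5% label errors   :", nn_error(flipped, test))
```

Outliers at a distance leave the estimate essentially unchanged, while the same fraction of labelling errors distorts it visibly.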
Because we are more interested in nonparametric classifiers due to their generality, we will not concentrate in this context on parametric robust estimation techniques.
3. EMPIRICAL ERROR ESTIMATION
In this context we classify the empirical error estimation methods into three categories: error counting methods, methods based on risk averaging, and methods based on the error reject tradeoff. In this chapter we briefly review the methods that we have used in the simulations of chapter 4.
3.1 Error counting methods
Error counting methods have traditionally been used in pattern recognition for estimating the error rate. Many variants exist, of which we have used four in this context. The error counting estimate is given by