[Equations (2)-(5): second-order approximations of the expected classification-error bias E{Δε} of the linear and the quadratic classifier, expressed in terms of the design-set size N_D, the dimensionality d and the mean difference μ; the expressions are illegible in the source (see Fukunaga 1990).]
In (2)-(5), N_D is the size of the design set, d is the dimensionality of the feature space and μ is the difference between the mean vectors (μ₁ − μ₂). If d >> 1, it can be seen from (2)-(5) that the bias of a linear classifier is proportional to d/N_D and the bias of a quadratic classifier is proportional to d²/N_D. For the linear classifier this means that a number of data vectors that is a fixed multiple of the dimensionality of the feature space (N_D = c·d) is enough to compensate for the effect of the finite sample size. As can be expected, this does not hold for the quadratic case. The size of the constant c depends on the separability of the classes (i.e. the expected error rate): the higher the expected error, the greater the value the constant should have.
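To make the d/N_D versus d²/N_D behaviour concrete, the following sketch (an addition, not part of the original text) runs a small Monte Carlo experiment: two Gaussian classes N(0, I) and N(μ, I) are drawn, plug-in linear and quadratic classifiers are designed from N_D = c·d vectors per class, and the excess error over the Bayes error is measured on a large independent test set. The class model, the values of c and the test-set size are illustrative assumptions.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

def bayes_error(delta):
    """Bayes error for two equiprobable classes N(0, I) and N(mu, I), delta = ||mu||."""
    return 0.5 * (1.0 - erf(delta / (2.0 * sqrt(2.0))))

def design_and_test(d, n_design, delta, n_test=20000):
    """Estimate test errors of plug-in linear and quadratic classifiers."""
    mu = np.zeros(d)
    mu[0] = delta                                   # mean difference along the first axis
    X1 = rng.standard_normal((n_design, d))         # design samples, class 1
    X2 = rng.standard_normal((n_design, d)) + mu    # design samples, class 2
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = np.cov(X1, rowvar=False)
    S2 = np.cov(X2, rowvar=False)
    Sp = 0.5 * (S1 + S2)                            # pooled covariance for the linear rule

    T1 = rng.standard_normal((n_test, d))           # large independent test sets
    T2 = rng.standard_normal((n_test, d)) + mu

    def lin(x):
        # plug-in linear discriminant; positive values vote for class 2
        w = np.linalg.solve(Sp, m2 - m1)
        b = -0.5 * (m1 + m2) @ w
        return x @ w + b

    def quad(x):
        # plug-in quadratic discriminant (estimated log-likelihood ratio)
        def logdens(x, m, S):
            xc = x - m
            q = np.einsum('ij,jk,ik->i', xc, np.linalg.inv(S), xc)
            return -0.5 * q - 0.5 * np.linalg.slogdet(S)[1]
        return logdens(x, m2, S2) - logdens(x, m1, S1)

    def err(f):
        return 0.5 * (np.mean(f(T1) > 0) + np.mean(f(T2) <= 0))

    return err(lin), err(quad)

delta = 2.56                                        # gives a Bayes error of about 10 %
for d in (8, 16, 32):
    for c in (5, 20):                               # N_D = c * d design vectors per class
        e_lin, e_quad = design_and_test(d, c * d, delta)
        print(f"d={d:3d}  N_D={c * d:4d}"
              f"  bias_lin={e_lin - bayes_error(delta):+.3f}"
              f"  bias_quad={e_quad - bayes_error(delta):+.3f}")
```

In such a run the excess error of the linear rule stays roughly constant for a fixed c, while the quadratic rule degrades visibly as d grows, in line with the d/N_D and d²/N_D dependence stated above.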
2.2.2 Nonparametric Parzen classifiers  The nonparametric classifiers are appealing because they make no prior assumptions about the density structures of the underlying problem. However, they are more computation intensive and have been shown to be heavily biased in high-dimensional spaces (see below). They also have some parameters that must be tuned to achieve optimality. The tuning of these parameters is extremely important and can be done by experimental error estimation. These topics are discussed in this chapter. The detailed derivations can be found in (Fukunaga 1990, chapters 6 and 7).
A nonparametric density function can be estimated by the so-called Parzen method. The corresponding classifier is called a Parzen classifier. The Parzen estimate of a density function can be expressed as

\hat{p}(x) = \frac{1}{N_D} \sum_{i=1}^{N_D} K(k, Q, x_i, x) , \qquad (6)

where K is a kernel function, k determines the size of the kernel, Q is a metric used to compute distances and the x_i's are the sample vectors. The corresponding classifier compares the a posteriori probabilities to a decision threshold, t. An analysis of the effect of all these variables is needed.
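As a concrete illustration of (6) and of the classifier built on it, the sketch below (an addition, not the paper's own code) implements a Parzen density estimate with a Gaussian kernel and a classifier that compares the estimated a posteriori probability of class 1 against a decision threshold t. The Gaussian kernel form, the way k and Q enter it, and the priors are illustrative assumptions.

```python
import numpy as np

def parzen_density(x, samples, Q, k):
    """Parzen estimate p_hat(x) = (1/N_D) * sum_i K(k, Q, x_i, x),
    here with a Gaussian kernel of size k under the metric Q."""
    d = samples.shape[1]
    diff = samples - x                                   # x_i - x, one row per sample vector
    dist2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Q), diff)
    norm = (2 * np.pi * k**2) ** (d / 2) * np.sqrt(np.linalg.det(Q))
    return np.mean(np.exp(-0.5 * dist2 / k**2)) / norm

def parzen_classify(x, X1, X2, Q, k, prior1=0.5, t=0.5):
    """Assign x to class 1 if its estimated a posteriori probability
    exceeds the decision threshold t, otherwise to class 2."""
    p1 = parzen_density(x, X1, Q, k) * prior1
    p2 = parzen_density(x, X2, Q, k) * (1.0 - prior1)
    post1 = p1 / (p1 + p2 + 1e-300)                      # guard against underflow
    return 1 if post1 > t else 2
```

With Q = I the kernel reduces to the usual spherical Gaussian window of width k.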
A second-order approximation of the expected value and the variance of the Parzen density estimate can be shown to be (Fukunaga 1990, pp. 258-259)
E\{\hat{p}(x)\} \cong p(x) + \frac{k^{2}}{2}\, p(x)\, \alpha(x) \qquad (7)

and

\mathrm{Var}\{\hat{p}(x)\} \cong \frac{1}{N_D}\, g\bigl(p(x), \alpha(x), Q, p^{2}(x), K^{2}\bigr) , \qquad (8)

where

\alpha(x) = \frac{\mathrm{tr}\{Q\, \nabla\nabla p(x)\}}{p(x)} \qquad (9)
and the precise form of (8) can be found in (Fukunaga 1990, p. 259). Equations (7)-(9) show that only the variance depends on the sample size, being proportional to 1/N_D; it can therefore be reduced by increasing the sample size. The bias, on the contrary, does not depend on the sample size at all and must be minimized by a proper selection of the parameters k and Q. From (9) we can also see that both the bias and the variance depend on the curvature of the density structures. Of course, the bias and variance of the density estimates affect the bias of the classification error.
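A small numerical check (an illustrative addition, using an assumed Gaussian kernel and a standard normal density in d = 4 dimensions) makes the roles of k and N_D in (7)-(9) visible: over repeated trials, the empirical bias of the Parzen estimate at a fixed point follows the kernel size and is essentially unaffected by the sample size, while the empirical variance falls roughly as 1/N_D.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
p_true = (2 * np.pi) ** (-d / 2)                    # true N(0, I) density at the origin

def parzen_at_origin(samples, k):
    """Gaussian-kernel Parzen estimate of the density at x = 0."""
    dist2 = np.sum(samples**2, axis=1)
    norm = (2 * np.pi * k**2) ** (d / 2)
    return np.mean(np.exp(-0.5 * dist2 / k**2)) / norm

for k in (0.3, 0.6):                                # two kernel sizes
    for n in (200, 2000):                           # two design-set sizes
        est = np.array([parzen_at_origin(rng.standard_normal((n, d)), k)
                        for _ in range(300)])       # 300 independent trials
        print(f"k={k:.1f}  N_D={n:5d}"
              f"  bias={est.mean() - p_true:+.4f}  var={est.var():.2e}")
```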
After some rather involved mathematics, this effect on the classification error can be expressed as

E\{\Delta\varepsilon\} \cong a_{1} k^{2} + a_{2} k^{4} + a_{3} \frac{k^{-d}}{N_D} + b\,\Delta t , \qquad (10)

where a_1, a_2, a_3 and b are complicated functions depending on α₁(x), α₂(x), Q and p(x), and Δt is the bias of the decision threshold.
Although the constants a_1, a_2 and a_3 are complicated functions, they depend only on the probability structures and on the metric, and are totally independent of the kernel size and of the sample size. When k is small, the term containing a_3 dominates. This term reflects the variance term, Var{p̂(x)}, of the density estimate. When k grows, the terms containing a_1 and a_2 start to dominate. These terms reflect the effect of the bias term of the density estimate, E{p̂(x)}, on the classification error. Both terms should be adjusted properly.
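In practice this adjustment can be made by experimental error estimation, as noted above. The sketch below (an addition, using the same illustrative Gaussian-kernel Parzen rule as in the earlier example) computes the leave-one-out error on the design set over a grid of kernel sizes k, so that the value minimizing the estimated error can be selected; the data, the metric Q and the grid are assumptions made for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
X1 = rng.standard_normal((100, d))                  # design set, class 1
X2 = rng.standard_normal((100, d)) + 1.0            # design set, class 2 (shifted mean)
Q = np.eye(d)                                       # Euclidean metric for simplicity

def loo_error(X1, X2, Q, k):
    """Leave-one-out error of an equal-prior Parzen classifier with kernel size k."""
    iQ = np.linalg.inv(Q)

    def density(x, samples):
        dd = samples.shape[1]
        diff = samples - x
        dist2 = np.einsum('ij,jk,ik->i', diff, iQ, diff)
        norm = (2 * np.pi * k**2) ** (dd / 2) * np.sqrt(np.linalg.det(Q))
        return np.mean(np.exp(-0.5 * dist2 / k**2)) / norm

    errors = 0
    for i in range(len(X1)):                        # hold out each class-1 vector in turn
        x = X1[i]
        errors += density(x, np.delete(X1, i, axis=0)) <= density(x, X2)
    for i in range(len(X2)):                        # and each class-2 vector
        x = X2[i]
        errors += density(x, np.delete(X2, i, axis=0)) <= density(x, X1)
    return errors / (len(X1) + len(X2))

for k in (0.1, 0.2, 0.5, 1.0, 2.0, 4.0):            # candidate kernel sizes
    print(f"k={k:4.1f}  leave-one-out error = {loo_error(X1, X2, Q, k):.3f}")
```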
When k is small, the variance term of the density estimate dominates, and the bias of the classification error can thus be reduced by increasing the sample size. However, for larger values of k the bias term of the density estimate dominates, and this effect can no longer be reduced by increasing the sample size. The difficulty is accentuated in higher-dimensional spaces. When the intrinsic dimensionality of the feature space is high, it becomes hopeless to increase the sample size (because the number of samples needed grows so fast; Fukunaga 1990, p. 264), and the only choice for obtaining a reasonable estimate is to increase k. However, we can see from (10) that a