XVIIth ISPRS Congress (Part B3)

ESTIMATION OF CLASSIFIER’S PERFORMANCE WITH ERROR COUNTING 
Jan Heikkilä 
Institute of Photogrammetry and Remote Sensing 
Helsinki University of Technology 
02150 Espoo 15, Finland
e-mail: jani@fifille.hut.fi
Commission III 
ABSTRACT 
Because of the complexity of error estimation in statistical pattern recognition, error counting methodology has been extensively used for evaluating the performance of a classifier. The paper compares different error estimation procedures, both theoretically and experimentally by simulation.
The classical error counting methods, such as resubstitution, hold-out, leave-one-out and bootstrapping, are compared with the risk averaging methods and with methods using the error-reject trade-off relationship. These second generation estimators have two clear advantages. First, they are designed to have a smaller variance. Secondly, and more importantly, the test samples can be unclassified. Thus, the risk-averaging methods are economical and, in addition, the problem of mislabelled samples in the test set is avoided. Finally, they are well suited to non-parametric methods and thus provide a possible solution for a general classification system. The simulation studies show that this is achievable.
1. INTRODUCTION 
The problem of estimating the performance of a classifier is a traditional one in statistical pattern recognition. The performance is usually measured with some estimate of the error rate, the percentage of errors the system will produce. Even in the Bayesian case (known statistical models), the expected error rate is extremely hard to evaluate, and is known to have an analytical solution only in some special cases (see e.g. Devijver & Kittler 1982, chapter 2). In practice the probability models must be inferred from available training data. If the true density functions are replaced by the corresponding estimates, an estimate for the Bayes error is obtained. This is usually called the apparent error rate. Even the interpretation of this measure is ambiguous. It is also known to be optimistically biased (Devijver & Kittler 1982, chapter 10). The computation of the apparent error rate is extremely difficult, involving numerical integration over complex regions, often in high-dimensional spaces. These difficulties have pushed both practitioners and theoreticians to look for simpler methods for estimating the error rate of a classifier.
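For concreteness, the following minimal sketch (not part of the original paper) shows one of the special cases in which the Bayes error has a closed form, namely two univariate normal densities with equal variance and equal priors, together with the plug-in "apparent" estimate obtained when the true parameters are replaced by sample estimates. The use of NumPy/SciPy and the chosen parameter values are assumptions made only for this illustration.

    import numpy as np
    from scipy.stats import norm

    # True class-conditional densities N(mu1, sigma^2) and N(mu2, sigma^2),
    # equal prior probabilities (an assumed, analytically tractable case).
    mu1, mu2, sigma = 0.0, 2.0, 1.0

    # Closed-form Bayes error: the decision boundary lies at the midpoint of
    # the means, so the error is Phi(-|mu2 - mu1| / (2 * sigma)).
    bayes_error = norm.cdf(-abs(mu2 - mu1) / (2.0 * sigma))

    # Apparent error rate: replace the true densities by estimates inferred
    # from a finite training set and evaluate the same expression.
    rng = np.random.default_rng(0)
    x1 = rng.normal(mu1, sigma, 50)
    x2 = rng.normal(mu2, sigma, 50)
    s = np.sqrt((x1.var(ddof=1) + x2.var(ddof=1)) / 2.0)  # pooled std estimate
    apparent_error = norm.cdf(-abs(x2.mean() - x1.mean()) / (2.0 * s))

    print(bayes_error, apparent_error)

In higher dimensions, or with more general density estimates, the corresponding integration over the estimated decision regions has no such closed form, which is precisely the difficulty noted above.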
If a finite number of test samples with known labelling is available, the so-called error counting procedure is a natural choice for error estimation. That is, the test set is classified and the number of misclassifications is counted. The properties of this type of procedure have been extensively discussed in the literature and a multitude of methods have been developed. The limited number of labelled samples usually causes problems. Computation-intensive methods have been developed to use the small number of samples cleverly, so that both the design and the test sets can utilize all available pattern vectors (see e.g. Fukunaga 1990).
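As an illustration of plain error counting, the sketch below estimates the error rate of a simple nearest-neighbour rule with both a hold-out split and leave-one-out, in which every sample is used once for testing and otherwise for design. The data, the classifier and the function names are hypothetical choices made for this example only, not the specific methods compared in the paper.

    import numpy as np

    def error_count(pred, labels):
        # Error counting: fraction of misclassified test samples.
        return np.mean(pred != labels)

    def nn_classify(design_x, design_y, test_x):
        # A simple 1-nearest-neighbour rule, chosen only for illustration.
        d = np.linalg.norm(test_x[:, None, :] - design_x[None, :, :], axis=2)
        return design_y[np.argmin(d, axis=1)]

    rng = np.random.default_rng(1)
    x = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
    y = np.repeat([0, 1], 50)

    # Hold-out: split the labelled data into independent design and test sets.
    idx = rng.permutation(len(y))
    design, test = idx[:60], idx[60:]
    holdout_error = error_count(nn_classify(x[design], y[design], x[test]), y[test])

    # Leave-one-out: each sample in turn is a test set of size one, so all
    # available pattern vectors serve both as design and as test samples.
    loo_errors = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        loo_errors += int(nn_classify(x[mask], y[mask], x[i:i + 1])[0] != y[i])
    loo_error = loo_errors / len(y)

    print(holdout_error, loo_error)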
There is another alternative for replacing the computation of the apparent error rate. These methods are either based on the idea of risk averaging or they use the relationship between the error and reject rates, the so-called error-reject trade-off (Devijver & Kittler 1982). These estimators have two advantages over the traditional error counting procedures. First, their variance is designed to be smaller (for large sample sizes). Second, they permit the use of unclassified samples as test samples. In remote sensing applications this means that one can use all the pixels in the satellite image as a test set. In addition to the great economic benefit, the problem of outliers in the test set is avoided. A large test set is particularly essential for accurately evaluating a classifier with a low error rate (Raudys & Jain 1991).
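The following sketch, assuming plug-in Gaussian class models and equal priors (choices made only for this illustration), shows the risk-averaging idea: the conditional error of the decision rule, one minus the largest posterior probability, is averaged over a large unclassified test set, so no labels are needed.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(2)

    # Class models estimated from a small labelled design set (plug-in model).
    design1 = rng.normal(0.0, 1.0, 30)
    design2 = rng.normal(2.0, 1.0, 30)
    m1, m2 = design1.mean(), design2.mean()
    s = np.sqrt((design1.var(ddof=1) + design2.var(ddof=1)) / 2.0)

    # Large *unclassified* test set, e.g. all pixels of an image; no labels used.
    test = np.concatenate([rng.normal(0.0, 1.0, 5000), rng.normal(2.0, 1.0, 5000)])

    # Posterior probability of the winning class under the plug-in model
    # (equal prior probabilities assumed).
    p1 = norm.pdf(test, m1, s)
    p2 = norm.pdf(test, m2, s)
    max_posterior = np.maximum(p1, p2) / (p1 + p2)

    # Risk averaging: the conditional error at x is 1 - max posterior;
    # averaging it over the unclassified samples estimates the error rate
    # without counting any misclassifications.
    risk_avg_error = np.mean(1.0 - max_posterior)
    print(risk_avg_error)

The error-reject trade-off estimators exploit the same posterior information: rejecting samples whose largest posterior falls below a threshold and recording the reject rate as a function of that threshold yields, through the trade-off relation, an estimate of the error rate, again without labelled test samples.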
The difficulty that arises in designing a pattern recognition system comes from the variety of choices available. A large number of classification rules, both parametric and non-parametric, have been developed. A multitude of optimization criteria for finding a minimum set of features with maximal discrimination capabilities have been established. At least fifteen different methods are available for empirical error estimation. The total number of classification methods is thus on the order of hundreds. One goal of this research project is to test by simulation whether a general rule can be found that can be applied to most cases when no a priori information about the structure of the probability density functions is given. A general rule implies the use of a nonparametric classifier. The optimality of nonparametric classifiers must be found in practice by empirical error estimation (Fukunaga 1990, chapter 7). This error estimation criterion should have as small a bias (hopefully none) and as small a variance as possible, and should be robust against outliers.
The paper is divided into five chapters. In chapter two we discuss the effect of finite sample sizes on classifier design. This chapter is a review of the