>
RS an
D
where M is the set of misclassified samples. The
methods differ from each other from the way they use
the available sample set.
3.1.1 Resubstitution In the resubstitution method
the whole sample set is used for designing and again
for testing the classifier. This method is known to be
optimistically biased (e.g. Devivjer & Kittler, chapter
10), thus giving a lower bound for the Bayes error (or
for the asymptotic error). However the bias reduces
then the sample size increases. Thus, if enough
samples is in use, resubstitution method can be used.
3.1.2 Hold-out The holdout procedure is a special
case of cross-validation, where the available sample set
is divided into exactly two sets. The other set is used
for designing the classifier and the other set as a test
set. The method is known to give an upper bound for
the Bayes error. Its variance is a little bigger than that
of resubstitution. What makes it rather complicated
from the practical point of view, is the demand for
independent test and design sets. The division is far
from a non trivial problem. The neglectance of some
part of the available data from designing the classifier
is not very pleasing, too. An arithmetic average of the
resubstitution and holdout methods should be used as
the final estimate of the error rate.
3.1.3 Leave-one-out The other extreme of cross-
validation is given then each of the samples in turn is
used for testing the classifier and the N-1 remaining
samples are used as a design set. The available
samples are thus more effectively utilized. Also the
design and test sets are statistically independent. In
case of independent sample vectors, the method is
known to be practically unbiased (Efron 1983). For
long time the method has been recommended to be
used in the context of small sample sizes. Unfortu-
nately, especially for small sample sets, the variance of
the leave-one-out method is high. Because the vari-
ance component dominates in small sample sets (Efron
1983), a low variance estimate is recommendable in
such cases. Another disadvantage of leave-one-out
method is the high computational cost for some types
of classification algorithms, because Nj different
classifiers must be designed. Fortunately for many
cases, recursive methods can be used to get the leave-
one-out estimate practically with the same computation
time as the resubstitution estimate (e.g. Fukunaga,
1990, chapters 5 and 7).
3.1.4 Bootsrapping A bootstrap design sample of
exactly size Ny is generated by sampling with replace-
ment. Two different estimates can be computed. In
the bootstrap resubstitution estimate, the samples of
the design set are used also for testing. The bootstrap
hold-out estimate is achieved by using those samples
330
for testing, which do not belong to the generated
design set. The procedure is repeated 100-200 times
(100 in the simulations of chapter 4) and the final
estimate is the arithmetic mean of the individual esti-
mates. Many modifications have been developed from
this basic algorithm. The one, which in various empi-
rical studies (e.g. Jain et al. 1987) has shown good
performance, is called the 0.632 bootstrap estimate. In
this heuristic method a weighted average of the two
bootstrap estimates is computed given weight 0.632 to
the holdout estimate. Theoretically (Fukunaga 1990,
chapter 5) the bias of the original bootstrap method
should coincide with the bias of the leave-one-out
method. However, the variance should be smaller, and
close to the resubstitution estimate.
Some hints of the weakness of the bootstrap method
have been reported. In (Chernick et al. 1985) the
0.632 bootstrap estimate showed weaker results when
used with linear classifiers in high error rate situations.
In (Jain et al. 1987) it was reported that the 0.632
estimator should not be used together with the NN
classifier.
3.2 Methods based on risk averaging
The risk averaging methods are designed especially to
have a low variance. A big advantage comes from the
fact that an unlabelled test set can be used for testing
the classifier. That is why the test set can be large
(e.g. the whole satellite scene) and thus the method
should be suitable for low-error rate cases. Because
the variance of the error comes primarily from the test
set, this already produces a low variance. In risk aver-
aging the risk for each decision is estimated and the
final error estimate is the average of all of the esti-
mates. Thus
Ny
3.6 (22)
where £, is the estimated risk for decision i, which
equals to 1-max[P(oIx;)]. Estimate (22) is sometimes
called the grouped estimate. The performance of this
estimate has not been fully studied in comparison with
the error counting. Note that the method is especially
suitable for nonparametric classifiers, where the risks
are always computed. No modifications for the algo-
rithms are needed.
It is shown in (Devivjer & Kittler 1982, chapter 10)
that the estimate is unbiased and the variance of the
method is (in case of equal prior probabilities)
«S010. LS
; 23
N, 2N, on
Var( é, }
which is less than half of the variance of error counting
(com
3.2
estin
(22)
phas
taker
to be
give:
3.2
gettii
aver:
now
comp
set, I
ence
this
varia
33
If a
tem,
rejec
class
the ri
1982
where
old (i
Thus
estim
These
of th
unlab
3.3.
used :
speci:
show:
asym)
forme
where
with
asym]
classi:
them.
Unfor