Evidence combination
The purpose of blunder detection is to classify observations into those to be kept in the model
(‘good’ observations) and those to be rejected (‘bad’ observations). This classification is usually
considered as a statistical decision process in which each residual is tested against the null hypothesis
that it does not deviate significantly from zero. It is then critical to select the risk level α. Too
large a risk level will result in the rejection of too many ‘good’ observations. On the other hand, too
small a value will degrade the power of the test, and too many ‘bad’ observations will be accepted.
The selection is to be based on heuristics and experience. A typical choice is α = 0.001 or α = 0.01.
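As an illustration of how the risk level α translates into a rejection threshold, the following sketch assumes a standardized residual tested against a standard normal null distribution (a simplification; the text later uses a T-distribution instead):

```python
from statistics import NormalDist

def critical_value(alpha):
    # Two-sided critical value: reject the null hypothesis (residual
    # does not deviate significantly from zero) if |w| exceeds it.
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.01, 0.001):
    print(f"alpha = {alpha}: reject if |w| > {critical_value(alpha):.3f}")
```

Smaller α pushes the threshold outward (roughly 2.58 at α = 0.01 versus 3.29 at α = 0.001), which is exactly the power trade-off described above.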
The problem can also be approached from a Bayesian-classification point of view. An observation is
classified as ‘good’ or ‘bad’ based on probability criteria. Adapting some of the terminology from
(Buchanan and Shortliffe, 1985), Bayes' theorem about conditional probability can be written as

P(d_i \mid e) = \frac{P(d_i)\, P(e \mid d_i)}{\sum_{j} P(d_j)\, P(e \mid d_j)}

where the d_i are the disjoint diagnoses or sub-classes; in this application d_1 is the class of ‘good’
observations and d_2 is the class of ‘bad’ observations. P(d_i) is the a priori probability of each
class (\sum_j P(d_j) = 1); P(e \mid d_i) is the probability that an observation belonging to class d_i has
symptoms and signs represented by the evidence e; and P(d_i \mid e) is the conditional probability that an
observation belongs to sub-class d_i if the evidence e is present. The P(d_i) and P(e \mid d_i) are required
as input.
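As a minimal numerical sketch of the formula, the posterior probabilities can be computed directly once the priors and likelihoods are given. The values below are illustrative assumptions, not figures from the text:

```python
# Hypothetical inputs: a priori class probabilities P(d_i) and
# likelihoods P(e | d_i) for the 'good' (d1) and 'bad' (d2) classes.
priors = {"d1": 0.99, "d2": 0.01}        # must sum to 1
likelihoods = {"d1": 0.001, "d2": 0.60}

# Bayes' theorem: P(d_i | e) = P(d_i) P(e|d_i) / sum_j P(d_j) P(e|d_j)
evidence = sum(priors[d] * likelihoods[d] for d in priors)
posterior = {d: priors[d] * likelihoods[d] / evidence for d in priors}

for d, p in posterior.items():
    print(f"P({d} | e) = {p:.3f}")
```

With these inputs the posterior for d_2 dominates despite the small prior, because the evidence is far more probable under the ‘bad’ class; the observation would then be classified as a blunder.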
Let us investigate a specified observation k having an externally Studentized residual vector w_k of
length |w_k| = a_k. The vector w_k is used as an indicator (evidence) for the presence of a blunder: the
larger |w_k|, the more likely is the presence of a blunder.
The Bayesian decision rule can also be written as (Fukunaga, 1972)

P_k(d_1)\, P_k(e \mid d_1) < P_k(d_2)\, P_k(e \mid d_2) \;\Longrightarrow\; \mathrm{obs.}_k \in d_2

P_k(d_2) can be interpreted as an a-priori-known probability of the occurrence of a blunder in
observation k, with P_k(d_1) = 1 - P_k(d_2). P_k(e \mid d_1) is given as α in conventional methods for blunder
detection:

P_k(e \mid d_1) = \Pr(|w_k| > a_k) = \int_{a_k}^{\infty} p_k(x \mid d_1)\, dx
where p_k(x \mid d_1) is the density function of |w_k| assuming that the observation belongs to d_1, the
class of ‘good’ observations. Because of the external Studentization, p_k(x \mid d_1) follows a T-
distribution. The T-distribution is here defined such that if z follows an F-distribution then \sqrt{z}
follows a T-distribution. The T-distribution is multi-dimensional, as opposed to the t-distribution,
which is one-dimensional. Also,
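The definition above (\sqrt{z} is T-distributed when z is F-distributed) can be sketched by simulation. The degrees of freedom below are arbitrary assumptions, and the F-variate is built from chi-square variates via the standard ratio construction:

```python
import random

random.seed(0)

def sample_F(d1, d2):
    # F-variate as a ratio of scaled chi-square variates;
    # a chi-square with k d.o.f. is Gamma(k/2, scale=2).
    x = random.gammavariate(d1 / 2, 2)
    y = random.gammavariate(d2 / 2, 2)
    return (x / d1) / (y / d2)

# Per the definition in the text: draw T-distributed values of |w_k|
# under the null (d_1) as square roots of F-distributed samples.
d1, d2 = 3, 20  # illustrative degrees of freedom
samples = [sample_F(d1, d2) ** 0.5 for _ in range(10_000)]
print(f"mean of |w| under the null: {sum(samples) / len(samples):.3f}")
```

Sorting such samples also gives an empirical way to read off the threshold a_k corresponding to a chosen tail probability α.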
P_k(e \mid d_2) = \Pr(|w_k| > a_k) = \int_{a_k}^{\infty} p_k(x \mid d_2)\, dx
where p_k(x \mid d_2) is the density function of |w_k| assuming that the observation belongs to d_2, the
class of ‘bad’ observations. Unfortunately this distribution is generally not known. Because an
empirically validated distribution is lacking, some hypothetical distribution must be used. Adopting
the mean-shift model, the distribution would be a non-central T-distribution with an unknown
non-centrality parameter. If the frequency of different magnitudes of the mean shifts among the
blunder observations can be estimated, even subjectively, a linear combination of the corresponding
non-central T-distributions can be used as p_k(x \mid d_2). The general appearances of p_k(x \mid d_1) and
p_k(x \mid d_2) are shown in Figure 1. For any realistic choice of p_k(x \mid d_2) it must hold that
\int_{a_k}^{\infty} p_k(x \mid d_2)\, dx > \int_{a_k}^{\infty} p_k(x \mid d_1)\, dx
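This tail-mass requirement can be illustrated numerically. In the sketch below the central and non-central T-densities are replaced by normal densities (a stand-in, not the paper's actual distributions), and p_k(x \mid d_2) is built as a linear combination whose mean shifts and weights are subjective assumptions, as the text suggests:

```python
from statistics import NormalDist

a_k = 2.576  # example threshold (alpha = 0.01 under a normal null)

# Stand-in densities: standard normal for the 'good' class, and a
# mixture of mean-shifted normals for the 'bad' class.
good = NormalDist(0, 1)
shifts = {3.0: 0.5, 5.0: 0.3, 8.0: 0.2}  # mean shift -> mixture weight

def tail_good(a):
    return 1 - good.cdf(a)

def tail_bad(a):
    # Tail mass of the linear combination of shifted densities.
    return sum(w * (1 - NormalDist(mu, 1).cdf(a)) for mu, w in shifts.items())

print(f"P(e|d1) = {tail_good(a_k):.4f}")
print(f"P(e|d2) = {tail_bad(a_k):.4f}")
assert tail_bad(a_k) > tail_good(a_k)  # the required inequality holds
```

Any mixture that places its mass at shifts well beyond a_k behaves this way: the ‘bad’ class has far more probability in the rejection region than the ‘good’ class.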