squares estimation, which has been widely used.
MDL
The Minimum Description Length (MDL) principle studies estimation based upon the principle of minimizing the total number of binary digits required to rewrite the observed data, when each observation is given with some precision. Instead of attempting an absolutely shortest description, it looks for the optimum relative to a class of parametrically given distributions.
The MDL principle can be generally expressed as
L(x,Θ) = L(x/Θ) + L(Θ)    (1)
where
L(·) is a measure of the uncertainty of an event and its unit is "bit".
L(x,Θ) is the total number of bits to describe the observed data when we introduce the model; "x" expresses the observed data, and "Θ" represents the model parameters.
L(x/Θ) is the number of bits to describe the data assuming the model is known.
L(Θ) is the number of bits to describe the model.
L(x,Θ) is the least information content required to remove the uncertainty in the observation and describe the model. Thus, the number of bits in a description required for the interpretation of the observation becomes a measure of simplicity.
The main power of the MDL principle is that it permits
estimates of the entire model, its parameters, their
number, and even the way the parameters appear in the
model; i.e., the model structure [Rissanen,78,83,84].
Bayesian Estimation has been successfully used in scene reconstruction [Zheng], while the MDL principle has been applied to image segmentation and feature extraction [Fua,89a,89b,91] [Leclerc,89,90] [Keeler]. The reason for the different applications is that it is reasonable to apply probability theory to analyse the raster image when we assume the image is a stochastic process, but it has not been proven that BE can be applied to analyse structural phenomena, which is sometimes considered in the integration of statistical and structural pattern recognition. The flexibility of MDL is that we can treat (1) simply as a new criterion, while forgetting its background in statistics and information theory. If we cannot derive a precise descriptive language or encoding scheme to describe an event by a number of bits based on Shannon's first theorem, an approximated scheme can still be used. The resulting estimate is then the relative optimum constrained by the approximated encoding scheme.
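To make the use of (1) as a criterion concrete, the following sketch (ours, not part of the original formulation) selects a polynomial degree by minimizing an approximated two-part code length; the encoding choices, a Gaussian code length for residuals quantized to a fixed observation precision and a fixed number of bits per coefficient, are illustrative assumptions rather than the schemes used later in this paper.

import numpy as np

def description_length(x, y, degree, coeff_bits=16, precision=0.01):
    """Approximate two-part code length L(x,Theta) = L(x/Theta) + L(Theta).

    L(x/Theta): Shannon code length of the residuals, modelled as Gaussian
                and quantized to the observation precision.
    L(Theta):   a crude model cost of 'coeff_bits' bits per coefficient.
    """
    coeffs = np.polyfit(x, y, degree)          # fit the candidate model
    residuals = y - np.polyval(coeffs, x)
    sigma = max(residuals.std(), precision)    # avoid a zero deviation
    data_bits = len(y) * (np.log2(sigma / precision)
                          + 0.5 * np.log2(2 * np.pi * np.e))
    model_bits = (degree + 1) * coeff_bits
    return data_bits + model_bits

# The degree with the shortest total description is selected.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
y = 2.0 * x**2 - x + rng.normal(scale=0.1, size=x.size)
best = min(range(1, 8), key=lambda d: description_length(x, y, d))
print("degree chosen by the MDL criterion:", best)

Even with such a rough encoding scheme, the minimum of the total code length balances the fit to the data against the complexity of the model, in the sense of (1).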
Examples of encoding using MDL
• encoding the image intensity
The interior intensities of an image region can be modelled by a smooth intensity surface with a Gaussian distribution of deviations from the surface. The formula for this problem has been derived by Fua and Hanson [Fua,91]:
L_I = n_i(log σ + c) + 8n_o + [n_i log(n_t/n_i) + n_o log(n_t/n_o)] + N_s    (2)

Where

c = (1/2) log 2πe
L_I : number of bits to describe intensity information in a region
n_t : total number of pixels in one region
n_i : number of pixels in the Gaussian peak
n_o : number of outliers
N_s : number of bits to describe surface model
σ : standard deviation of intensity noise
In (2), the first term is the cost of Huffman-encoding the pixels in the Gaussian peak, the second term is the cost of encoding the outlier pixels, the third term is the entropy for encoding whether or not each pixel is anomalous, and the fourth term specifies the coding of the model.
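A direct transcription of (2) may help to see how the terms interact. In the sketch below (ours, not from [Fua,91]) logarithms are taken to base 2 so the result is in bits, the noise deviation is kept at no less than one grey level, and the 8 bits per outlier are read as the cost of storing a raw 8-bit grey value; these are assumptions about the encoding rather than statements from the original derivation.

import numpy as np

def intensity_code_length(residuals, n_outliers, surface_model_bits):
    """Evaluate equation (2) for one region, using the symbols of the text:
    n_t, n_i, n_o, sigma and N_s."""
    n_i = len(residuals)                 # pixels in the Gaussian peak
    n_o = n_outliers                     # outlier pixels
    n_t = n_i + n_o                      # all pixels of the region
    sigma = max(np.std(residuals), 1.0)  # at least one grey level
    c = 0.5 * np.log2(2 * np.pi * np.e)

    peak_cost = n_i * (np.log2(sigma) + c)        # Huffman-encoding the peak
    outlier_cost = 8 * n_o                        # raw outlier values
    indicator_cost = n_i * np.log2(n_t / n_i)     # in/out flag entropy
    if n_o > 0:
        indicator_cost += n_o * np.log2(n_t / n_o)
    return peak_cost + outlier_cost + indicator_cost + surface_model_bits

Summing this quantity over competing descriptions of an image ranks them in the sense of (1): the description with fewer total bits is the simpler interpretation.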
• encoding the shape of a region
In this paper, we give an example of modelling shape, in which the ideal region boundary is composed of a number of straight line segments. For a straight line, its digitization fulfils the chord property, which states [Hung,85] that "a digital arc A is said to have the chord property if for every two digital points, the chess-board distance of any point of arc A to the straight line nowhere exceeds 1". We consider the points on a curve which do not meet the chord property as outliers, and such outliers are constrained by their neighbours.
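As an illustration of how this test can be carried out (the function names and the sampling-based distance approximation below are ours, not part of [Hung,85]), the following sketch flags the arc points whose chessboard distance to the chord joining the arc's end points exceeds 1:

import numpy as np

def chessboard_distance_to_chord(point, p, q, samples=200):
    """Chessboard (L-infinity) distance from a pixel to the chord pq,
    approximated by sampling points along the continuous segment."""
    t = np.linspace(0.0, 1.0, samples)[:, None]
    chord = (1.0 - t) * p + t * q                 # points on the chord
    return np.min(np.max(np.abs(chord - point), axis=1))

def chord_property_outliers(arc):
    """Indices of arc points lying farther than chessboard distance 1
    from the chord between the arc's two end points."""
    arc = np.asarray(arc, dtype=float)
    p, q = arc[0], arc[-1]
    return [i for i, pt in enumerate(arc)
            if chessboard_distance_to_chord(pt, p, q) > 1.0]

# One point of this digital arc is pushed well off the line:
arc = [(0, 0), (1, 0), (2, 1), (3, 0), (4, 3), (5, 0), (6, 0)]
print(chord_property_outliers(arc))               # -> [4]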
The cost required to describe the image region
shape is formulated as: