procedures for information processing purposes. Open
problems in geoinformation processing abound and further
research and development work is required for wide ranging
applications. Such experience with information processing
should be useful in contemplating the problems of knowledge
acquisition, representation and processing.
3. OVERVIEW OF INFORMATION THEORY
From a theoretical perspective, the origins of information theory
go back to the foundations of probability theory as dealing with
uncertain or incomplete information is at the very basis of
probabilistic considerations. Measuring or quantifying
information contents is fundamental in formulating optimal
solutions for estimation and inference problems. Depending
upon the specific requirements, some information measures and
related discrimination functions may be more appropriate than
others.
Information measures are often expressed in terms of
frequencies of occurrence of errors or grey levels as these
provide a general approach to information contents without
necessarily requiring any interpretation or evaluation of the implications.
Various information measures have been suggested and used in
different application contexts. For digital image processing and
related applications, the Shannon-Wiener entropy H[p] in terms
of discrete frequencies or probabilities p = [p_1, p_2, ..., p_n] is
perhaps the best known and most appropriate for the intended
applications. Explicitly, the Shannon-Wiener entropy H[p] is
defined by
H[p] = H[p_1, p_2, \ldots, p_n] = -\sum_{k=1}^{n} p_k \log p_k
and the corresponding relative entropy, in the case of a general
background or reference probability distribution q = [q_1, q_2, ..., q_n], is

H[p|q] = H[p_1, p_2, \ldots, p_n; q_1, q_2, \ldots, q_n] = -\sum_{k=1}^{n} p_k \log (p_k / q_k)
where the summation signs are replaced by integral signs in
applications with continuous probabilities. The logarithms used
in these definitions are assumed to have the appropriate base
(usually 2) or else a multiplicative constant should be included.
When the background or reference probability distribution is
uniform, then the relative entropy reduces to the absolute
entropy.
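As a minimal sketch of how these measures might be computed for grey-level data (the array names and the 8-bit image are illustrative assumptions; only numpy is used), the normalized grey-level frequencies are treated as the probabilities in the above definitions:

```python
import numpy as np

def shannon_entropy(p, base=2.0):
    """Shannon-Wiener entropy H[p] of a discrete distribution (bits by default)."""
    p = p[p > 0]                               # zero-probability terms contribute nothing
    return -np.sum(p * np.log(p)) / np.log(base)

def relative_entropy(p, q, base=2.0):
    """Relative entropy H[p|q] = -sum p_k log(p_k / q_k), with the sign convention used above."""
    mask = p > 0
    return -np.sum(p[mask] * np.log(p[mask] / q[mask])) / np.log(base)

# Hypothetical 8-bit image: normalized grey-level frequencies play the role of p
image = np.random.default_rng(0).integers(0, 256, size=(128, 128))
hist = np.bincount(image.ravel(), minlength=256).astype(float)
p = hist / hist.sum()
q = np.full(256, 1.0 / 256)                    # uniform background distribution

print(shannon_entropy(p))                      # close to 8 bits for a nearly uniform histogram
print(relative_entropy(p, q))                  # zero when p is exactly uniform (p_k = q_k)
```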
For practical applications, information measures need to be
coordinate system independent and least sensitive to additive
noise in the data. The Shannon-Wiener relative entropy has
been shown to satisfy these conditions in practice [Blais and
Boulianne, 1988]. Furthermore, the relative entropy measure is
known to be unaffected by any orthogonal transformation (e.g.,
a rotation) of digital image data where the normalized grey level
frequencies are interpreted as probability distribution
frequencies [Andrews, 1970]. The latter is especially important
in the context of digital image processing using Fourier and
other orthogonal transforms which preserve the energy
associated with the grey levels.
For a continuous random variable with a Gaussian probability
distribution, the Shannon-Wiener entropy is proportional to the
logarithm of the variance in one dimension, and to the logarithm of
the determinant of the covariance matrix in higher dimensions [e.g., Blais, 1991a].
This is not a surprising result as a Gaussian probability
distribution is fully specified by its first two moments and hence
the Shannon-Wiener entropy can be expected to be expressible
in terms of the second moment. Obviously, the situation is
different with other probability distribution functions which can
only be specified fully by their higher statistical moments.
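For reference, the corresponding closed-form expressions (with natural logarithms; other bases introduce only a constant factor) are

H = \tfrac{1}{2} \log\!\left(2 \pi e \, \sigma^{2}\right) \quad \text{(one dimension)},
\qquad
H = \tfrac{1}{2} \log\!\left((2 \pi e)^{n} \det \Sigma\right) \quad \text{(n dimensions)},

where \sigma^{2} is the variance and \Sigma is the covariance matrix of the Gaussian distribution.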
It is important to realize that no interpretation or semantics is
included in the preceding definitions and discussions.
Mathematically, the analysis of a probability distribution does
not require any interpretation of the inferences as these can be
very different in different application contexts. On the other
hand, the appropriateness and implications of using one
information measure in a specific context may very well include
semantics and valuations for reasoning-like processing as in
expert systems.
The preceding concepts from information theory are very useful
in estimation and inverse problems where the available
observational and other information is often incomplete for the
desired solution. Considering the available information for
maximum exploitation without making any unnecessary
assumptions about what is not known is precisely the maximum
information or maximum entropy approach. Explicitly, the
maximum entropy principle states:
When making inferences based on incomplete information, the
estimates should be based on probability distributions
corresponding to the maximum entropy permitted by the
available information.
This principle was proposed independently by Kullback and
Leibler [1951], Jaynes [1957] and Ingarden [1963]. It has been
justified in terms of combinatorial arguments, axiomatic
inference, objectivity, consistency and reliability of the
estimation process [Jaynes, 1982 and 1983].
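In its usual constrained form (a conventional formulation, not quoted from the references above), the principle amounts to maximizing the entropy subject to normalization and to whatever moment constraints the available information supplies:

\max_{p} \; H[p] = -\sum_{k=1}^{n} p_k \log p_k
\quad \text{subject to} \quad
\sum_{k=1}^{n} p_k = 1,
\qquad
\sum_{k=1}^{n} p_k \, f_j(x_k) = F_j, \quad j = 1, \ldots, m,

and the maximizing distribution has the exponential form p_k \propto \exp\!\big(-\sum_{j} \lambda_j f_j(x_k)\big), with the Lagrange multipliers \lambda_j fixed by the constraints.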
Applications of this maximum information principle are wide
ranging in physical science and engineering. Some applications
in model identification, digital image processing and spatial
information systems are discussed in Blais [1991a and b]. The
following discussions will concentrate on applications in
spectrum estimation, adaptive filter design and inverse problems
to illustrate the applicability of information theory and the
principle of maximum entropy.
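Before turning to those applications, a minimal numerical sketch of the principle itself may be helpful; the dice-style data, variable names and the use of scipy's SLSQP solver are illustrative assumptions rather than material from the text:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical example: maximum-entropy probabilities on a small discrete
# support when the only available information is a prescribed mean value.
x = np.arange(1, 7)                      # support (e.g., faces of a die)
target_mean = 4.5                        # the available (incomplete) information

def neg_entropy(p):
    p = np.clip(p, 1e-12, None)          # guard against log(0)
    return np.sum(p * np.log(p))         # minimizing -H[p] maximizes the entropy

constraints = (
    {"type": "eq", "fun": lambda p: np.sum(p) - 1.0},             # normalization
    {"type": "eq", "fun": lambda p: np.dot(p, x) - target_mean},  # mean constraint
)
p0 = np.full(len(x), 1.0 / len(x))       # start from the uniform distribution
result = minimize(neg_entropy, p0, method="SLSQP",
                  bounds=[(0.0, 1.0)] * len(x), constraints=constraints)
print(result.x)                          # close to the exponential form p_k prop. to exp(lambda * x_k)
```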
4. APPLICATIONS IN SPECTRUM ESTIMATION
Estimates of power spectral density functions are required for
numerous applications in digital signal and image processing.
Filter design often relies on the spectral analysis
of data sequences and arrays. The estimation of the spectrum of
one-dimensional data sequences is relatively straightforward
and the analysis of the estimates does not usually present any
problems. The situation is however quite different in two and
higher dimensions, where difficulties with the factorization and
positive definiteness of autocovariance functions can have
serious implications.
Given a sample autocovariance sequence of finite length, the
spectrum estimation problem involves the extension of this
sequence for the Fourier transformation to estimate the spectrum
of the process. Well known approaches to the spectrum
estimation problem include the periodogram and correlogram
methods, the parametric modeling techniques of autoregressive
and moving average formulations, and the maximum entropy
approach which is based on information theory.
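As a minimal sketch of the correlogram route (the data sequence, lag choices and variable names are illustrative assumptions; only numpy is used), the sample autocovariance sequence is tapered with a lag window and Fourier transformed:

```python
import numpy as np

# Hypothetical data: a sinusoid at 0.2 cycles/sample in additive noise
rng = np.random.default_rng(0)
x = np.cos(2 * np.pi * 0.2 * np.arange(256)) + 0.5 * rng.standard_normal(256)
x = x - x.mean()

max_lag = 64
# Biased sample autocovariance for lags 0 .. max_lag
acov = np.array([np.dot(x[:len(x) - k], x[k:]) / len(x) for k in range(max_lag + 1)])

# Symmetric extension to lags -max_lag .. max_lag and a Bartlett lag window
lags = np.concatenate([acov[::-1], acov[1:]])
window = np.bartlett(len(lags))

# Correlogram estimate: Fourier transform of the windowed autocovariance sequence
spectrum = np.abs(np.fft.rfft(lags * window))
freqs = np.fft.rfftfreq(len(lags))
print(freqs[np.argmax(spectrum)])        # spectral peak near 0.2 cycles/sample
```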
When using Fourier based methods, the extension of the
autocovariance function is implied by the periodicity of the
Fourier transform. This situation is usually quite appropriate in
noise-dominated sequences, although the spectral resolution is
affected by the well-known leakage and aliasing effects that are
unavoidable with Fourier transforms. With proper analysis of
the available information and constraints for the application
context, the periodogram and correlogram approaches to
spectrum estimation are generally acceptable, but not necessarily
optimal at least in terms of resolution.
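For comparison, a tapered periodogram sketch under the same illustrative assumptions (a Hann data taper is one common way of reducing the leakage mentioned above):

```python
import numpy as np

# Hypothetical data: the same kind of noisy sinusoidal sequence as above
rng = np.random.default_rng(1)
n = 256
x = np.cos(2 * np.pi * 0.2 * np.arange(n)) + 0.5 * rng.standard_normal(n)

taper = np.hanning(n)                    # data taper to reduce spectral leakage
xw = (x - x.mean()) * taper

# Periodogram estimate: squared magnitude of the Fourier transform,
# normalized by the energy of the taper
periodogram = np.abs(np.fft.rfft(xw)) ** 2 / np.sum(taper ** 2)
freqs = np.fft.rfftfreq(n)
print(freqs[np.argmax(periodogram)])     # dominant component near 0.2 cycles/sample
```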
With the parametric modeling approaches, the extension of the
autocovariance function is implied by the autoregressive,
moving-average, autoregressive-moving-average, or variations of
these models. Some constraints may also be required to ensure
that the extension of the autocovariance function is fully
compatible with the observations of the physical process. It is
important to note that the autoregressive modeling approach in