procedures for information processing purposes. Open
problems in geoinformation processing abound and further
research and development work is required for wide ranging
applications. Such experience with information processing
should be useful in contemplating the problems of knowledge
acquisition, representation and processing.
3. OVERVIEW OF INFORMATION THEORY
From a theoretical perspective, the origins of information theory
go back to the foundations of probability theory as dealing with
uncertain or incomplete information is at the very basis of
probabilistic considerations. Measuring or quantifying
information contents is fundamental in formulating optimal
solutions for estimation and inference problems. Depending
upon the specific requirements, some information measures and
related discrimination functions may be more appropriate than
others.
Information measures are often expressed in terms of
frequencies of occurrence of errors or grey levels as these
provide a general approach to information contents without
necessarily requiring any interpretation or evaluation of the implications.
Various information measures have been suggested and used in
different application contexts. For digital image processing and
related applications, the Shannon-Wiener entropy H[p] in terms
of discrete frequencies or probabilities p = [p_1, p_2, ..., p_n] is
perhaps the best known and most appropriate for the intended
applications. Explicitly, the Shannon-Wiener entropy H[p] is
defined by
H[p] = H[p_1, p_2, \ldots, p_n] = -\sum_{k=1}^{n} p_k \log p_k
and the corresponding relative entropy, in the case of a general
background or reference probability distribution q = [q_1, q_2, ..., q_n], is

H[p|q] = H[p_1, p_2, \ldots, p_n; q_1, q_2, \ldots, q_n] = -\sum_{k=1}^{n} p_k \log (p_k / q_k)
where the summation signs are replaced by integral signs in
applications with continuous probabilities. The logarithms used
in these definitions are assumed to have the appropriate base
(usually 2) or else a multiplicative constant should be included.
When the background or reference probability distribution is
uniform, then the relative entropy reduces to the absolute
entropy.
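As a minimal sketch of how these measures might be computed for grey-level data (the array names and the 8-bit image are illustrative assumptions; only numpy is used), the normalized grey-level frequencies are treated as the probabilities in the above definitions:

```python
import numpy as np

def shannon_entropy(p, base=2.0):
    """Shannon-Wiener entropy H[p] of a discrete distribution (bits by default)."""
    p = p[p > 0]                               # zero-probability terms contribute nothing
    return -np.sum(p * np.log(p)) / np.log(base)

def relative_entropy(p, q, base=2.0):
    """Relative entropy H[p|q] = -sum p_k log(p_k / q_k), with the sign convention used above."""
    mask = p > 0
    return -np.sum(p[mask] * np.log(p[mask] / q[mask])) / np.log(base)

# Hypothetical 8-bit image: normalized grey-level frequencies play the role of p
image = np.random.default_rng(0).integers(0, 256, size=(128, 128))
hist = np.bincount(image.ravel(), minlength=256).astype(float)
p = hist / hist.sum()
q = np.full(256, 1.0 / 256)                    # uniform background distribution

print(shannon_entropy(p))                      # close to 8 bits for a nearly uniform histogram
print(relative_entropy(p, q))                  # zero when p is exactly uniform (p_k = q_k)
```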
For practical applications, information measures need to be
coordinate system independent and least sensitive to additive
noise in the data. The Shannon-Wiener relative entropy has
been shown to satisfy these conditions in practice [Blais and
Boulianne, 1988]. Furthermore, the relative entropy measure is
known to be unaffected by any orthogonal transformation (e.g.,
a rotation) of digital image data where the normalized grey level
frequencies are interpreted as probability distribution
frequencies [Andrews, 1970]. The latter is especially important
in the context of digital image processing using Fourier and
other orthogonal transforms which preserve the energy
associated with the grey levels.
For a continuous random variable with a Gaussian probability
distribution, the Shannon-Wiener entropy is proportional to the
logarithm of the variance in one dimension, and to the logarithm of
the determinant of the covariance matrix in higher dimensions [e.g., Blais, 1991a].
This is not a surprising result as a Gaussian probability
distribution is fully specified by its first two moments and hence
the Shannon-Wiener entropy can be expected to be expressible
in terms of the second moment. Obviously, the situation is
different with other probability distribution functions which can
only be specified fully by their higher statistical moments.
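For reference, the corresponding closed-form expressions (with natural logarithms; other bases introduce only a constant factor) are

H = \tfrac{1}{2} \log\!\left(2 \pi e \, \sigma^{2}\right) \quad \text{(one dimension)},
\qquad
H = \tfrac{1}{2} \log\!\left((2 \pi e)^{n} \det \Sigma\right) \quad \text{(n dimensions)},

where \sigma^{2} is the variance and \Sigma is the covariance matrix of the Gaussian distribution.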
It is important to realize that no interpretation or semantics is
included in the preceding definitions and discussions.
Mathematically, the analysis of a probability distribution does
not require any interpretation of the inferences as these can be
very different in different application contexts. On the other
hand, the appropriateness and implications of using one
information measure in a specific context may very well include
semantics and valuations for reasoning-like processing as in
expert systems.
The preceding concepts from information theory are very useful
in estimation and inverse problems where the available
observational and other information is often incomplete for the
desired solution. Considering the available information for
maximum exploitation without making any unnecessary
assumptions about what is not known is precisely the maximum
information or maximum entropy approach. Explicitly, the
maximum entropy principle states:
When making inferences based on incomplete information, the
estimates should be based on probability distributions
corresponding to the maximum entropy permitted by the
available information.
This principle was proposed independently by Kullback and
Leibler [1951], Jaynes [1957] and Ingarden [1963]. It has been
justified in terms of combinatorial arguments, axiomatic
inference, objectivity, consistency and reliability of the
estimation process [Jaynes, 1982 and 1983].
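In its usual constrained form (a conventional formulation, not quoted from the references above), the principle amounts to maximizing the entropy subject to normalization and to whatever moment constraints the available information supplies:

\max_{p} \; H[p] = -\sum_{k=1}^{n} p_k \log p_k
\quad \text{subject to} \quad
\sum_{k=1}^{n} p_k = 1,
\qquad
\sum_{k=1}^{n} p_k \, f_j(x_k) = F_j, \quad j = 1, \ldots, m,

and the maximizing distribution has the exponential form p_k \propto \exp\!\big(-\sum_{j} \lambda_j f_j(x_k)\big), with the Lagrange multipliers \lambda_j fixed by the constraints.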
Applications of this maximum information principle are wide
ranging in physical science and engineering. Some applications
in model identification, digital image processing and spatial
information systems are discussed in Blais [1991a and b]. The
following discussions will concentrate on applications in
spectrum estimation, adaptive filter design and inverse problems
to illustrate the applicability of information theory and the
principle of maximum entropy.
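Before turning to those applications, a minimal numerical sketch of the principle itself may be helpful; the dice-style data, variable names and the use of scipy's SLSQP solver are illustrative assumptions rather than material from the text:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical example: maximum-entropy probabilities on a small discrete
# support when the only available information is a prescribed mean value.
x = np.arange(1, 7)                      # support (e.g., faces of a die)
target_mean = 4.5                        # the available (incomplete) information

def neg_entropy(p):
    p = np.clip(p, 1e-12, None)          # guard against log(0)
    return np.sum(p * np.log(p))         # minimizing -H[p] maximizes the entropy

constraints = (
    {"type": "eq", "fun": lambda p: np.sum(p) - 1.0},             # normalization
    {"type": "eq", "fun": lambda p: np.dot(p, x) - target_mean},  # mean constraint
)
p0 = np.full(len(x), 1.0 / len(x))       # start from the uniform distribution
result = minimize(neg_entropy, p0, method="SLSQP",
                  bounds=[(0.0, 1.0)] * len(x), constraints=constraints)
print(result.x)                          # close to the exponential form p_k prop. to exp(lambda * x_k)
```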
4. APPLICATIONS IN SPECTRUM ESTIMATION
Estimates of power spectral density functions are required for
numerous applications in digital signal and image processing.
Filter design often relies on the spectral analysis
of data sequences and arrays. The estimation of the spectrum of
one-dimensional data sequences is relatively straightforward
and the analysis of the estimates does not usually present any
problems. The situation is however quite different in two and
higher dimensions, where difficulties with the factorization and
positive definiteness of autocovariance functions can have
serious implications.
Given a sample autocovariance sequence of finite length, the
spectrum estimation problem involves the extension of this
sequence for the Fourier transformation to estimate the spectrum
of the process. Well known approaches to the spectrum
estimation problem include the periodogram and correlogram
methods, the parametric modeling techniques of autoregressive
and moving average formulations, and the maximum entropy
approach which is based on information theory.
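As a minimal sketch of the correlogram route (the data sequence, lag choices and variable names are illustrative assumptions; only numpy is used), the sample autocovariance sequence is tapered with a lag window and Fourier transformed:

```python
import numpy as np

# Hypothetical data: a sinusoid at 0.2 cycles/sample in additive noise
rng = np.random.default_rng(0)
x = np.cos(2 * np.pi * 0.2 * np.arange(256)) + 0.5 * rng.standard_normal(256)
x = x - x.mean()

max_lag = 64
# Biased sample autocovariance for lags 0 .. max_lag
acov = np.array([np.dot(x[:len(x) - k], x[k:]) / len(x) for k in range(max_lag + 1)])

# Symmetric extension to lags -max_lag .. max_lag and a Bartlett lag window
lags = np.concatenate([acov[::-1], acov[1:]])
window = np.bartlett(len(lags))

# Correlogram estimate: Fourier transform of the windowed autocovariance sequence
spectrum = np.abs(np.fft.rfft(lags * window))
freqs = np.fft.rfftfreq(len(lags))
print(freqs[np.argmax(spectrum)])        # spectral peak near 0.2 cycles/sample
```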
When using Fourier based methods, the extension of the
autocovariance function is implied by the periodicity of the
Fourier transform. This situation is usually quite appropriate in
noise-dominated sequences, although the spectral resolution is
affected by the well-known leakage and aliasing effects that are
unavoidable with Fourier transforms. With proper analysis of
the available information and constraints for the application
context, the periodogram and correlogram approaches to
spectrum estimation are generally acceptable, but not necessarily
optimal at least in terms of resolution.
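For comparison, a tapered periodogram sketch under the same illustrative assumptions (a Hann data taper is one common way of reducing the leakage mentioned above):

```python
import numpy as np

# Hypothetical data: the same kind of noisy sinusoidal sequence as above
rng = np.random.default_rng(1)
n = 256
x = np.cos(2 * np.pi * 0.2 * np.arange(n)) + 0.5 * rng.standard_normal(n)

taper = np.hanning(n)                    # data taper to reduce spectral leakage
xw = (x - x.mean()) * taper

# Periodogram estimate: squared magnitude of the Fourier transform,
# normalized by the energy of the taper
periodogram = np.abs(np.fft.rfft(xw)) ** 2 / np.sum(taper ** 2)
freqs = np.fft.rfftfreq(n)
print(freqs[np.argmax(periodogram)])     # dominant component near 0.2 cycles/sample
```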
With the parametric modeling approaches, the extension of the
autocovariance function is implied by the autoregressive,
moving-average, autoregressive-moving-average, or variations of
these models. Some constraints may also be required to ensure
that the extension of the autocovariance function is fully
compatible with the observations of the physical process. It is
important to note that the autoregressive modeling approach in