gorithm [Hart, 1968]. As adaptive learning algorithms we have chosen DSM [Geva & Sitte, 1991] (Decision Surface Mapping) and LVQ-1 [Kohonen, 1990] (Learning Vector Quantization, version 1). The values of the parameters involved in LVQ-1 learning have been estimated by using two algorithms proposed by the authors [Cortijo & Pérez de la Blanca, 1996b].
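As an illustration, the standard LVQ-1 update rule [Kohonen, 1990] can be sketched as follows; the linearly decaying learning-rate schedule and the function names are illustrative assumptions, not the parameter estimates of [Cortijo & Pérez de la Blanca, 1996b]:

    import numpy as np

    def lvq1(codebook, codebook_labels, samples, sample_labels,
             alpha0=0.03, epochs=5):
        # LVQ-1 [Kohonen, 1990]: move the winning codebook vector toward
        # a sample of the same class, away from it otherwise.
        m = codebook.copy()
        n_steps = epochs * len(samples)
        t = 0
        for _ in range(epochs):
            for x, y in zip(samples, sample_labels):
                alpha = alpha0 * (1.0 - t / n_steps)  # decaying rate (assumed schedule)
                c = np.argmin(np.linalg.norm(m - x, axis=1))  # nearest prototype
                sign = 1.0 if codebook_labels[c] == y else -1.0
                m[c] += sign * alpha * (x - m[c])
                t += 1
        return m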
Now we can apply the 1-NN classifier using the reference set learned by these algorithms. We will denote by 1-NN(T_M) the 1-NN classifier that uses T_M, that is, the multiedited training set, as reference set. Following this notation, if T_MC is the multiedited-condensed training set, then 1-NN(T_MC) is the 1-NN classifier that uses T_MC as reference set. DSM learning requires the training set to be previously edited [Geva & Sitte, 1991]; we have used T_M as the initial set for DSM learning. Now, if T_DSM is the reference set after DSM learning, 1-NN(T_DSM) is the 1-NN classifier that uses T_DSM as reference set. Finally, if T_LVQ-1 is the reference set after LVQ-1 learning, 1-NN(T_LVQ-1) is the 1-NN classifier that uses T_LVQ-1 as reference set. More details about these algorithms can be found in [Cortijo & Pérez de la Blanca, 1996a].
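For concreteness, a minimal sketch of the 1-NN rule parameterized by its reference set; any of the sets above (T_M, T_MC, T_DSM, T_LVQ-1) can be plugged in. The function and variable names here are our own:

    import numpy as np

    def nn1_classify(ref_X, ref_y, queries):
        # 1-NN rule: each query vector receives the label of its nearest
        # reference vector (Euclidean distance in spectral space).
        queries = np.atleast_2d(queries)
        d = np.linalg.norm(ref_X[None, :, :] - queries[:, None, :], axis=2)
        return ref_y[np.argmin(d, axis=1)]

    # e.g. labels = nn1_classify(T_mc_X, T_mc_y, pixels)   # 1-NN(T_MC)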
2.2 Contextual Classifiers
The contextual classifiers we have tested are based on the assumption of a Markov random field modeling the prior distribution of the labels in the image. Stochastic models, and random fields (RF) in particular, accurately represent a priori information on the map. This information can be used in such a way that Bayes decision theory can be applied. A random field is a joint probability distribution imposed on a set of M random variables L = {L_1, ..., L_M} representing objects of interest, which imposes statistical dependence in a spatially meaningful way. In contextual classification each L_i ∈ Ω. The spatial dependence can be specified by a global model such as the Gibbs random field (GRF). A GRF describes the global properties of an image in terms of the joint distribution of labels for all pixels [Dubes & Jain, 1989]. A Markov random field (MRF) is defined in terms of local properties: a neighborhood system in which the spatial dependence is relevant must be fixed. Two neighborhood systems are mainly used: the first-order neighborhood, which includes the four nearest spatial neighbors, and the second-order neighborhood, which includes the eight nearest spatial neighbors.
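For concreteness, the GRF and its equivalent local (Markov) characterization can be written as follows; the Potts pair potential shown is a common textbook choice, not necessarily the prior adopted in this work:

    P(L = l) = \frac{1}{Z} \exp\{-U(l)\}, \qquad U(l) = \sum_{c \in \mathcal{C}} V_c(l)

    P(L_i = l_i \mid L_j = l_j,\; j \neq i) = P(L_i = l_i \mid l_{\partial i})

    V_{\{i,j\}}(l) = -\beta\, \delta(l_i, l_j) \quad \text{(Potts potential on neighboring pairs)}

where C is the set of cliques of the neighborhood system and ∂i denotes the neighbors of pixel i.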
Given a set of observations, X = x, and the contextual information modeled as a MRF, P(L = l), in a Bayesian context the objective is to find the estimator \hat{l} which maximizes equation 1, that is, the a posteriori probability of L = l given X = x:

    P(L = l \mid X = x) = \frac{P(X = x \mid L = l)\, P(L = l)}{P(X = x)}    (1)
This is known as the MAP (maximum a posteriori) method. The model relating the observation x to the labeling l is chosen to ensure that the posterior distribution of L, given X = x, is also a MRF. If we require conditional independence of the observed random variables given the true labels, the posterior distribution is guaranteed to be a MRF. Thus we assume that

    P(X = x \mid L = l) = \prod_{i=1}^{M} P(X_i = x_i \mid L_i = l_i)    (2)
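Combining equations 1 and 2 (the denominator of equation 1 does not depend on l), the posterior to be maximized factorizes as

    P(L = l \mid X = x) \;\propto\; \Bigl[\prod_{i=1}^{M} P(X_i = x_i \mid L_i = l_i)\Bigr]\, P(L = l),

a pixelwise likelihood term times the MRF prior.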
If both P(X = x | L = l) and P(L = l) are known, we can compute the labeling \hat{l} which maximizes equation 1. In practice, even if the number of pixels M and the number of classes are small, it is not possible to calculate the MAP directly as given in equation 1. To circumvent this problem, some alternatives are available to estimate the MAP [Dubes & Jain, 1989]. The first approximation is the simulated annealing algorithm [Geman & Geman, 1984], which finds MAP estimates for all pixels simultaneously. As the computational demands of this algorithm are considerable, there are two computationally feasible approximations to the MAP estimate: a) the ICM algorithm (iterated conditional modes) and b) the MPM algorithm (maximizer of posterior marginals). A detailed discussion of these methods can be found in [Dubes & Jain, 1989] and references therein. We will center our interest on the ICM algorithm [Besag, 1986], which has been demonstrated to offer an excellent trade-off between the accuracy of the contextual correction and the required computational effort [Cortijo, 1995].
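A minimal sketch of one ICM sweep, assuming a Potts prior on the first-order neighborhood and precomputed class-conditional log-likelihoods; the value of β, the raster scan order and the data structures are our assumptions, not the settings of [Cortijo, 1995]:

    import numpy as np

    def icm_sweep(labels, loglik, beta=1.5):
        # One ICM sweep [Besag, 1986]: at each pixel choose the label
        # maximizing  log P(x_i | l_i) + beta * (# agreeing 4-neighbors).
        # labels : (H, W) int array, current labeling
        # loglik : (H, W, K) array, log P(X_i = x_i | L_i = k)
        H, W, K = loglik.shape
        new = labels.copy()
        for r in range(H):
            for c in range(W):
                votes = np.zeros(K)
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < H and 0 <= cc < W:
                        votes[new[rr, cc]] += 1.0
                new[r, c] = np.argmax(loglik[r, c] + beta * votes)
        return new

Starting from the spectral classification, a few sweeps are typically enough for ICM to converge.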
Another approach to contextual correction using a MRF consists of point-to-point contextual correction methods. They are based on complex conditional-probability models which extend the MAP expression given in equation 1 by adding an additional term, the contextual correction factor, into the denominator of the MAP expression [Sæbø et al., 1985]. Assuming conditional independence of the feature vectors (observations) in a spatial neighborhood, two models can be adopted [Sæbø et al., 1985]: a) the Welch and Salter / Haslett model and b) the Owen and Switzer model. We have tested both models in this work.
Contextual classifiers accept as input the classifications obtained by the 8 spectral classifiers described in section 2.1, so we have performed 24 additional classifications for each problem.
3 DATA
The data used to test the performance of the classifiers are two LANDSAT images showing landscapes from Greenland, Denmark¹. The first image is a LANDSAT-2 MSS image of the Igaliko region. The second is a LANDSAT-5 TM image of the Ymer Ø region. Both images are 512 x 512 pixels in size. The training sets have been selected by expert geologists [Conradsen et al., 1987] and their spectral distributions pose problems of differing difficulty.
In Igaliko we have five classes to discriminate, the training set size is 42796 samples and there is a slight overlap in the spectral distribution of the training samples. In Ymer Ø we have twenty classes to discriminate, the training set size is 12574 samples and there is a high overlap in the spectral distribution of the training samples. See [Conradsen et al., 1987] for more details.
In this work we have adopted test sample estimation to measure the accuracy of the classifications. The training set T is split into two disjoint sets: T^l (learning set) and T^t (test set). T^l has been built by randomly selecting 2/3 of the available training samples; the remainder are placed into T^t. We use the learning set to construct the classifier and the test set for testing. In tables 1 and 2 we show the learning and test set sizes for each dataset.
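The 2/3 - 1/3 split described above can be reproduced with a sketch like the following; the random seed and the function name are our own:

    import numpy as np

    def split_training_set(X, y, frac=2/3, seed=0):
        # Test sample estimation: place `frac` of the samples in the
        # learning set T^l, the remainder in the test set T^t.
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(X))
        n_learn = int(frac * len(X))
        return (X[idx[:n_learn]], y[idx[:n_learn]]), \
               (X[idx[n_learn:]], y[idx[n_learn:]])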
¹We must thank the IMM (Technical University of Denmark, Lyngby, Denmark) for providing the LANDSAT images used in this work.
[Tables 1 and 2: learning and test set sizes for each dataset.]
4 RESULTS
In table 3 we show the classifications performed on each dataset and the accuracy of the classifications. We show the spectral classifier used to generate each initial classification, the contextual correction applied and the accuracies obtained over the whole map by using the ground truth.
5 DISCUSSION
From table 3 we can see that the contextual correction of the spectral classifications improves the accuracy drastically, independently of the spectral classifier used; this is also true for the two datasets. We can conclude that the ICM algorithm gives the best trade-off between the accuracy of the contextual correction and the computational effort: its computational demands are moderate while it improves the global accuracy of the classifications. We must also note that the combination ...