a factor graph with three variables z1, 2,23 and two function
nodes fi and f» with factorization: g(z1, x2, x3) = fi(z1, x2) *
f2(x2, x3).
Figure 1: An illustrative example of simple factor graph with
three variables x1, x2, x3 and two function nodes fi (x1, x2) and
fa(x2, x3).
The task of classification consists of determining the probability
of a particular hypothesis given some observed evidence. This is
solved by calculation of marginal probability of a latent variable,
or by calculating of the maximum likelihood probability (max-
imum likelihood on the configured factor graph given the evi-
dence). The likelihood of the evidence (the features vector) can
be written as follows:
KÖN
p(æ,c|sæ) = | ] ] | P(æn|cr)Pler|s), (1)
k=1n=1
where x is the input evidence; s, is the class; cy is the features
mixture (a latent variable); K is the number of classes; NN is the
number of features in the input evidence vector. Here, the feature
is assumed to be a spectral band value.
The following factor graph structure can be defined for formula
(1):
Figure 2: Factor graph model (independent model; a mixture la-
tent variable is employed) for hyperspectral data classification
The factor graph is described as:
N N
g(11,13,... TN, Ck, 8.) — Il £n tn) Il fara ck) Q)
n=l
fs Ck, 8512: (31),
n=1
where x, is the n-th input feature; ci is the features mixture (a
latent variable) for the k-th class; s; is the k-th class number;
Z1,-- - , ZN, Zs are normalizing factors in the graph model [Frey
and Jojic, 2005] (define prior probabilities); f1,..., fw are the
factors mapping the input features to the feature mixture; fc—s iS
the factor bridging the latent variable to the semantic class sg.
The structure of a factor graph defines a dependency of class vari-
able node on input features. Use of training data allows to calcu-
late a configuration (parameter set 0(k)) for a factor graph for
each class k.
A configured factor graph with the configuration 0(k) is expected
to have a maximum likelihood probability (a low energy state) on
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B7, 2012
XXII ISPRS Congress, 25 August — 01 September 2012, Melbourne, Australia
138
the evidence which most likely (similar) to the employed training
data. Expectation maximization method with gradient ascent op-
timizer are employed for learning the graph configuration. Mean
field inference method [Frey and Jojic, 2005] is employed for the
inference. Comparison of the classes probability maps (maxi-
mum principle) allows to produce label map.
2.4 Employed data
Salinas hyperspectral data benchmark (AVIRIS sensor over Sali-
nas Valley, California; 3.7 m pixel size) was selected for classi-
fication accuracy evaluation. The data cube size is 512 lines by
217 samples, 224 bands. 19 water absorption bands were dis-
carded (bands [108-112] and [154-167]) This image is available
as at-sensor radiance data. Ground truth classes and the number
of samples are given in Table 1.
Minimum noise fraction (MNF) [Green et al., 1988, Boardman
and Kruse, 194] was employed to reduce the number of input
features, reduce computational time, and separate noise from the
data. The MNF consists of two Principal Components (PC) trans-
formations. The first PC transformation decorrelates and rescales
noise in the data, the second PC transformation performed on the
noise-cleared data.
Since factor graphs are discrete graphical models, the feature is to
be represented on a predefined finite domain (alphabet). The fi-
nite domain refers to the unique values (or a list of values) the fea-
ture can have. Here we use a finite domain consisting of natural
numbers. To represent an input feature on the finite domain, the
feature is proposed to be processed separately by an unsupervised
clustering. A cluster's number is assumed as the value from the
defined domain. k-means procedure is employed and the number
of clusters for feature representation on finite set was equal to 100
(used for representation of all features). In an additional experi-
ment, the features before representation are processed by median
filtering.
20 points were randomly selected for each class in order to con-
figure the factor graph. Expectation maximization with a gradient
ascent method were employed for the factor graph configuration.
Table 1: Salinas hyperspectral benchmark classes (available
ground truth samples)
| Number | Class
| Samples |
3 RESULTS AND DISCUSSION
Table 2 presents the overall accuracy and Kappa coefficient for
the classification. The additional experiment with feature median
filtering allowed to obtain a better classification accuracy results
(overall accuracy=85.3217 and Kappa=0.8358 versus 81.3692 and
0.7921, respectively; compare confusion Tables 3 and 4). Most
of the classes were labeled with an accuracy more than 90%,