of label. Let $P_X(x_s)$ be the prior probability of label $x_s$, $P_{F|X}(f \mid x_s)$ be the probability function which tells us how well any occurrence of $x_s$ fits $f$, and $P_F(f)$ be the probability of the original image. Bayes' theorem gives us the posterior distribution:

$$P_{X|F}(x_s \mid f) = \frac{P_{F|X}(f \mid x_s)\, P_X(x_s)}{P_F(f)} \qquad (2)$$
Since $P_F(f)$ is a constant, we have:

$$P_{X|F}(x_s \mid f) \propto P_{F|X}(f \mid x_s)\, P_X(x_s) \qquad (3)$$
Our goal is to find an optimal label which maximizes the posterior probability, that is, the maximum a posteriori (MAP) estimate:

$$\hat{x} = \arg\max_{x_s} P_{X|F}(x_s \mid f) \qquad (4)$$
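As a minimal numerical illustration of Eqs. (2)-(4) (the two labels, priors, means, and standard deviations below are hypothetical, not taken from this work), the posterior for a single pixel can be computed directly and the MAP label chosen as the arg max:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical two-label example for a single pixel with observed value f = 0.40.
f = 0.40
priors = np.array([0.6, 0.4])                 # P_X(x_s) for labels 0 and 1
means = np.array([0.3, 0.7])                  # class-conditional means
stds = np.array([0.1, 0.1])                   # class-conditional standard deviations

likelihood = norm.pdf(f, loc=means, scale=stds)   # P_{F|X}(f | x_s)
posterior = likelihood * priors                   # proportional form, Eq. (3)
posterior /= posterior.sum()                      # divide by P_F(f), Eq. (2)

map_label = int(np.argmax(posterior))             # MAP estimate, Eq. (4)
print(posterior, map_label)
```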
The key step is to define $P_X(x)$ and $P_{F|X}(f \mid x)$.
Given that $P(f \mid x_s)$ follows a Gaussian distribution, the distribution is represented by its mean $\mu_{x_s}$ and variance $\sigma_{x_s}^2$. Thus the energy function is as follows:

$$U(X, F) = \sum_{s \in S} \left[ \ln\left(\sqrt{2\pi}\,\sigma_{x_s}\right) + \frac{(f_s - \mu_{x_s})^2}{2\sigma_{x_s}^2} \right] \qquad (5)$$
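The singleton (data) term of Eq. (5) is straightforward to compute once per-class means and standard deviations are available; the sketch below assumes they have been estimated from training samples (array names are illustrative only):

```python
import numpy as np

def data_energy(f, labels, mu, sigma):
    """Singleton (Gaussian) energy of Eq. (5), summed over all pixels s in S.

    f      : 2-D array of observed pixel values
    labels : 2-D array of class indices x_s, same shape as f
    mu, sigma : 1-D arrays holding the per-class mean and standard deviation
    """
    m, s = mu[labels], sigma[labels]
    return np.sum(np.log(np.sqrt(2.0 * np.pi) * s) + (f - m) ** 2 / (2.0 * s ** 2))
```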
Based on the Hammersley-Clifford theorem, the joint distribution of these conditional prior probabilities modeled by MRFs is equivalent to a joint prior probability characterized by a Gibbs distribution (Li, 2001). This MRF-Gibbs equivalence allows us to model the complex global contextual relationships of an entire image using MRFs over local pixel neighborhoods, which makes the MRF computationally tractable and a very popular contextual model.
$P(X)$ follows a Gibbs distribution:

$$P(X) = \frac{1}{Z} \exp\big(-U(X)\big) \qquad (6)$$

where $Z = \sum_X \exp\big(-U(X)\big)$ is the normalizing constant, $U(X) = \sum_{c \in C} V_c(X)$ is the energy function, and $V_c(X)$ is the clique potential of clique $c \in C$.
In our case, $C$ is the set of spatial second-order cliques (doubletons). Each clique corresponds to a pair of neighboring pixels. $P(X)$ represents the simple fact that segmentations should be locally homogeneous; therefore, these potentials favor similar classes in neighboring pixels:
$$V_c(X) = V(x_s, x_r) = \begin{cases} +1 & \text{if } x_s \neq x_r \\ -1 & \text{otherwise} \end{cases} \qquad (7)$$
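A short sketch of the doubleton prior energy implied by Eq. (7), summing the ±1 potentials over horizontal and vertical neighbor pairs (a 4-neighborhood is assumed here):

```python
import numpy as np

def prior_energy(labels):
    """Sum of clique potentials V(x_s, x_r) over all doubletons of a
    4-neighborhood: +1 if the two labels differ, -1 otherwise (Eq. (7))."""
    horiz = np.where(labels[:, 1:] != labels[:, :-1], 1.0, -1.0)
    vert = np.where(labels[1:, :] != labels[:-1, :], 1.0, -1.0)
    return horiz.sum() + vert.sum()
```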
There are four different optimization algorithms for finding the global optimum: Simulated Annealing using Metropolis dynamics (Metropolis), the Gibbs sampler (Gibbs), Iterated Conditional Modes (ICM), and Modified Metropolis Dynamics (MMD).
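Of these, ICM is the simplest to sketch: each pixel is visited in turn and assigned the label that minimizes its local posterior energy, i.e. the Gaussian data term of Eq. (5) plus the doubleton potentials of Eq. (7), until no label changes. The weight `beta` balancing the two terms is an added assumption, and the `mu`/`sigma` arrays are as in the earlier sketch:

```python
import numpy as np

def icm(f, labels, mu, sigma, beta=1.0, max_iter=10):
    """Iterated Conditional Modes for the Gaussian-MRF model (greedy, local updates)."""
    labels = labels.copy()
    rows, cols = f.shape
    for _ in range(max_iter):
        changed = 0
        for r in range(rows):
            for c in range(cols):
                neigh = [labels[rr, cc]
                         for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                         if 0 <= rr < rows and 0 <= cc < cols]
                best, best_e = labels[r, c], np.inf
                for k in range(len(mu)):
                    data = (np.log(np.sqrt(2.0 * np.pi) * sigma[k])
                            + (f[r, c] - mu[k]) ** 2 / (2.0 * sigma[k] ** 2))
                    prior = beta * sum(1.0 if k != n else -1.0 for n in neigh)
                    if data + prior < best_e:
                        best, best_e = k, data + prior
                if best != labels[r, c]:
                    labels[r, c] = best
                    changed += 1
        if changed == 0:
            break
    return labels
```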
2.2 SVM Classification Method
SVM is a classification technique developed by Vapnik and his group at AT&T Bell Laboratories. It is based on Vapnik-Chervonenkis (VC) dimension theory and the Structural Risk Minimization (SRM) principle. SVM can cope with sparse samples, non-linearity, and high-dimensional data, and yields a globally optimal solution. The performance of SVM has been shown to be as good as, or significantly better than, that of competing methods in most cases (Burges, 1998; Zhang, 2000). The SVM method separates the classes with a hyperplane that maximizes the margin between them (see Figure 1), where $m$ is the distance between $H_1$ and $H_2$, and $H$ is the optimum separating hyperplane, which is defined as:
$$w \cdot x + b = 0 \qquad (8)$$
where $x$ is a point on the hyperplane, $w$ is an $n$-dimensional vector perpendicular to the hyperplane, and $|b| / \|w\|$ is the perpendicular distance from the hyperplane to the origin. The training samples satisfy:
$$w \cdot x_i + b \le -1, \quad \text{for } y_i = -1 \qquad (9)$$

$$w \cdot x_i + b \ge +1, \quad \text{for } y_i = +1 \qquad (10)$$

These two inequalities can be combined into:

$$y_i\big[(w \cdot x_i) + b\big] - 1 \ge 0 \quad \forall i \qquad (11)$$
The SVM attempts to find the hyperplane (8) with minimum $\|w\|^2$ subject to constraint (11).
Fig. 1. Optimum separation plane.
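As a small numerical check of constraint (11) and of the resulting margin (the hyperplane and sample points below are made up for illustration), recall that the distance between $H_1$ and $H_2$ is $m = 2 / \|w\|$:

```python
import numpy as np

# Hypothetical separating hyperplane w.x + b = 0 and linearly separable points.
w, b = np.array([1.0, 1.0]), -3.0
X = np.array([[1.0, 1.0], [0.5, 1.0], [3.0, 2.0], [2.5, 3.0]])
y = np.array([-1, -1, +1, +1])

satisfied = y * (X @ w + b) - 1 >= 0      # constraint (11) for every sample i
margin = 2.0 / np.linalg.norm(w)          # distance m between H1 and H2
print(satisfied.all(), margin)            # True, ~1.414
```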
The process of finding the optimum hyperplane is equivalent to solving the quadratic programming problem:

$$\min \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \xi_i$$
$$\text{s.t.} \quad y_i\big[w \cdot \phi(x_i) + b\big] \ge 1 - \xi_i, \qquad \xi_i \ge 0, \; i = 1, 2, \ldots, l \qquad (12)$$
where $C$ is the penalty parameter, which controls the trade-off between maximizing the margin and minimizing the training errors $\xi_i$. Again, using the technique of Lagrange multipliers, the optimization problem becomes:
$$\min_{\alpha} \; \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j K(x_i, x_j) - \sum_{i=1}^{l} \alpha_i$$
$$\text{s.t.} \quad \sum_{i=1}^{l} \alpha_i y_i = 0, \qquad 0 \le \alpha_i \le C, \; i = 1, 2, \ldots, l \qquad (13)$$
where $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$ is the kernel function. There are three major kernel functions: the Gaussian Radial Basis Function (RBF), the polynomial kernel, and the sigmoid kernel. The optimum classification plane is solved through the chunking, Osuna, or SMO algorithms, and then we only need to compute $K(x_i, x_j)$ in the training process. The final decision function is:
$$f(x) = \operatorname{sgn}\Big( \sum_{i \in SV} y_i \alpha_i K(x, x_i) + b \Big) \qquad (14)$$
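A sketch of evaluating the decision function (14) with an RBF kernel, given the support vectors, their coefficients $\alpha_i y_i$, and the bias $b$ (all of which would come from the trained model; the `gamma` parameter here is an assumed kernel width):

```python
import numpy as np

def rbf_kernel(x, xi, gamma=0.5):
    """Gaussian RBF kernel K(x, x_i) = exp(-gamma * ||x - x_i||^2)."""
    return np.exp(-gamma * np.sum((x - xi) ** 2))

def decision(x, support_vectors, alpha_y, b, gamma=0.5):
    """f(x) = sgn( sum_{i in SV} y_i alpha_i K(x, x_i) + b ), Eq. (14)."""
    s = sum(ay * rbf_kernel(x, sv, gamma)
            for sv, ay in zip(support_vectors, alpha_y))
    return np.sign(s + b)
```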
For multi-class SVM, three basic methods are available to solve the classification: One-Against-All (OAA), One-Against-One (OAO), and Directed Acyclic Graph (DAG).
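In practice these strategies are available off the shelf; for example, scikit-learn's SVC applies the OAO decomposition for multi-class problems, while OneVsRestClassifier implements OAA. The snippet below is a generic illustration with random placeholder data, not the experimental setup of this paper:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier

X = np.random.rand(60, 4)                # placeholder feature vectors
y = np.random.randint(0, 3, size=60)     # three hypothetical classes

oao = SVC(kernel='rbf', C=10.0, gamma=0.5).fit(X, y)             # one-against-one
oaa = OneVsRestClassifier(SVC(kernel='rbf', C=10.0)).fit(X, y)   # one-against-all
print(oao.predict(X[:5]), oaa.predict(X[:5]))
```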