(a) Initialize the BP net. The weight matrices ($W_{ij}$, $W_{jk}$) and the threshold values ($\theta_j$, $\theta_k$) are initialized as random real numbers within the range $-1.0$ to $+1.0$, where $i$ denotes the $i$-th node of the input layer, $j$ the $j$-th node of the hidden layer, and $k$ the $k$-th node of the output layer. $W_{ij}$ is the weight matrix between the input layer and the hidden layer, and $W_{jk}$ is the weight matrix between the hidden layer and the output layer.
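As a concrete illustration of step (a), the following sketch (not the authors' code; the layer sizes and random seed are assumed example values) initializes the weight matrices and thresholds in NumPy:

```python
# Sketch of step (a): random initialization in [-1.0, +1.0].
# n_input, n_hidden, n_output are assumed example dimensions.
import numpy as np

rng = np.random.default_rng(seed=0)
n_input, n_hidden, n_output = 4, 8, 3

W_ij = rng.uniform(-1.0, 1.0, size=(n_input, n_hidden))   # input -> hidden
W_jk = rng.uniform(-1.0, 1.0, size=(n_hidden, n_output))  # hidden -> output
theta_j = rng.uniform(-1.0, 1.0, size=n_hidden)           # hidden thresholds
theta_k = rng.uniform(-1.0, 1.0, size=n_output)           # output thresholds
```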
(b) Input the values of the training pixels (samples) and the target values for the correct output.
(c) Calculate the value at each hidden-layer neuron using equation 9:

$net_j = \sum_i W_{ij} O_i$  (9)

where $net_j$ is the input value of the $j$-th hidden neuron and $O_i$ is the value of the $i$-th input neuron. $W_{ij}$ denotes the weight between the $i$-th neuron of the input layer and the $j$-th neuron of the hidden layer. The output value of the hidden neuron is evaluated as:

$O_j = f(net_j)$  (10)
where $O_j$ is the output value of the $j$-th hidden neuron and $f$ is the sigmoid activation function, specified as:

$f(x) = \frac{1}{1 + \exp\left(-\frac{x + \theta}{\theta_0}\right)}$  (11)

where $\theta$ is a threshold vector and $\theta_0$ is used to adjust the shape of the sigmoid activation function.
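A minimal sketch of the activation in equation (11); the default $\theta_0 = 1.0$ is an assumption, since the paper does not state a value:

```python
import numpy as np

def sigmoid(x, theta, theta0=1.0):
    """Equation (11): 1 / (1 + exp(-(x + theta) / theta0)).

    theta shifts the curve (the threshold); theta0 adjusts its steepness.
    theta0 = 1.0 is an assumed default, not a value from the paper.
    """
    return 1.0 / (1.0 + np.exp(-(x + theta) / theta0))
```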
(d) Calculate the value of each output neuron:

$net_k = \sum_j W_{jk} O_j$  (12)

where $net_k$ is the input value of the $k$-th output neuron and $O_j$ is the output value of the $j$-th hidden neuron. $W_{jk}$ denotes the weight between the $j$-th neuron of the hidden layer and the $k$-th neuron of the output layer. The corresponding output value is:

$O_k = f(net_k)$  (13)

where $f$ is the sigmoid activation function as specified earlier.
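The forward pass of equations (9)-(13) can then be sketched as follows, reusing the initialization and sigmoid above; the input vector is an assumed example pixel:

```python
# Sketch of steps (c)-(d): forward pass through hidden and output layers.
O_i = np.array([0.2, 0.7, 0.1, 0.9])   # assumed input pixel values

net_j = O_i @ W_ij                      # equation (9)
O_j = sigmoid(net_j, theta_j)           # equations (10)-(11)

net_k = O_j @ W_jk                      # equation (12)
O_k = sigmoid(net_k, theta_k)           # equation (13)
```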
(e) Calculate the output-layer error and the hidden-layer error:

$d_k = O_k (1 - O_k)(t_k - O_k)$  (14)

$e_j = O_j (1 - O_j) \sum_k W_{jk} d_k$  (15)

In equation 14, $d_k$ is the reference error of the $k$-th neuron in the output layer and $t_k$ is the target output; in equation 15, $e_j$ is the reference error of the $j$-th neuron in the hidden layer.
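Equations (14) and (15) translate directly to code; the one-hot target vector is an assumed example:

```python
# Sketch of step (e): error terms for the output and hidden layers.
t_k = np.array([0.0, 1.0, 0.0])          # assumed one-hot target

d_k = O_k * (1.0 - O_k) * (t_k - O_k)    # equation (14), output-layer error
e_j = O_j * (1.0 - O_j) * (W_jk @ d_k)   # equation (15), hidden-layer error
```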
(f) Calculate the total output error using the error function $E$:

$E = \frac{1}{2N} \sum_{n=1}^{N} \sum_{k=1}^{P} (O_{nk} - t_{nk})^2$  (16)

where $O_{nk}$ is the observed output value and $t_{nk}$ is the target value, $P$ is the number of output neurons, and $N$ is the number of samples.
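A direct transcription of equation (16); the (N, P) arrays of observed outputs and targets are assumptions:

```python
# Sketch of step (f): mean squared error over all samples and outputs.
def error_E(O_all, t_all):
    """Equation (16): E = (1 / 2N) * sum over samples and output neurons."""
    N = O_all.shape[0]
    return np.sum((O_all - t_all) ** 2) / (2.0 * N)
```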
(g) Adjust the connection weight matrices and thresholds.
The adjustment of the weight matrix $W_{jk}$ and the threshold $\theta_k$ between the output layer and the hidden layer follows:

$W_{jk}(m+1) = W_{jk}(m) + \alpha O_j d_k + \eta \Delta W_{jk}(m)$  (17)

$\theta_k(m+1) = \theta_k(m) + \alpha d_k$  (18)

where $m$ is the iteration number, $\alpha$ is the learning rate, $\Delta W_{jk}$ is a matrix representing the change in $W_{jk}$, and $\eta$ is a momentum factor used to allow the previous weight change to influence the weight change in the present iteration.
The adjustment of the weight matrix $W_{ij}$ and the threshold $\theta_j$ between the hidden layer and the input layer is:

$W_{ij}(m+1) = W_{ij}(m) + \beta O_i e_j + \eta \Delta W_{ij}(m)$  (19)

$\theta_j(m+1) = \theta_j(m) + \beta e_j$  (20)

where $\beta$ is the learning rate and $\Delta W_{ij}$ is a matrix representing the change in $W_{ij}$.
The learning process is implemented by changing the connection weight matrices and thresholds: the error function $E$ defines the direction of steepest gradient descent for the connection weights, and the weights attached to the different neurons are updated by equations (17) and (19). This yields the best set of weight coefficients.
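One iteration of the updates (17)-(20) might look as follows; the learning rates and momentum factor are assumed values, and the $\Delta W$ matrices hold the previous iteration's changes (zero before the first iteration):

```python
# Sketch of step (g): momentum-based updates of weights and thresholds.
alpha, beta, eta = 0.1, 0.1, 0.9        # assumed rates and momentum factor
dW_jk = np.zeros_like(W_jk)             # previous changes, zero at start
dW_ij = np.zeros_like(W_ij)

dW_jk = alpha * np.outer(O_j, d_k) + eta * dW_jk   # equation (17)
W_jk += dW_jk
theta_k += alpha * d_k                             # equation (18)

dW_ij = beta * np.outer(O_i, e_j) + eta * dW_ij    # equation (19)
W_ij += dW_ij
theta_j += beta * e_j                              # equation (20)
```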
(h) Steps (b) through (g) are repeated until the error falls below a desired threshold or the number of iterations exceeds a specified limit. The training process is then complete: by learning under the least-error rule, the BP net completes its non-linear mapping from input to output.
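Putting the pieces above together, a sketch of the full training loop of step (h); the training data, stopping threshold, and iteration limit are all assumed:

```python
# Sketch of step (h): repeat steps (b)-(g) until E < tol or max_iter reached.
samples = rng.uniform(0.0, 1.0, size=(20, n_input))        # assumed pixels
targets = np.eye(n_output)[rng.integers(0, n_output, 20)]  # assumed labels
tol, max_iter = 1e-3, 10000

for m in range(max_iter):
    for O_i, t_k in zip(samples, targets):
        O_j = sigmoid(O_i @ W_ij, theta_j)               # equations (9)-(11)
        O_k = sigmoid(O_j @ W_jk, theta_k)               # equations (12)-(13)
        d_k = O_k * (1.0 - O_k) * (t_k - O_k)            # equation (14)
        e_j = O_j * (1.0 - O_j) * (W_jk @ d_k)           # equation (15)
        dW_jk = alpha * np.outer(O_j, d_k) + eta * dW_jk # equation (17)
        W_jk += dW_jk
        theta_k += alpha * d_k                           # equation (18)
        dW_ij = beta * np.outer(O_i, e_j) + eta * dW_ij  # equation (19)
        W_ij += dW_ij
        theta_j += beta * e_j                            # equation (20)
    O_all = sigmoid(sigmoid(samples @ W_ij, theta_j) @ W_jk, theta_k)
    if error_E(O_all, targets) < tol:                    # equation (16)
        break
```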
(i) Input the digital image to be classified into the BP network that has completed the learning process; a complete image classification can then be generated.
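A sketch of step (i) under the same assumptions: each pixel's band values are fed through the trained net, and the most active output neuron gives the class label:

```python
# Sketch of step (i): classify every pixel of an assumed multi-band image.
image = rng.uniform(0.0, 1.0, size=(64, 64, n_input))  # assumed band values
pixels = image.reshape(-1, n_input)

O_hidden = sigmoid(pixels @ W_ij, theta_j)
O_out = sigmoid(O_hidden @ W_jk, theta_k)
classified = O_out.argmax(axis=1).reshape(image.shape[:2])  # class per pixel
```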
In practice, because the BP algorithm adopts simple gradient descent, its rate of convergence is very slow and it often falls into local minima. The iterative process therefore may fail to converge to the globally optimal solution, and large training sets with many input parameters remarkably hamper the learning effect. The Simulated Annealing algorithm is therefore introduced to globally optimize the network learning.
The Simulated Annealing (SA) algorithm was put forward by S. Kirkpatrick (1983). Its premise is an analogy between the solution of an optimization problem and the thermal equilibrium of statistical mechanics, by which the global optimum can be found.
The SA procedure starts from an initial value and repeatedly generates the next solution from the current solution by a random disturbance, accepting it according to an acceptance criterion Accept(·) and gradually lowering the temperature until a stopping criterion of the temperature is satisfied. The aim of the random disturbance is to let the search escape the restriction of a local solution and move away toward the global optimum; in the network learning, the disturbance is applied to the weights $W_{ij}$.
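Since the procedure itself is only outlined above, the following is a minimal sketch of the simulated-annealing idea as commonly formulated (Metropolis acceptance with a geometric cooling schedule), not the authors' exact procedure; every name and constant here is an assumption:

```python
# Minimal SA sketch: disturb the weights at random, always accept moves
# that lower the error, and accept uphill moves with probability
# exp(-dE / T) so the search can escape local minima.
import numpy as np

def sa_accept(dE, T, rng):
    """Metropolis criterion: accept downhill moves always, uphill moves
    with probability exp(-dE / T)."""
    return dE <= 0 or rng.random() < np.exp(-dE / T)

def sa_disturb_weights(W, error_fn, T0=1.0, cooling=0.95, steps=200, seed=0):
    """Globally optimize a weight matrix W under an assumed error_fn.
    T0, cooling, and steps are assumed schedule parameters."""
    rng = np.random.default_rng(seed)
    W, T, E = W.copy(), T0, error_fn(W)
    for _ in range(steps):
        W_new = W + rng.normal(0.0, 0.1, size=W.shape)  # random disturbance
        E_new = error_fn(W_new)
        if sa_accept(E_new - E, T, rng):
            W, E = W_new, E_new
        T *= cooling                                    # lower the temperature
    return W
```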