Layered feed-forward neural networks have been broadly applied to prediction, classification, pattern recognition and other modeling problems. Hill et al. (1994) gave a comprehensive review of studies comparing layered neural networks (LNNs) with conventional statistical models.
Let $x = \{x_i\}$ $(i = 1, 2, \ldots, I)$ represent the feature vector which is to be classified, and let the possible classes be denoted by $\omega_j$ $(j = 1, 2, \ldots, J)$. Consider the discriminant functions $d_j(x)$; the decision rule is then
\[
x \in \omega_j, \quad \text{if } d_j(x) \ge d_{j^*}(x) \text{ for all } j^* \ne j.
\tag{1}
\]
An LNN is expected to be the input-output (I/O) system corresponding to the discriminant function.
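For instance, the decision rule (1) simply assigns a feature vector to the class whose discriminant value is largest. A minimal sketch in Python (the discriminant values below are hypothetical placeholders):

```python
import numpy as np

def classify(d_values):
    """Decision rule (1): choose the class with the largest discriminant value."""
    return int(np.argmax(d_values))

# Hypothetical discriminant values d_j(x) for J = 3 classes
print(classify(np.array([0.12, 0.85, 0.40])))  # -> 1 (zero-based index of the winning class)
```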
The multi-layered neural network applied to a variety of classification problems has an input layer, an output layer and several hidden layers. The neural network to be trained can be viewed as a parameterized mapping from a known input to an output which should be as close as possible to the training data. A feature vector is the input to the input layer; that is, the number of neurons in the input layer corresponds to the dimension of the feature vectors. The number of neurons in the hidden layer can be adjusted by the user. The output layer has the same number of neurons as there are classes.
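To make this correspondence concrete, the following sketch fixes the layer sizes, assuming $I$ input features, a user-chosen number $H$ of hidden neurons, and $J$ classes (the variable names and initialization are illustrative only):

```python
import numpy as np

I, H, J = 4, 8, 3   # feature dimension, hidden neurons (user-adjustable), number of classes
rng = np.random.default_rng(0)

w_ih = rng.normal(scale=0.1, size=(I, H))   # input-to-hidden synaptic weights
w_hj = rng.normal(scale=0.1, size=(H, J))   # hidden-to-output synaptic weights
```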
Let the state of the $h$th neuron be represented by
\[
u_h = g(x, w) = \sum_{i=1}^{I} x_i w_{ih},
\tag{2}
\]
where $w_{ih}$ is a parameter functioning as a synaptic weight between neurons included in the designed LNN. These parameters are mainly constituted by the connection weights (synaptic weights) between neurons. The output signal from the $j$th neuron in the output layer is regarded as the discriminant value. The output of the LNN, under presentation of $x$, is
\[
o_j(x, w) = f(u_j),
\tag{3}
\]
where $f(\cdot)$ is an activation function. The following sigmoid function, which is bounded, monotonic and non-decreasing, is frequently used:
\[
f(u_j) = \frac{1}{1 + \exp(-u_j)}.
\tag{4}
\]
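A minimal forward pass implementing (2)-(4) for a single hidden layer, reusing the weight matrices sketched above (bias terms are omitted, as in the equations):

```python
import numpy as np

def sigmoid(u):
    """Sigmoid activation (4): bounded, monotonic and non-decreasing."""
    return 1.0 / (1.0 + np.exp(-u))

def forward(x, w_ih, w_hj):
    """Forward pass: hidden states (2), output discriminant values (3)."""
    u_h = x @ w_ih        # u_h = sum_i x_i * w_ih
    o_h = sigmoid(u_h)    # hidden-layer outputs f(u_h)
    u_j = o_h @ w_hj      # net input to the output layer
    o_j = sigmoid(u_j)    # o_j(x, w) = f(u_j), the discriminant values
    return u_h, o_h, u_j, o_j
```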
The feature vectors $x_k$ $(k = 1, 2, \ldots, K)$ for training the LNN are prepared. The classes to which these feature vectors belong are all known. Training data (target data) are given as follows:
\[
d_j(x_k) =
\begin{cases}
1 & \text{if } x_k \in \omega_j, \\
0 & \text{otherwise}.
\end{cases}
\tag{5}
\]
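The target vectors of (5) are one-hot encodings of the known class labels; a small sketch, assuming zero-based class indices:

```python
import numpy as np

def targets(labels, J):
    """Target data (5): d_j(x_k) = 1 if x_k belongs to class j, 0 otherwise."""
    d = np.zeros((len(labels), J))
    d[np.arange(len(labels)), labels] = 1.0
    return d

print(targets([0, 2, 1], J=3))
```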
Training of the LNN is performed through the adjustment of the connection weights. The most commonly used method is the so-called back-propagation, which is in essence a gradient descent method.
An error function can be defined as the sum of the squares of the errors over the whole training set. The LNN is trained by minimizing the value of the error function; that is,
\[
\min_{w} E = \sum_{k=1}^{K} E_k,
\tag{6}
\]
where
\[
E_k = \frac{1}{2} \sum_{j=1}^{J} \bigl[ o_j(x_k, w) - d_j(x_k) \bigr]^2.
\tag{7}
\]
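Equations (6)-(7) translate directly into code; the sketch below assumes the network outputs and the targets are available as arrays of equal shape:

```python
import numpy as np

def sample_error(o_k, d_k):
    """E_k of (7): half the squared error for one training vector."""
    return 0.5 * np.sum((o_k - d_k) ** 2)

def total_error(outputs, targets):
    """E of (6): sum of the per-sample errors over the whole training set."""
    return sum(sample_error(o_k, d_k) for o_k, d_k in zip(outputs, targets))
```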
Among several methods for training the LNN, gradient descent methods are the most commonly used. There are two approaches for their application to feed-forward neural networks (Bose and Liang (1996)). One is based on modifying the weights according to the rules
\[
w_{ih}^{\text{new}} = w_{ih}^{\text{old}} + \eta \cdot \Delta w_{ih},
\tag{8}
\]
\[
w_{hj}^{\text{new}} = w_{hj}^{\text{old}} + \eta \cdot \Delta w_{hj},
\tag{9}
\]
\[
\Delta w_{ih} = -\frac{\partial E}{\partial w_{ih}},
\tag{10}
\]
\[
\Delta w_{hj} = -\frac{\partial E}{\partial w_{hj}},
\tag{11}
\]
where $\eta > 0$ is the step-size parameter. The other is based on modifying the weights according to the rules (8), (9) and
\[
\Delta w_{ih} = -\frac{\partial E_k}{\partial w_{ih}},
\tag{12}
\]
\[
\Delta w_{hj} = -\frac{\partial E_k}{\partial w_{hj}}.
\tag{13}
\]
In either approach the data are repeatedly presented until the process converges, although there is no guarantee of convergence to the solution. Following precedents, we call the former approach periodic updating and the latter continuous updating (Bose and Liang (1996)). An entire pass through the whole data set is called an epoch.
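The two schedules differ only in when the accumulated gradient is applied. A sketch of one epoch of each, assuming a helper grad_k(w, x_k, d_k) (hypothetical name) that returns the per-sample gradient $\partial E_k / \partial w$ used in (12)-(13):

```python
def epoch_periodic(w, data, eta, grad_k):
    """Periodic updating: accumulate the gradient over the epoch, then update once."""
    total = sum(grad_k(w, x_k, d_k) for x_k, d_k in data)
    return w - eta * total

def epoch_continuous(w, data, eta, grad_k):
    """Continuous updating: update the weights after every presented sample."""
    for x_k, d_k in data:
        w = w - eta * grad_k(w, x_k, d_k)
    return w
```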
Through chain differentiation,
\[
\frac{\partial E_k}{\partial w_{ih}}
= \frac{\partial E_k}{\partial o_j}
\cdot \frac{\partial o_j}{\partial u_j}
\cdot \frac{\partial u_j}{\partial f(u_h)}
\cdot \frac{\partial f(u_h)}{\partial u_h}
\cdot \frac{\partial u_h}{\partial w_{ih}}
= (o_j - d_j) \cdot f'(u_j) \cdot w_{hj} \cdot f'(u_h) \cdot x_i,
\tag{14}
\]
\[
\frac{\partial E_k}{\partial w_{hj}}
= \frac{\partial E_k}{\partial o_j}
\cdot \frac{\partial o_j}{\partial u_j}
\cdot \frac{\partial u_j}{\partial w_{hj}}
= (o_j - d_j) \cdot f'(u_j) \cdot f(u_h).
\tag{15}
\]
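A sketch of the gradient computation (14)-(15) in matrix form for one training vector, reusing the forward pass above. For the sigmoid (4), $f'(u) = f(u)(1 - f(u))$; the gradient with respect to $w_{ih}$ also sums the contribution of every output neuron $j$, which the matrix product below performs implicitly:

```python
import numpy as np

def gradients(x, d, w_ih, w_hj):
    """Per-sample gradients of E_k: (15) for hidden-to-output weights,
    (14) for input-to-hidden weights."""
    u_h, o_h, u_j, o_j = forward(x, w_ih, w_hj)
    delta_j = (o_j - d) * o_j * (1.0 - o_j)          # (o_j - d_j) * f'(u_j)
    grad_hj = np.outer(o_h, delta_j)                 # (15): ... * f(u_h)
    delta_h = (w_hj @ delta_j) * o_h * (1.0 - o_h)   # sum_j (...) * w_hj, times f'(u_h)
    grad_ih = np.outer(x, delta_h)                   # (14): ... * x_i
    return grad_ih, grad_hj
```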