an LNN with 4 hidden nodes and others is more than 30. Consequently, an LNN with 4 hidden nodes was chosen.
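As an illustration of this selection step, the following sketch shows how AIC, defined as AIC = -2 ln L + 2k with L the maximized likelihood and k the number of free parameters, can drive the choice of hidden-layer size. The helpers train_lnn and log_likelihood are hypothetical placeholders, not the paper's actual code.

```python
def aic(log_lik, n_params):
    # Akaike's Information Criterion: AIC = -2 ln L + 2k.
    return -2.0 * log_lik + 2.0 * n_params

def select_hidden_size(candidates, train_lnn, log_likelihood):
    # Train one LNN per candidate hidden-layer size and keep the
    # size whose AIC is smallest.
    best_size, best_aic = None, float("inf")
    for h in candidates:
        model = train_lnn(n_hidden=h)  # hypothetical trainer
        score = aic(log_likelihood(model), model.n_params)
        if score < best_aic:
            best_size, best_aic = h, score
    return best_size, best_aic
```

With the data used here, such a comparison selects 4 hidden nodes, whose AIC lies more than 30 below that of the alternatives.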
Figure 3 AIC for competing hidden-layer sizes of the LNN (AIC versus number of hidden nodes).
Once the number of nodes in the hidden layer of the LNN was fixed at 4, competing architectures with different forms of activation function were compared. Let us stress again that the procedure described in this section is only an example, and there may be many alternatives. Moreover, it would be desirable to carry out the procedure with feedback, searching simultaneously over all aspects of the architecture. However, as this is computationally too costly, we did not search all possible subsets of the models. The minimized AIC was 131 for the architecture which adopted the parameter a = -0.8, and 138 for the one which used the normal sigmoid function (a = -1.0) (Fig. 4).
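The exact parameterization of the activation function is not reproduced in this excerpt; one form consistent with a = -1.0 being the normal sigmoid is the slope-parameterized logistic function sketched below.

```python
import numpy as np

def activation(x, a=-1.0):
    # Slope-parameterized sigmoid: a = -1.0 recovers the standard
    # logistic function 1 / (1 + exp(-x)); a = -0.8 gives a
    # shallower slope.
    return 1.0 / (1.0 + np.exp(a * x))

x = np.linspace(-4.0, 4.0, 9)
print(activation(x, a=-1.0))  # normal sigmoid (AIC 138)
print(activation(x, a=-0.8))  # AIC-minimizing form (AIC 131)
```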
After choosing the appropriate architecture, the best parameter set of the model could be estimated; it minimizes the modified error function on the validation data with the penalty parameter γ = 2.0 × 10⁻⁵. Figure 5 shows how the introduction of the penalty term contributes to the generalization of the LNN; the improvement in generalization due to Tikhonov's regularization was notable.
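The modified error function itself is not quoted in this excerpt; a minimal sketch, assuming the classical Tikhonov (weight-decay) penalty E(w) = E_data(w) + γ‖w‖², is given below.

```python
import numpy as np

def regularized_error(residuals, weights, gamma=2.0e-5):
    # Sum-of-squares data term plus a Tikhonov penalty on the
    # weights; the exact penalty used in the paper is an
    # assumption here, not quoted from it.
    data_term = np.sum(residuals ** 2)
    penalty = gamma * sum(np.sum(w ** 2) for w in weights)
    return data_term + penalty
```

The penalty shrinks the weights toward zero, which smooths the fitted decision surface; γ = 2.0 × 10⁻⁵ was the value selected against the validation data (Fig. 5).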
Figure 4 AIC and the form of activation function (AIC versus the value of parameter a).

The classification results of the models selected at each step were compared with those of the base model, which has seven nodes in the hidden layer and was trained with the standard back-propagation algorithm. Table 1 compares the results of the competing models. The left column shows the results of Model (a), in which the numbers of hidden nodes and output nodes are both 7 and no pruning has been done. The middle column shows the results of Model (b), which achieved the minimization of AIC by reducing the number of hidden nodes and pruning some connection weights. The right column shows the results of Model (c), obtained by applying Tikhonov's regularization to Model (b).
Table 1 Comparison among the results of the competing models.

                              (a)       (b)       (c)
  Number of input nodes       12        12        12
  Number of hidden nodes      7         4         4
  Parameter a                 -1.0      -0.8      -0.8
  Parameter γ                 0         0         2.0 × 10⁻⁵
  AIC                         246       131       (225)*
  Accuracy (%)                85.4      87.2      92.7

* The parenthesized value is not AIC proper but the value of the corresponding objective function.
Figure 5 The regularization penalty parameter γ and the accuracy (%) for the validation data.
These results show that reducing the number of hidden nodes and pruning connection weights effectively decrease the number of parameters, so that the minimization of AIC can be achieved. The improvement in generalization brought by Tikhonov's regularization is also notable.
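The pruning criterion is not specified in this excerpt; magnitude-based pruning, as sketched below, is one common stand-in that shows how removing connections lowers the 2k term of AIC.

```python
import numpy as np

def prune_small_weights(weights, threshold):
    # Zero out connections whose magnitude falls below the
    # threshold and recount the remaining free parameters
    # (the k in AIC = -2 ln L + 2k).
    pruned = [np.where(np.abs(w) < threshold, 0.0, w) for w in weights]
    n_params = sum(int(np.count_nonzero(w)) for w in pruned)
    return pruned, n_params
```

Each pruned connection reduces AIC by 2, provided the fitted log-likelihood is essentially unchanged.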
7 CONCLUSION
In this paper, we introduced techniques for the generalization of Layered Neural Networks (LNNs) and proposed an LNN design procedure for the classification of remotely sensed images.
We discussed the generalization of LNN classifiers, a controversial and often vague term in the neural network literature, and introduced some techniques based on information statistics. Akaike's Information Criterion (AIC) was introduced for LNNs, taking into consideration the fact that the output of an LNN trained with a sufficient number of training data can be considered an approximate estimate of a Bayesian posterior probability. Then, we gave a clear description of LNN generaliza