tion from the viewpoint of architecture design and learning paradigm based on AIC.
Concerning the architecture design, the size of the network (the number of layers and nodes) and the type of activation functions are important factors. We proposed an LNN architecture design based on the minimization of AIC. Concretely, LNNs of different sizes were trained with pruning, and the number of hidden nodes and the connection weights between nodes were determined so as to minimize AIC.
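As a rough illustration of this selection rule (a minimal sketch of our own, not the code used in the experiments), the fragment below trains one-hidden-layer networks of several candidate sizes on a toy regression task and retains the size with the smallest AIC, computed under a Gaussian-error assumption as AIC ≈ n ln(RSS/n) + 2k, where k is the number of free weights; the pruning step is omitted for brevity.

```python
# Minimal sketch (not the authors' code): choose the number of hidden nodes
# of a one-hidden-layer network by minimizing AIC on a toy regression task.
# Assumptions: Gaussian errors, so AIC ~= n * ln(RSS / n) + 2k with k the
# number of free weights; plain batch gradient descent; no pruning step.
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data (stands in for the training samples).
n = 200
x = rng.uniform(-1.0, 1.0, size=(n, 1))
y = np.sin(3.0 * x) + 0.1 * rng.standard_normal((n, 1))

def train_lnn(x, y, n_hidden, epochs=3000, lr=0.05):
    """Fit y ~ W2 @ tanh(x W1 + b1) + b2 by batch gradient descent."""
    W1 = 0.5 * rng.standard_normal((1, n_hidden))
    b1 = np.zeros((1, n_hidden))
    W2 = 0.5 * rng.standard_normal((n_hidden, 1))
    b2 = np.zeros((1, 1))
    for _ in range(epochs):
        h = np.tanh(x @ W1 + b1)              # hidden activations
        err = (h @ W2 + b2) - y               # output error
        # Backpropagate the mean squared-error loss.
        gW2 = h.T @ err / n
        gb2 = err.mean(axis=0, keepdims=True)
        dh = (err @ W2.T) * (1.0 - h ** 2)
        gW1 = x.T @ dh / n
        gb1 = dh.mean(axis=0, keepdims=True)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    rss = float(np.sum((np.tanh(x @ W1 + b1) @ W2 + b2 - y) ** 2))
    k = W1.size + b1.size + W2.size + b2.size  # number of free parameters
    return rss, k

best = None
for n_hidden in (1, 2, 4, 8, 16):
    rss, k = train_lnn(x, y, n_hidden)
    aic = n * np.log(rss / n) + 2 * k          # AIC under Gaussian errors
    print(f"hidden={n_hidden:2d}  k={k:3d}  RSS={rss:8.4f}  AIC={aic:8.2f}")
    if best is None or aic < best[0]:
        best = (aic, n_hidden)
print("selected number of hidden nodes:", best[1])
```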
Once the architecture is fixed, the behavior of the trained model depends on the values of the connection weights. It is known, however, that AIC has a large variance, so that, with a limited amount of training data and in the presence of noise, over-training often becomes a problem. We introduced Tikhonov's regularization to overcome the problem of over-training.
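As a minimal sketch (again our own illustration, assuming a quadratic penalty on the weights), Tikhonov's regularization adds a term proportional to the squared weight norm to the training objective; in gradient-based training this appears as simple weight decay, and for a linear model the same penalty yields the ridge-regression estimate of Hoerl and Kennard (1970).

```python
# Minimal sketch (our illustration, not the paper's implementation):
# a Tikhonov (ridge-type) penalty lam * ||w||^2 / 2 added to the error
# function shows up in gradient descent as a weight-decay term.
import numpy as np

def gd_step(w, grad_data, lr=0.05, lam=1e-3):
    """One gradient step on E(w) = data_error(w) + lam * ||w||^2 / 2.

    grad_data is the gradient of the data-error term alone; the extra
    lam * w term shrinks the weights toward zero (weight decay).
    """
    return w - lr * (grad_data + lam * w)

def ridge_fit(X, y, lam=1e-3):
    """Closed-form special case: for a linear model y = X w, the same
    penalty gives ridge regression, w = (X'X + lam*I)^{-1} X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```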
Finally, we designed an LNN classifier based on the proposed procedure and applied it to a land cover classification problem. Our experimental results illustrate the potential of the proposed design techniques. We believe that the insight gained from this study is complementary to a more general analysis of the generalization of feed-forward layered neural networks based on information statistics.
REFERENCES
[Akaike, 1974] Akaike, H., 1974. A new look at the statistical
model identification, IEEE Trans. Automat. Contr., 19(6),
pp.716-723.
[Amirikian and Nishimura, 1995] Amirikian, B. and Nishimura,
H., 1995. What size network is good for generalization of a
specific task of interest, Neural Networks, 7(2), pp.321-329.
[Bose and Liang, 1996] Bose, N. K. and Liang, P., 1996. Neural Network Fundamentals with Graphs, Algorithms, and Applications, McGraw-Hill.
[Curran and Hay, 1986] Curran, P.J. and Hay, A.M., 1986. The importance of measurement error for certain procedures in remote sensing at optical wavelengths, Photogramm. Eng. Remote Sensing, 52, pp.229-241.
[Fogel, 1991] Fogel, D. B., 1991. An information criterion for optimal neural network selection, IEEE Transactions on Neural Networks, 2(5), pp.490-497.
[Funahashi, 1989] Funahashi, K., 1989. On the approximate realization of continuous mappings by neural networks, Neural Networks, 2, pp.183-192.
[Gallant and White, 1988] Gallant, A.R. and White, H., 1988. There exists a neural network that does not make avoidable mistakes, Proc. Int. Conf. Neural Networks, 1, pp.657-666.
[Hill et al., 1994] Hill, T., Marquez, L., O'Connor, M. and Remus,
W., 1994. Artificial neural network models for forecasting and
decision making, Int. Jour. Forecasting, 10, pp.5-15.
[Hoerl and Kennard, 1970] Hoerl, A. E. and Kennard, R. W., 1970. Ridge regression: Biased estimation for nonorthogonal problems, Technometrics, 12(1), pp.55-67.
[Krogh and Hertz, 1992] Krogh, A. and Hertz, J. A., 1992. A simple weight decay can improve generalization, Advances in Neural Information Processing Systems 4, Moody, J. E., Hanson, S. J. and Lippmann, R. P. eds., Morgan Kaufmann Publishers.
[Mehrotra et al., 1991] Mehrotra, K.G., Mohan, C.K. and Ranka,
S., 1991. Bounds on the number of samples needed for neural
learning, IEEE Transactions on Neural Networks, 6, pp.548-558.
[Mehrotra et al., 1997] Mehrotra, K., Mohan, C. K. and Ranka, S.,
1997. Elements of Artificial Neural Networks, MIT Press.
[Ruck et al., 1990] Ruck, D. W., Rogers, S. K., Kabrisky, M.,
Oxley, M. E. and Suter, B. W., 1990. The multilayer perceptron
as an approximation to a Bayes optimal discriminant function,
IEEE Transactions on Neural Networks, 1(4), pp.296-298.
[Shimizu, 1996] Shimizu, E., 1996. A theoretical interpretation
for layered neural network classifier, Jour. JSPRS, 35(4), pp.4-8.
[Shimohira, 1993] Shimohira, H., 1993. A model selection procedure based on the information criterion with its variance, METR 93-16, University of Tokyo.
[Sietsma and Dow, 1990] Sietsma, J. and Dow, R. J. F., 1990. Creating artificial neural networks that generalize, IEEE Transactions on Neural Networks, 4, pp.67-79.
[Tikhonov et al., 1990] Tikhonov, A.N., Goncharsky, A.V., Stepanov, V.V. and Yagola, A.G., 1990. Numerical Methods for the Solution of Ill-posed Problems. Mathematics and Its Applications, Kluwer Academic Publishers.
[Wan, 1990] Wan, E. A., 1990. Neural network classification: a
Bayesian interpretation. IEEE Transactions on Neural Networks,
1(4), pp.303-305.
[Weigend and Rumelhart, 1991] Weigend, A.S. and Rumelhart, D.E., 1991. The effective dimension of the space of hidden units, Proc. IEEE Int. Joint Conf. Neural Networks, Singapore, 3, pp.2069-2074.
[Yool et al., 1986] Yool, S.R., Star, J.L., Estes, J.E., Botkin, D.B., Eckhardt, D.W. and Davis, F.W., 1986. Performance analysis of image processing algorithms for classification of natural vegetation in the mountains of Southern California, Int. J. Remote Sensing, 7, pp.683-702.