where P_j is the output of node j in the output layer (corresponding to class j) and L is the number of independently adjusted parameters. The model that minimizes AIC is the best model. If only one model set is used, that is, if the number of parameters is fixed, then AIC reduces to the MLE solution. If two different model sets attain the same maximum-likelihood value, the model with the smaller number of parameters is selected (the principle of parsimony, or Occam's razor).
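As a rough illustration of this selection rule, the following Python sketch compares candidate models by AIC = -2 max log L + 2L; the log-likelihood values and parameter counts are invented purely for illustration.

def aic(max_log_likelihood, n_params):
    # AIC = -2 * (maximized log-likelihood) + 2 * (number of adjustable parameters)
    return -2.0 * max_log_likelihood + 2.0 * n_params

# (maximized log-likelihood, number of parameters) for three hypothetical models
candidates = {
    "small":  (-140.0, 10),
    "medium": (-120.0, 25),
    "large":  (-119.5, 60),
}

for name, (ll, k) in candidates.items():
    print(f"{name:>6}: AIC = {aic(ll, k):.1f}")

best = min(candidates, key=lambda name: aic(*candidates[name]))
print("selected model:", best)  # "medium": fits better than "small", uses fewer parameters than "large"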
3.5 Generalization based on AIC
In the case of LNNs, the number of parameters in (20) is determined by the number of parameters in the activation function and the number of connection weights. The number of parameters is basically determined by the numbers of input nodes, output nodes, hidden layers and hidden nodes. Here, too, the trade-off between the number of parameters and the overall goodness of fit of the model is inevitable.
Although the expectation of AIC is asymptotically unbiased up to terms of order O(1), it has a large variance, so that the procedures discussed in the previous sections are also problematic (Shimohira, 1993).
To give an example of over-training, let y be a linear function of x,

y_k = x_k w + e_k   (k = 1, ..., K),   (21)
where e is the error term and w is the coefficient vector. Let the regression model be expressed in matrix form as

y = Xw + e.   (22)

Suppose e ~ N(0, σ²I), where σ is an unknown standard deviation.
The ML estimator

ŵ = (X'X)^{-1} X'y   (23)

is the solution to

max_{w,σ²} log L(w, σ²) = -(K/2) log 2π - (K/2) log σ² - (1/(2σ²)) (y - Xw)'(y - Xw).   (24)
The situation where the determinant of X'X is nearly zero is called multicollinearity; in this case problem (24) is ill-posed, that is, its solution is unstable. This can be regarded as a form of over-fitting, as illustrated by the sketch below.
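The following sketch (synthetic data, NumPy, all numbers illustrative) shows this instability: when two regressor columns are nearly collinear, det(X'X) is close to zero and the estimate (23) varies between independent noise realizations far more than the true coefficients (1, 1) would suggest.

import numpy as np

rng = np.random.default_rng(0)
K = 50
x1 = rng.normal(size=K)
x2 = x1 + 1e-4 * rng.normal(size=K)          # nearly collinear with x1
X = np.column_stack([x1, x2])
w_true = np.array([1.0, 1.0])

print("det(X'X) =", np.linalg.det(X.T @ X))  # close to zero: multicollinearity

# Refit under two independent noise realizations; the estimates swing widely,
# even though the underlying coefficients are unchanged.
for trial in range(2):
    y = X @ w_true + 0.1 * rng.normal(size=K)
    w_hat = np.linalg.solve(X.T @ X, X.T @ y)  # (X'X)^{-1} X'y, equation (23)
    print(f"trial {trial}: w_hat = {np.round(w_hat, 2)}")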
The generalization of LNN classifiers is considered to be a problem of searching for an appropriate architecture and an appropriate training algorithm so that the LNN performs well and minimizes the error over all unknown data. In the following chapters, we propose an LNN design with some techniques for improving the generalization of LNN classifiers; these are characterized by the architecture and the training algorithm.
4 LAYERED NEURAL NETWORK ARCHITECTURE DESIGN BASED ON AIC
In this chapter, we propose an LNN architecture design for choosing an appropriate model in terms of not only the size of the network but also a suitable activation function, based on AIC.
4.1 Choosing the appropriate size based on AIC
In a three-layered neural network, the number of parameters L is determined by the number of hidden nodes H, the number of input nodes I and the number of output nodes J as follows:

L = I × H + H × J.   (25)
The numbers of nodes in the input and output layers are, in gen
eral, fixed according to the practical application problem. The
users, therefore, are only able to adjust the number of nodes in the
hidden layer.
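For concreteness, a one-line helper implementing (25) is given below; bias terms are not counted, following (25), and the example numbers are arbitrary.

def param_count(n_input, n_hidden, n_output):
    # Number of connection weights of a three-layered LNN, equation (25): L = I*H + H*J
    return n_input * n_hidden + n_hidden * n_output

print(param_count(4, 3, 3))  # 4*3 + 3*3 = 21 adjustable weights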
4.1.1 Choosing the number of hidden nodes If we select a hidden layer with too few nodes, the LNN may not be powerful enough for a given learning task, while too many hidden nodes would lead to over-fitting of the training data. Therefore, an appropriate number of hidden nodes should be chosen so that the LNN can guarantee generalization ability.
The relationship between the number of hidden nodes H and the number of output nodes J has been discussed fairly thoroughly by Mehrotra et al. (1991), Weigend and Rumelhart (1991), and Amirikian and Nishimura (1995). They conclude that an LNN with one hidden layer, in which the number of hidden nodes H equals the number of output nodes J, is of an appropriate size to execute a given classification task.
We suggest choosing an appropriate size of LNN by using AIC, which can simultaneously address both parameter estimation and forecasting of the generalization ability of the model on unknown data during the training process. We choose the appropriate number of hidden nodes by varying the number of hidden nodes to obtain LNNs of different sizes; their AIC values are computed after training is completed. The LNN yielding the minimum value of AIC is chosen as being of the appropriate size to execute a given classification task, as sketched below.
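A sketch of this selection loop follows. train_lnn is a placeholder for the user's own training routine: it is assumed to train a three-layered network with h hidden nodes to completion and return the maximized log-likelihood on the training data; everything else follows (25) and the AIC criterion.

def param_count(n_input, n_hidden, n_output):
    return n_input * n_hidden + n_hidden * n_output            # equation (25)

def select_hidden_nodes(train_lnn, n_input, n_output, candidate_sizes):
    # Train one LNN per candidate hidden-layer size and keep the one with minimum AIC.
    best_h, best_aic = None, float("inf")
    for h in candidate_sizes:
        max_log_lik = train_lnn(h)                              # training run to completion
        aic = -2.0 * max_log_lik + 2.0 * param_count(n_input, h, n_output)
        if aic < best_aic:
            best_h, best_aic = h, aic
    return best_h, best_aic

# Hypothetical usage:
# best_h, best_aic = select_hidden_nodes(train_lnn, n_input=4, n_output=3, candidate_sizes=range(1, 11))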
4.1.2 Pruning the connection weights Network pruning algorithms can be applied to obtain the minimum number of parameters (by removing redundant parameters) so that the LNN is more efficient in both forward computation time and generalization capability. Such algorithms have already been discussed by several researchers, for example Sietsma and Dow (1990).
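As a purely generic illustration of the idea (not the specific procedure of Sietsma and Dow (1990)), the sketch below removes connection weights whose magnitude falls below a threshold, reducing the number of effective parameters.

import numpy as np

def prune_weights(weights, threshold=0.05):
    # Zero out connection weights with magnitude below the threshold;
    # return the pruned matrix and the number of weights removed.
    mask = np.abs(weights) >= threshold
    return weights * mask, int(mask.size - mask.sum())

w = np.array([[0.80, -0.01],
              [0.03, -0.60]])
pruned, n_removed = prune_weights(w)
print(pruned)
print("weights removed:", n_removed)  # 2 of the 4 weights fall below the threshold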