[Figure 1 diagram: input layer nodes (Blue, Green, Red, NIR, NDVI) connected by weights to a hidden layer, which is connected by weights to the output layer classes Calluna vulgaris, Mire, Bogmire, Bracken, Acid grass, Molinia caerulea, Juncus and Burnt Calluna.]
Figure 1. Example of an ANN as applied in this study,
consisting of five input nodes, each connected to eleven hidden
nodes, which are linked to each of the eight output classes.
No universally applicable rule concerning the optimal number
of hidden layers and number of hidden nodes exists (Kavzoglu
and Mather, 1999). Most applications therefore apply extensive
and time-intensive trial and error tests to determine the optimal
design for each study, also known as structural stabilisation
(Bishop, 1995; Openshaw and Openshaw, 1997). In this study
the number of hidden nodes was calculated following three
recommendations from the literature. The method (Equation 1a)
suggested by Atkinson et al. (1997) uses only the number of
input bands (n), whereas Dunne and Campbell (1994)
recommended a formula considering only the number of output
classes (m) (Equation 1b). The third method (Equation 1c), by
Miller et al. (1995), uses both parameters to calculate the
number of hidden nodes:
No. of hidden nodes = 2n + 1          (Atkinson et al., 1997)        (1a)
No. of hidden nodes = m(m − 1)/2      (Dunne and Campbell, 1994)     (1b)
No. of hidden nodes = 2√n + m         (Miller et al., 1995)          (1c)
Following these recommendations, ANNs consisting of 11
(Atkinson et al., 1997), 28 (Dunne and Campbell, 1994) and 12
(Miller et al., 1995) hidden nodes were created. The network
training was carried out using the conjugate gradient algorithm,
which requires no definition of additional parameters, such as
momentum and learning rate for the gradient descent algorithm.
The activation function was the 'tanh' function, which leads to
quicker convergence than the sigmoid activation function
(Bishop, 1995).
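For illustration, the three heuristics can be evaluated directly for the five input bands and eight output classes used here (a minimal Python sketch; the function names are ours, and the formulas are Equations 1a-1c as reconstructed above):

    import math

    def hidden_nodes_atkinson(n):
        # Equation 1a: uses only the number of input bands n
        return 2 * n + 1

    def hidden_nodes_dunne_campbell(m):
        # Equation 1b: uses only the number of output classes m
        return m * (m - 1) // 2

    def hidden_nodes_miller(n, m):
        # Equation 1c: combines both parameters, rounded to an integer
        return round(2 * math.sqrt(n) + m)

    n, m = 5, 8  # five input bands, eight output classes, as in this study
    print(hidden_nodes_atkinson(n),
          hidden_nodes_dunne_campbell(m),
          hidden_nodes_miller(n, m))
    # -> 11 28 12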
2.3.2. ANN weights Besides the design, the network
performance depends upon the choice of initial weights.
Weights connecting the nodes between each layer (Figure 1) are
initially assigned randomly and adjusted during the learning
process to minimise the global error. The influence of the
assignment of random weights was considered in this study by
initialising each neural network 10 times, each time with a
different combination of weights.
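This procedure can be sketched by drawing a separate random starting state for each network (a minimal Python illustration; the uniform weight range and the seed values are assumptions, not taken from the study):

    import numpy as np

    def init_weights(n_in, n_hidden, n_out, rng):
        # One random combination of weights for a single-hidden-layer network
        w_input_hidden = rng.uniform(-0.5, 0.5, size=(n_in, n_hidden))
        w_hidden_output = rng.uniform(-0.5, 0.5, size=(n_hidden, n_out))
        return w_input_hidden, w_hidden_output

    # Ten networks of identical design (5 inputs, 11 hidden nodes, 8 outputs),
    # each starting from a different random combination of weights.
    initialisations = [init_weights(5, 11, 8, np.random.default_rng(seed))
                       for seed in range(10)]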
2.3.3. Training data The characteristics of the training data
have to represent the whole data set. Statistical parameters of
all classes, including standard deviation and deviation from the
mean, were calculated for all pixels. The deviation from the
mean was used as a guideline to include border and core pixels
in the training dataset. The integration of border pixels,
covering the whole spectral range of each class, is needed to
allow the networks to learn the full characteristics of the
corresponding land cover classes (Foody, 1999). The data set of
the training site was separated into 2/3 training data and 1/3
validation data, consisting of 1363 pixels and 714 pixels
respectively for the OTA classification.
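One plausible reading of this selection and split is sketched below in Python; the ranking rule and the every-third-pixel split are our assumptions, since the paper states only that the deviation from the mean served as a guideline:

    import numpy as np

    def split_class_pixels(pixels):
        # Rank the pixels of one class by deviation from the class mean, so
        # that core pixels (small deviation) and border pixels (large
        # deviation) both appear in the training set.
        mean = pixels.mean(axis=0)
        deviation = np.linalg.norm(pixels - mean, axis=1)
        order = np.argsort(deviation)             # core pixels first, border last
        val_idx = order[::3]                      # every third rank -> validation (1/3)
        train_idx = np.setdiff1d(order, val_idx)  # remaining ranks -> training (2/3)
        return pixels[train_idx], pixels[val_idx]

    # Example: 60 pixels of one class, five bands each
    rng = np.random.default_rng(42)
    train, val = split_class_pixels(rng.normal(size=(60, 5)))
    print(train.shape, val.shape)  # -> (40, 5) (20, 5)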
2.3.4. Training amount The amount of training applied to an
ANN influences its ability to generalise (Bishop, 1995). The
longer the network is trained, the more the danger increases that
the ANN becomes ‘overfitted’ to its training data, thereby
reducing its ability to generalise (Atkinson and Tatnall, 1997;
Benediktsson and Sveinsson, 1997). The training process can
be stopped according to one of the following user-defined
options (Bishop, 1995):
- after a fixed number of epochs
- after a certain CPU time
- when a minimum error function is reached
- after a minimum gradient is reached and learning per epoch is only marginal
- when the error value of validation datasets starts to increase (cross-validation).
In most remote sensing applications the first approach is used,
training the ANN for a user-defined fixed number of epochs.
However, results in a previous study showed that generalisation
was significantly affected by such an approach (Mehner et al.,
2003). The longer the training process was carried out, the
higher the accuracy on the training data, showing a good fit of
the ANN model between input and output data. However, it
caused a loss of generalisation, resulting in a decrease in the
accuracy of the validation data of up to 15 % (Mehner et al.,
2003). This study applied early stopping as the criterion for the
amount of training carried out. Early stopping utilises cross-
validation to stop the training process when the Mean Squared
Error (MSE) of the validation data starts to increase (Bishop,
1995; Duda et al., 2001). It allows maximum generalisation and
prevents the network from becoming overfitted to the training
data.
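The early-stopping loop can be sketched as follows (a minimal, self-contained Python illustration: the toy data, learning rate and patience value are assumptions, and plain gradient descent stands in for the conjugate gradient algorithm used in this study):

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy stand-in data: 5 input bands, 8 one-hot encoded output classes
    X_tr, X_val = rng.normal(size=(120, 5)), rng.normal(size=(60, 5))
    y_tr = np.eye(8)[rng.integers(0, 8, 120)]
    y_val = np.eye(8)[rng.integers(0, 8, 60)]

    w1 = rng.uniform(-0.5, 0.5, (5, 11))   # input -> hidden weights
    w2 = rng.uniform(-0.5, 0.5, (11, 8))   # hidden -> output weights

    def forward(X):
        h = np.tanh(X @ w1)                # 'tanh' activation in the hidden layer
        return h, h @ w2                   # linear output layer

    def mse(X, y):
        return np.mean((forward(X)[1] - y) ** 2)

    best_mse, best, bad_epochs, patience = np.inf, (w1.copy(), w2.copy()), 0, 20
    for epoch in range(5000):
        # One training epoch (plain gradient descent as a stand-in)
        h, out = forward(X_tr)
        err = 2 * (out - y_tr) / y_tr.size
        grad_w2 = h.T @ err
        grad_w1 = X_tr.T @ ((err @ w2.T) * (1 - h ** 2))
        w1 -= 0.1 * grad_w1
        w2 -= 0.1 * grad_w2
        val = mse(X_val, y_val)
        if val < best_mse:                 # validation MSE still falling
            best_mse, best, bad_epochs = val, (w1.copy(), w2.copy()), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:     # validation MSE has started to rise: stop
                break
    w1, w2 = best                          # restore the best-generalising weights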
3. RESULTS AND DISCUSSION
The overall accuracy was calculated for all pixels, mixed and
unmixed, using the rank matrix (Bernard, 1998). The rank
matrix is a modification of the traditional confusion matrix, as
it calculates the accuracy based on the correctly classified
positions and classes.
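While the rank matrix computation itself is specific to Bernard (1998) and is not reproduced here, the baseline it modifies can be sketched as a conventional confusion-matrix overall accuracy in Python:

    import numpy as np

    def overall_accuracy(true_labels, predicted_labels, n_classes):
        # Standard confusion matrix: rows = true class, columns = predicted class
        cm = np.zeros((n_classes, n_classes), dtype=int)
        for t, p in zip(true_labels, predicted_labels):
            cm[t, p] += 1
        return np.trace(cm) / cm.sum()  # correctly classified pixels / all pixels

    print(overall_accuracy([0, 1, 2, 2], [0, 1, 1, 2], 3))  # -> 0.75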
The training of the ANNs using early stopping resulted in
different numbers of epochs for each ANN, depending on the
initialised random weights. Early stopping enabled the ANNs
to generalise and thereby classify the validation data to
accuracies similar to the accuracies of the training data.