[Figure 1 diagram: input layer nodes (Blue, Green, Red, NIR, NDVI) connected by weights to a hidden layer, which is connected by weights to the output layer classes Calluna vulgaris, Mire, Bogmire, Bracken, Acid grass, Molinia caerulea, Juncus and Burnt Calluna.]
Figure 1. Example of an ANN as applied in this study,
consisting of five input nodes, each connected to eleven hidden
nodes, which are linked to each of the eight output classes.
No universally applicable rule concerning the optimal number
of hidden layers and number of hidden nodes exists (Kavzoglu
and Mather, 1999). Most applications therefore apply extensive
and time-intensive trial and error tests to determine the optimal
design for each study, also known as structural stabilisation
(Bishop, 1995; Openshaw and Openshaw, 1997). In this study
the number of hidden nodes was calculated following three
recommendations from the literature. The method (Equation 1a)
suggested by Atkinson et al. (1997) uses only the number of
input bands (n), whereas Dunne and Campbell (1994)
recommended a formula considering only the number of output
classes (m) (Equation 1b). The third method (Equation 1c), by
Miller et al. (1995), uses both parameters to calculate the
number of hidden nodes:
No. of hidden nodes = 2n + 1          (Atkinson et al., 1997)        (1a)
No. of hidden nodes = m(m − 1)/2      (Dunne and Campbell, 1994)     (1b)
No. of hidden nodes = 2√n + m         (Miller et al., 1995)          (1c)
Following these recommendations, ANNs consisting of 11
(Atkinson et al., 1997), 28 (Dunne and Campbell, 1994) and 12
(Miller et al., 1995) hidden nodes were created. The network
training was carried out using the conjugate gradient algorithm,
which requires no definition of additional parameters, such as
momentum and learning rate for the gradient descent algorithm.
The activation function was the 'tanh' function, which leads to
quicker convergence than the sigmoid activation function
(Bishop, 1995).
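For illustration, the three heuristics can be evaluated directly for the five input bands and eight output classes used here (a minimal Python sketch; the function names are ours, and the formulas are Equations 1a-1c as reconstructed above):

    import math

    def hidden_nodes_atkinson(n):
        # Equation 1a: uses only the number of input bands n
        return 2 * n + 1

    def hidden_nodes_dunne_campbell(m):
        # Equation 1b: uses only the number of output classes m
        return m * (m - 1) // 2

    def hidden_nodes_miller(n, m):
        # Equation 1c: combines both parameters, rounded to an integer
        return round(2 * math.sqrt(n) + m)

    n, m = 5, 8  # five input bands, eight output classes, as in this study
    print(hidden_nodes_atkinson(n),
          hidden_nodes_dunne_campbell(m),
          hidden_nodes_miller(n, m))
    # -> 11 28 12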
2.3.2. ANN weights Besides the design, the network
performance depends upon the choice of initial weights.
Weights connecting the nodes between each layer (Figure 1) are
initially assigned randomly and adjusted during the learning
process to minimise the global error. The influence of the
assignment of random weights was considered in this study by
initialising each neural network 10 times, each time with a
different combination of weights.
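This procedure can be sketched by drawing a separate random starting state for each network (a minimal Python illustration; the uniform weight range and the seed values are assumptions, not taken from the study):

    import numpy as np

    def init_weights(n_in, n_hidden, n_out, rng):
        # One random combination of weights for a single-hidden-layer network
        w_input_hidden = rng.uniform(-0.5, 0.5, size=(n_in, n_hidden))
        w_hidden_output = rng.uniform(-0.5, 0.5, size=(n_hidden, n_out))
        return w_input_hidden, w_hidden_output

    # Ten networks of identical design (5 inputs, 11 hidden nodes, 8 outputs),
    # each starting from a different random combination of weights.
    initialisations = [init_weights(5, 11, 8, np.random.default_rng(seed))
                       for seed in range(10)]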
2.3.3. Training data The characteristics of the training data
have to represent the whole data set. Statistical parameters of
all classes, including standard deviation and deviation from the
mean, were calculated for all pixels. The deviation from the
mean was used as a guideline to include border and core pixels
in the training dataset. The integration of border pixels,
covering the whole spectral range of each class, is needed to
allow the networks to learn the full characteristics of the
corresponding land cover classes (Foody, 1999). The data set of
the training site was separated into 2/3 training data and 1/3
validation data, consisting of 1363 pixels and 714 pixels
respectively for the OTA classification.
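One plausible reading of this selection and split is sketched below in Python; the ranking rule and the every-third-pixel split are our assumptions, since the paper states only that the deviation from the mean served as a guideline:

    import numpy as np

    def split_class_pixels(pixels):
        # Rank the pixels of one class by deviation from the class mean, so
        # that core pixels (small deviation) and border pixels (large
        # deviation) both appear in the training set.
        mean = pixels.mean(axis=0)
        deviation = np.linalg.norm(pixels - mean, axis=1)
        order = np.argsort(deviation)             # core pixels first, border last
        val_idx = order[::3]                      # every third rank -> validation (1/3)
        train_idx = np.setdiff1d(order, val_idx)  # remaining ranks -> training (2/3)
        return pixels[train_idx], pixels[val_idx]

    # Example: 60 pixels of one class, five bands each
    rng = np.random.default_rng(42)
    train, val = split_class_pixels(rng.normal(size=(60, 5)))
    print(train.shape, val.shape)  # -> (40, 5) (20, 5)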
2.3.4. Training amount The amount of training applied to an
ANN influences its ability to generalise (Bishop, 1995). The
longer the network is trained, the more the danger increases that
the ANN becomes ‘overfitted’ to its training data, thereby
reducing its ability to generalise (Atkinson and Tatnall, 1997;
Benediktsson and Sveinsson, 1997). The training process can
be stopped according to one of the following user-defined
options (Bishop, 1995):
- after a fixed number of epochs
- after a certain CPU time
- when a minimum error function is reached
- after a minimum gradient is reached and learning per epoch is only marginal
- when the error value of validation datasets starts to increase (cross-validation).
In most remote sensing applications the first approach is used,
training the ANN for a user-defined fixed number of epochs.
However, results in a previous study showed that generalisation
was significantly affected by such an approach (Mehner et al.,
2003). The longer the training process was carried out, the
higher the accuracy on the training data, showing a good fit of
the ANN model between input and output data. However, it
caused a loss of generalisation, resulting in a decrease in the
accuracy of the validation data of up to 15 % (Mehner et al.,
2003). This study applied early stopping as the criterion for the
amount of training carried out. Early stopping utilises cross-
validation to stop the training process when the Mean Squared
Error (MSE) of the validation data starts to increase (Bishop,
1995; Duda et al., 2001). It allows maximum generalisation and
prevents the network from becoming overfitted to the training
data.
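The early-stopping loop can be sketched as follows (a minimal, self-contained Python illustration: the toy data, learning rate and patience value are assumptions, and plain gradient descent stands in for the conjugate gradient algorithm used in this study):

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy stand-in data: 5 input bands, 8 one-hot encoded output classes
    X_tr, X_val = rng.normal(size=(120, 5)), rng.normal(size=(60, 5))
    y_tr = np.eye(8)[rng.integers(0, 8, 120)]
    y_val = np.eye(8)[rng.integers(0, 8, 60)]

    w1 = rng.uniform(-0.5, 0.5, (5, 11))   # input -> hidden weights
    w2 = rng.uniform(-0.5, 0.5, (11, 8))   # hidden -> output weights

    def forward(X):
        h = np.tanh(X @ w1)                # 'tanh' activation in the hidden layer
        return h, h @ w2                   # linear output layer

    def mse(X, y):
        return np.mean((forward(X)[1] - y) ** 2)

    best_mse, best, bad_epochs, patience = np.inf, (w1.copy(), w2.copy()), 0, 20
    for epoch in range(5000):
        # One training epoch (plain gradient descent as a stand-in)
        h, out = forward(X_tr)
        err = 2 * (out - y_tr) / y_tr.size
        grad_w2 = h.T @ err
        grad_w1 = X_tr.T @ ((err @ w2.T) * (1 - h ** 2))
        w1 -= 0.1 * grad_w1
        w2 -= 0.1 * grad_w2
        val = mse(X_val, y_val)
        if val < best_mse:                 # validation MSE still falling
            best_mse, best, bad_epochs = val, (w1.copy(), w2.copy()), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:     # validation MSE has started to rise: stop
                break
    w1, w2 = best                          # restore the best-generalising weights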
3. RESULTS AND DISCUSSION
The overall accuracy was calculated for all pixels, mixed and
unmixed, using the rank matrix (Bernard, 1998). The rank
matrix is a modification of the traditional confusion matrix, as
it calculates the accuracy based on the correctly classified
positions and classes.
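While the rank matrix computation itself is specific to Bernard (1998) and is not reproduced here, the baseline it modifies can be sketched as a conventional confusion-matrix overall accuracy in Python:

    import numpy as np

    def overall_accuracy(true_labels, predicted_labels, n_classes):
        # Standard confusion matrix: rows = true class, columns = predicted class
        cm = np.zeros((n_classes, n_classes), dtype=int)
        for t, p in zip(true_labels, predicted_labels):
            cm[t, p] += 1
        return np.trace(cm) / cm.sum()  # correctly classified pixels / all pixels

    print(overall_accuracy([0, 1, 2, 2], [0, 1, 1, 2], 3))  # -> 0.75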
The training of the ANNs using early stopping resulted in
different numbers of epochs for each ANN, depending on the
initialised random weights. Early stopping enabled the ANNs
to generalise and thereby classify the validation data to
accuracies similar to the accuracies of the training data.