maximum likelihood method. Neural networks, on the other hand, handle two-dimensional data easily; classification using co-occurrence matrices can therefore be carried out directly with a neural network (Inoue et al., 1993).
Figure 1: Landscape pattern of three regions
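To make the co-occurrence input concrete, the following is a minimal sketch of computing a grey-level co-occurrence matrix and flattening it into a network input vector. The function name, the quantisation to four grey levels, and the (0, 1) displacement are illustrative assumptions, not details from Inoue et al. (1993).

import numpy as np

def cooccurrence_matrix(img, levels, dx=0, dy=1):
    # Count grey-level pairs at displacement (dy, dx) and
    # normalise the counts to joint frequencies.
    glcm = np.zeros((levels, levels))
    rows, cols = img.shape
    for r in range(rows - dy):
        for c in range(cols - dx):
            glcm[img[r, c], img[r + dy, c + dx]] += 1
    return glcm / glcm.sum()

# A quantised 4x4 image patch; the flattened matrix serves as the
# two-dimensional input handled directly by the network.
patch = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [2, 2, 3, 3],
                  [2, 2, 3, 3]])
x = cooccurrence_matrix(patch, levels=4).ravel()  # 16-element input vector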
2.2 Neural Network Methods for Pattern Recognition
A neural network is a directed graph consisting of neurons or
nodes arranged in layers with interconnecting links (Haykin,
1994). These structures represent systems composed of many
simple processing elements operating in parallel, whose func-
tion is determined by network structure, connection weights,
and node function (Hara et al., 1994).
Recently, neural networks have been applied to a number of
image classification problems due to the following characteris-
tics of neural networks (e.g., Chen et al., 1993): (1) they have
an intrinsic ability to generalize; (2) they make weaker a priori
assumptions about the statistical distribution of the classes in
the dataset than a parametric Bayes classifier; and (3) they are
capable of forming highly non-linear decision boundaries in the
feature space. A neural network therefore has the potential to outperform a parametric Bayes classifier when the feature statistics deviate significantly from the assumed Gaussian distribution. Indeed, the results of Benediktsson et al.
(1990), Bischof et al. (1992), and Heermann and Khazenie
(1992) indicate that a neural network can classify imagery
better than a conventional supervised classification procedure
using identical training sites.
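For reference, the parametric assumption in question can be made explicit: a Gaussian maximum-likelihood classifier assigns a sample to the class with the largest quadratic log-discriminant. The sketch below is illustrative only, with names and toy values of our own choosing rather than taken from the cited studies.

import numpy as np

def gaussian_ml_discriminant(x, mean, cov, prior=1.0):
    # Log-discriminant of a class modelled as a multivariate normal:
    # g(x) = -0.5 ln|C| - 0.5 (x - m)' C^-1 (x - m) + ln P(class)
    d = x - mean
    return (-0.5 * np.log(np.linalg.det(cov))
            - 0.5 * d @ np.linalg.inv(cov) @ d
            + np.log(prior))

# Toy two-class decision: when the true class distributions are not
# Gaussian, these quadratic decision surfaces can fit poorly, which is
# where a neural network's non-linear boundaries become advantageous.
means = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
covs = [np.eye(2), 0.5 * np.eye(2)]
x = np.array([1.2, 1.0])
label = int(np.argmax([gaussian_ml_discriminant(x, m, c)
                       for m, c in zip(means, covs)]))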
Several neural network models have been proposed since Ro-
senblatt (1958) introduced the perceptron. The most common
network type is the multilayer feed-forward neural network
with connections only between nodes in neighbouring layers.
The connection weights are iteratively adjusted in order to
minimize an error criterion function. One of the most popular
and widely investigated supervised learning paradigms is back-
propagation (Rumelhart et al., 1986). It uses a gradient descent
technique to minimize a cost function equal to the mean square
difference between the desired and actual net outputs. The
backpropagation method is an efficient algorithm and can solve
problems of non-linear decision. However, it suffers from the
weakness of very slow convergence during training; very often the learning dynamics stop at a local minimum rather than the global minimum. For this reason, another procedure is introduced here, one that stands out for its extremely fast learning.
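As a concrete illustration of backpropagation's gradient descent on the mean-square cost, the minimal sketch below trains a small feed-forward network on the XOR problem; the layer sizes, learning rate, and epoch count are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy network: 2 inputs -> 3 hidden -> 1 output, trained on XOR,
# a classic problem requiring a non-linear decision boundary.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)  # desired outputs

W1, b1 = rng.normal(scale=0.5, size=(2, 3)), np.zeros(3)
W2, b2 = rng.normal(scale=0.5, size=(3, 1)), np.zeros(1)
lr = 0.5

for epoch in range(20000):
    # Forward pass.
    H = sigmoid(X @ W1 + b1)            # hidden activations
    Y = sigmoid(H @ W2 + b2)            # actual network outputs
    # Backward pass: deltas for the mean-square cost 0.5*(Y - T)^2.
    dY = (Y - T) * Y * (1 - Y)          # output-layer deltas
    dH = (dY @ W2.T) * H * (1 - H)      # hidden-layer deltas
    # Gradient-descent weight updates.
    W2 -= lr * H.T @ dY / len(X)
    b2 -= lr * dY.mean(axis=0)
    W1 -= lr * X.T @ dH / len(X)
    b1 -= lr * dH.mean(axis=0)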
2.3 The ATL Network Model
ATL (Adaptive Threshold Learning) is a supervised feedfor-
ward network, but one that differs significantly in concept from
backpropagation. ATL is a proprietary paradigm belonging to
Neurotec, Inc. The ATL algorithm is similar to RCE (Restricted
Coulomb Energy), which is patented by Nestor, Inc. RCE got its
name from the way it models attractor basins, analogous to the
Coulomb law of attraction between particles of opposite electri-
cal charge. ATL is based on a similar concept.
Figure 2 shows the architecture of an ATL network. Input
nodes are fully connected to the internal nodes, and the internal
nodes are selectively connected to the output nodes. An output
node operates as an OR gate: if any of its inputs are active, it produces an output; otherwise it does not (Chester, 1993).
Figure 2: Three-layer topology of an ATL network (input, hidden, and output layers)
The ATL training algorithm attempts to create basins of attrac-
tion which cover each decision region. Figure 3 shows a simple
two-dimensional case. The circles in the diagram are the attractor basins, whose centres are located at the synaptic weight vector, w, of each internal node. The radius δi of the ith basin corresponds to that node's threshold. If an input vector, x, falls within an attractor basin, then the internal node associated with that attractor basin is activated.
Figure 3: Two-dimensional decision regions (classes A and B) with training vectors and basins of attraction; an internal neuron whose basin does not contain the input vector is not active.
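Putting the basin test and the OR-gate outputs together, the forward pass of such a network can be sketched as follows. This is an illustrative reading of the description above, not Neurotec's implementation; the Euclidean distance metric and all names are assumptions.

import numpy as np

def atl_forward(x, weights, radii, node_class, n_classes):
    # An internal node is active when the input falls inside its basin:
    # ||x - w_i|| <= delta_i, the node's threshold.
    active = np.linalg.norm(weights - x, axis=1) <= radii
    # Each output node acts as an OR gate over the internal nodes wired
    # to it: it fires if any of them is active.
    return np.array([np.any(active & (node_class == c))
                     for c in range(n_classes)])

# Two basins, one per class; an input near the first centre activates
# only the first output node.
weights = np.array([[0.0, 0.0], [2.0, 2.0]])  # basin centres w_i
radii = np.array([1.0, 0.8])                  # basin radii (thresholds)
node_class = np.array([0, 1])                 # internal-to-output wiring
print(atl_forward(np.array([0.3, 0.2]), weights, radii, node_class, 2))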
The training process starts with no basins of attraction; the
system creates them as a result of actions taken when training
vectors are presented sequentially. The following two rules,
applied to each training vector in turn, suffice to produce these
basins (Wasserman, 1994):
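The two rules themselves are given in the text that follows; purely as a sketch of how such commit-and-shrink training is commonly formulated for RCE-type networks (the rule details coded here are our assumption, not a quotation of Wasserman, 1994), one training step might look like this:

import numpy as np

def atl_train_step(x, cls, weights, radii, node_class, r0=1.0, eps=1e-6):
    dists = np.linalg.norm(weights - x, axis=1)
    covered = dists <= radii
    # Assumed rule: any wrong-class basin covering x is shrunk until x
    # lies just outside it.
    wrong = covered & (node_class != cls)
    radii[wrong] = dists[wrong] - eps
    # Assumed rule: if no correct-class basin covers x, commit a new
    # internal node centred at x with the default radius r0.
    if not np.any(covered & (node_class == cls)):
        weights = np.vstack([weights, x])
        radii = np.append(radii, r0)
        node_class = np.append(node_class, cls)
    return weights, radii, node_class

# Training vectors are presented sequentially, starting with no basins.
weights, radii = np.empty((0, 2)), np.empty(0)
node_class = np.empty(0, dtype=int)
for x, cls in [(np.array([0.0, 0.0]), 0), (np.array([2.0, 2.0]), 1),
               (np.array([0.5, 0.4]), 1)]:
    weights, radii, node_class = atl_train_step(x, cls, weights, radii,
                                                node_class)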