75
2. CLUSTER ANALYSIS (CRIPPA, MUSSIO, 1995)
Cluster analysis collects different statistical procedures and
algorithms, able to classify data, to gather them in sets and to
give a preliminary explanation about their behavior.
The aim of cluster analysis is to minimize the internal
dispersion of a group of data (cluster), respect their center
(cluster point), and to maximize, at the same time, the distance
among the above defined clusters.
The procedures of cluster analysis could be subdivided, taking
into account the starting point, the collection strategy and the
end point respectively in:
• agglomerative or divisive techniques
• hierarchical or not- techniques
• clustering of clumping
Qualitative and quantitative cluster analysis could be performed
different kinds of algorithms; MEDOIDS algorithm is
recommendable in the first case, whilst the second task could be
achieved by ISODATA algorithm.
The main steps of ISODATA algorithm are the following:
• selection of preliminary cluster points and thresholds for
the iterations
• clustering in a cluster the elements neighboring a cluster
point.
• division of clusters, whose internal dispersion is greater
than the chosen threshold
• new clustering
• fusion of cluster, whose distance is less than the chosen
threshold
• starting a new iteration from the 2 nd step, or stopping the
algorithm if a reproducing point is achieved.
The step of clustering could be done in a different way. When
the objects are well configured, the Euclidean norm or the
Manhattan distance help to cluster the surrounded elements.
On the contrary, when the objects are ill configured, these
norms show evident difficulties to cluster the elements.
Hence specific techniques of parsing, like line following, region
growing, to be repeated by means of split and merge iterations,
provide the expected results. The parsing is a methodology of
Linguistics which has already been applied to artificial
intelligence, image processing and robotics.
Thus the clustering of ill-configured object could be obtained by
using these techniques whose main steps are the following: •
• selection in a cluster of the element which is the closest to
its cluster point
• selection in the same cluster of (if any) elements slightly
more distant from the cluster point, thus taking into
account possible bifurcations
• attribution of the role of cluster point to the just selected
elements
• reiteration of the 1 st step
• reiteration of the 2 nd step
• reiteration of the 3 rd , 4 th and 5 th steps as long as selection of
elements is possible
• selection of a possible not yet selected element as a new
cluster point; otherwise stop the algorithm
• reiteration, for the new cluster point only, of the 1 st 5 th
steps as long as a selection of elements is possible
when preliminary cluster points are not available, the algorithm
begins flatly at the 7 th step and it proceeds to form isolated
clusters. Notice that the construction of cluster the one after the
other, instead of all together, is less robust respect to wrong
selection of elements. Finally the same strategy, used for the
clustering, could be applied for the cluster fusion by means of a
cluster of clusters.
3. GRAPH THEORY (BELLONE, ET AL., 1996B)
Graph theory permits to study topological relations underlying
the geometry of different kinds of objects. A graph is a set of
nodes connected by arcs. In a connected graph a path always
exists between two nodes; the shortest path is called distance
and the largest distance called diameter. Each node of a
connected graph is reached by at least one arc; the number of
arcs of a node is called degree.
A connection matrix could be associated to each graph, being its
main diagonal elements coincident with the nodes and its off
diagonal elements coincident with the arcs. A corresponding
design matrix supplies in form of chart the connections between
arcs and nodes. A second and more refined design matrix
supplies in forma of chart the independent circuits and
constrained polylines among the arcs.
Since connection and design matrices are generally very sparse
ones, they should be stored avoiding their zero elements. Design
matrices tabulated by rows or columns and banded, profile or
sparse connection matrices are very suitable way to store them.
Connection matrices may have non-zero elements near their
main diagonal ones or anyway located. The arrangement of
non-zero elements could be improved by changing the order of
elements. The arrangement of the elements inside design
matrices could be also improved according to the order of the
elements achieved for the corresponding connection matrices.
The most important algorithms for numeration, ordering and
dissection of matrix elements use graph theory.
A level structure is a partition of a graph in different levels
whose nodes are connected to the ones of the preceding and/or
following level only. A level structure whose first level has one
node only, said root, is a tree. The number of nodes belonging
to a level is called width; the width of a level structure is the
maximum of the level width, whilst the number of levels is
called depth. The nodes belonging at each last local level are
called leaves. The distance of a generic node from the root
represents its (local) depth and the distance of same generic
node from the leaves represents its (local) height.
Relationship among matrices, graphs and level structures show
that the bandwidth of connection matrices depends upon the
numeration of the nodes of corresponding graphs’, an adequate
numeration requires a small width of the level structures and
this last property is easily achieved as their depth is large.
Furthermore since the minimization of bandwidth and profile
could not always be performed at the same time, the preliminary
dissection of a graph and the subsequent ordering of its
dissected parts supply the expected results.
I' 1 algorithm - Search of a diameter of a graph
The reiteration of the construction of level structures allows to
find the diameter of a graph. An arbitrary starting root initiates
the first level structure, whilst the nodes in the last level initiate
new level structure, to be substituted to the first one, if their
depth is greater. The same procedure has to be repeated until
two level structures of equal depths are found.