n of mixed
n output.
. 491-499.
and cover
mputers &
"oundation.
p. 138-199.
of remotely
vork. ISRS
bad, India:
for spectral
ral network.
e sensing,
ormance of
] unmixing
ote Sensing
ficial neural
ion satellite
Y
Nhite, K.H.,
ata for crop
sing, 13(3),
86.Learning
ielhart, D.E.
Processing:
Cambridge,
odelling and
J. Remote
Least-square
from remote
and Remote
note sensing
. 194-201.
the potential
ificial neural
sing, 63(11),
IAPRS & SIS, Vol.34, Part 7, "Resource and Environmental Monitoring", Hyderabad, India,2002
GENERALIZED 2D CLUSTERING TECHNIQUE FOR SPATIAL DATA ANALYSIS
K. Venugopala Rao ", M.V. Ravi Kumar? A. Ravi Kumar? T.CH. Malleswara Rao >
C.B.S. Dutt®,B.S Prakasa Rao *
* Scientist, NRSA, Dept. of Space, Govt. of India, Hyderabad
(venu, koppaka, ravikumar. mv, addepalli) nrsa.gov.in
? Group Head, AF, NRSA, Dept. of Space, Govt. of India, Hyderabad
tchmr@nrsa.gov.in
* Former GH, FEG, NRSA, Dy. Program Director, IGBP, ISRO, Hq, Bangalore
dutt cbs€ yahoo.com
^ Professor, Dept. of Geo Engineering, Andhra University, Visakhapatnam
bosukonda@rediffmail.com
KEYWORDS: Cluster, Centroid, Euclidean distance, triangulation, convex hull
ABSTRACT:
Clustering is a process used to solve classification problems. Its objective is to group data points into clusters so that the degree of
association is strong between members of the same cluster and weak between members of different clusters. Many of the presently
used clustering techniques are based on statistics and work with an assumption that the underlying data distribution is spherical.
These techniques have significant limitations on shape of distribution. All the distances are calculated relative to the centroids. These
centroids are representative of all points allocated to a particular cluster. These techniques fail to evaluate non-spherical distribution
data models. This paper presents a more generalized and shape independent 2D clustering technique that makes full and efficient use
of geometric relationship within the spatial data. In the proposed technique, first all the points of the spatial data set are triangulated,
then the edges of the triangles are used to generate edge graph. The neighbors of any node in edge graph associate are the nodes of
its adjacent edges. A global optimum edge length value is dynamically computed to partition the edge graph into inter-and intra
clusters. The significant advantage of this graph-based partition is that the number of clusters need not be known in advance.
1. INTRODUCTION
Cluster analysis divides data into groups or clusters in order to
improve understanding or finding structure in data. Clustering
algorithms divide a data set into clusters, where similar data
objects are assigned to the same cluster and dissimilar data
objects belong to different clusters. Clustering techniques fall
into a group of undirected data mining tools. The goal of
undirected data mining is to discover the underlying data
structure as a whole. Many clustering techniques are based on
statistics and work with an assumption that the underlying data
distribution is spherical (Jain et al., 1988; Euripides et al.,
1997; Athman 1996). K-means clustering algorithm is the
best choice to illustrate this.
‘1.1 K-means Algorithm
K-means algorithm is a simple, iterative procedure, in which a
crucial concept is the centroid. Centroid is an artificial point,
which represents an average location of the particular cluster
(Cormen et al, 1985). The coordinates of this point are
averages of attribute values of all points that belong to the
single cluster. The steps of the K-means algorithm are:
1. Select randomly k points to be initial.centriods of k
clusters.
2. Assign each point to the closest centroid.
3. Calculate new centroids of the clusters. The mean
Euclidean distance of the points belonging to the
same cluster could be used to calculate the new
centroids.
4. Check if the cluster centroids have changed their
coordinates. If yes, start again from the step.2. If not,
cluster detection is finished.
Figure 1 shows how the algorithm works in the case where k=2.
Randomly select the values for m; and m,, as seed centroids
and the algorithm converges when the means no longer change.
final centroid
Start my
Figure 1. Illustration of k- means Algorithm.
Any algorithms like k-means based on distances that are
calculated relative to the centriods assume that the data
distribution is spherical. These algorithms .completely fail on
non-spherical distribution data sets (Duda et al 1998). In many
real world situations shape independent clustering techniques
are needed. This paper presents generalized 2D-clustering
technique that is unaffected by the shape of the clusters. In the
proposed technique all the data points are first triangulated and
the edges are used to generate a graph. Then the resulting edge