Istanbul 2004
SCALABLE AND VISUALIZATION-ORIENTED CLUSTERING
FOR EXPLORATORY SPATIAL ANALYSIS
J.H.Guan, F.B.Zhu, F.L.Bian
School of Computer, Spatial Information & Digital Engineering Center, Wuhan University, Wuhan, 430079, China -
jhguan@wtusm.edu.cn
TS, WG II/6
KEY WORDS: GIS, Analysis, Data Mining, Visualization, Algorithms, Dynamic, Multi-resolution, Spatial.
ABSTRACT:
Clustering can be applied to many fields, including data mining, statistical data analysis, pattern recognition and image processing. In
the past decade, many efficient and effective clustering algorithms have been proposed; well-known algorithms contributed by the
database community include CLARANS, BIRCH, DBSCAN, CURE, STING, CLIQUE and WaveCluster. All these
algorithms address the problem of handling huge amounts of data in large-scale databases. In this paper, we propose a
scalable and visualization-oriented clustering algorithm for exploratory spatial analysis (CAESA). The context of our research is 2D
spatial data analysis, but the method can be extended to higher-dimensional spaces. Here, "scalable" means that our algorithm can run
focus-changing clustering efficiently, and "visualization-oriented" indicates that our algorithm adapts to the
visualization situation, that is, it chooses an appropriate clustering granularity automatically according to the current visualization resolution.
Experimental results show that our algorithm is effective and efficient.
1. INTRODUCTION
Clustering, the task of grouping the data of a database
into meaningful subclasses so that the differences within each
subclass are minimized and the differences between subclasses
are maximized, is one of the most widely studied problems in the
data mining field. Clustering techniques have many application
areas, such as spatial analysis, pattern recognition, image
processing and various business applications. In
the past decade, many efficient and effective clustering
algorithms have been proposed; well-known algorithms
contributed by the database community include CLARANS,
BIRCH, DBSCAN, CURE, STING, CLIQUE and WaveCluster.
All these algorithms address the clustering problem of
handling huge amounts of data in large-scale databases.
However, current clustering algorithms are designed to cluster a
given dataset once and for all. We refer to
this kind of clustering as global clustering. In practice, for
example in exploratory data analysis, the user may first want
to see a global view of the processed dataset; her/his
interest may then shift to a smaller part of the dataset, and so on.
This process implies a series of consecutive clustering
operations: first on the whole dataset, then on a smaller part of
the dataset, and so on. The process can also run in
the inverse direction, that is, the user's focus shifts from a
smaller area to a larger area. We refer to this kind of clustering
operation as focus-changing clustering. The naive way to implement
focus-changing clustering is to cluster the
focused data from scratch each time, in the fashion of global
clustering. Such an approach is time-consuming and
inefficient. A better solution is to design a clustering
algorithm that carries out focus-changing clustering in an
integrated framework.
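To make the cost of the naive approach concrete, the following minimal Python sketch re-clusters the focused data from scratch at every focus change. It is an illustration only, not the CAESA algorithm: the names `points_in_window`, `naive_focus_changing` and `toy_cluster` are hypothetical, and `toy_cluster` is a placeholder standing in for any global clustering algorithm.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Window = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Clusterer = Callable[[List[Point]], List[List[Point]]]


def points_in_window(points: Sequence[Point], window: Window) -> List[Point]:
    """Return the points inside the focus window (borders included)."""
    xmin, ymin, xmax, ymax = window
    return [(x, y) for (x, y) in points
            if xmin <= x <= xmax and ymin <= y <= ymax]


def naive_focus_changing(points: Sequence[Point],
                         windows: Sequence[Window],
                         cluster: Clusterer) -> List[List[List[Point]]]:
    """Run a global clustering from scratch for every focus window."""
    results = []
    for window in windows:
        focused = points_in_window(points, window)  # full scan of the data each time
        results.append(cluster(focused))            # nothing is reused between steps
    return results


def toy_cluster(points: List[Point]) -> List[List[Point]]:
    """Hypothetical placeholder: group points by unit-wide vertical strip."""
    strips: Dict[int, List[Point]] = {}
    for x, y in points:
        strips.setdefault(math.floor(x), []).append((x, y))
    return list(strips.values())
```

Each focus change here costs one full pass over the data plus one complete clustering run, and no intermediate result is shared between steps. That repeated work is what an integrated framework would aim to avoid.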
On the other hand, although visualization has been recognized
as an effective tool for exploratory data analysis and some
visual clustering approaches have been reported in the literature,
this research has focused on proposing new methods of
visualizing clustering results so that users can view
the internal structure of the processed dataset more concretely and
directly. Little attention has been paid to the impact
of visualization on the clustering process itself. To make this idea
clear, let us take 2-dimensional spatial data clustering as an
example. Clustering can be seen as a process of data
generalization on the basis of a certain lower data granularity. The
lowest clustering granularity of a dataset is the
individual data objects in the dataset. If we divide the dataset
space into rectangular cells of similar size, then a larger
clustering granularity is the set of data objects enclosed in each
rectangular cell.
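As an illustration of moving from individual objects to a coarser granularity, the sketch below assigns 2D points to the cells of a regular grid. It is a minimal example under stated assumptions: the function name `bin_points`, the grid origin and the half-open cell convention are illustrative choices, not taken from the paper.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]  # (column index, row index)


def bin_points(points: Sequence[Point],
               origin: Point,
               cell_width: float,
               cell_height: float) -> Dict[Cell, List[Point]]:
    """Group points by the rectangular grid cell that encloses them.

    Assumption: cells are half-open, so a point lying exactly on a cell
    border belongs to the cell on its upper/right side.
    """
    if cell_width <= 0 or cell_height <= 0:
        raise ValueError("cell sizes must be positive")
    ox, oy = origin
    cells: Dict[Cell, List[Point]] = defaultdict(list)
    for x, y in points:
        col = math.floor((x - ox) / cell_width)
        row = math.floor((y - oy) / cell_height)
        cells[(col, row)].append((x, y))
    return dict(cells)
```

Calling `bin_points` with a larger cell size yields fewer, fuller cells, that is, a coarser clustering granularity over the same data.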
More precisely, we define clustering granularity as the size of
the divided rectangular cell in the horizontal or vertical direction,
and define relative clustering granularity as the ratio of the
clustering granularity to the extent of the focused dataset in the
same direction. Visualization of clustering results is also based
on clustering granularity. However, the visualization effect depends
on the resolution of the display device. For comparison, we define
relative visualization resolution as the inverse of the size
(measured in pixels) of the visualization window for
clustering results, in the same direction as that used for
clustering granularity. A reasonable choice is for
the relative clustering granularity to be close to, but not lower than,
the relative visualization resolution of the visualization window.
Otherwise, the clustering results cannot be visualized completely:
much computation is done, but only part of its effect shows up
in the visualization.
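The definitions above can be turned into a small calculation. The sketch below is a hedged illustration: the function names are hypothetical, and `smallest_useful_cell_size` simply solves the condition "relative clustering granularity not lower than relative visualization resolution" for the cell size in one direction.

```python
def relative_granularity(cell_size: float, scope: float) -> float:
    """Cell size divided by the extent of the focused dataset (same direction)."""
    return cell_size / scope


def relative_resolution(window_pixels: int) -> float:
    """Inverse of the visualization window size, measured in pixels."""
    return 1.0 / window_pixels


def is_fully_visualizable(cell_size: float, scope: float, window_pixels: int) -> bool:
    """True if the relative granularity is not lower than the relative resolution."""
    return relative_granularity(cell_size, scope) >= relative_resolution(window_pixels)


def smallest_useful_cell_size(scope: float, window_pixels: int) -> float:
    """Cell size at which one grid cell corresponds to exactly one pixel."""
    return scope / window_pixels
```

For instance, with a focused extent of 1000 data units and a 500-pixel window, cells smaller than 2 units would fall below one pixel each. If the user zooms in to an extent of 100 units in the same window, the bound drops to 0.2 units, so a finer granularity becomes worthwhile.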