Full text: Proceedings, XXth congress (Part 2)

Istanbul 2004
SCALABLE AND VISUALIZATION-ORIENTED CLUSTERING 
FOR EXPLORATORY SPATIAL ANALYSIS 
J.H. Guan, F.B. Zhu, F.L. Bian
School of Computer, Spatial Information & Digital Engineering Center, Wuhan University, Wuhan, 430079, China -
jhguan@wtusm.edu.cn
TS, WG II/6
KEY WORDS: GIS, Analysis, Data Mining, Visualization, Algorithms, Dynamic, Multi-resolution, Spatial. 
ABSTRACT: 
Clustering is applied in many fields, including data mining, statistical data analysis, pattern recognition, and image processing. In
the past decade, many efficient and effective clustering algorithms have been proposed; well-known algorithms
contributed by the database community include CLARANS, BIRCH, DBSCAN, CURE, STING, CLIQUE and WaveCluster. All these
algorithms address the problem of handling huge amounts of data in large-scale databases. In this paper, we propose a
scalable and visualization-oriented clustering algorithm for exploratory spatial analysis (CAESA). The context of our research is 2D
spatial data analysis, but the method can be extended to higher-dimensional spaces. Here, "scalable" means that our algorithm can perform
focus-changing clustering efficiently, and "visualization-oriented" means that our algorithm adapts to the
visualization situation, that is, it chooses an appropriate clustering granularity automatically according to the current visualization resolution.
Experimental results show that our algorithm is effective and efficient.
1. INTRODUCTION 
Clustering is the task of grouping the data of a database
into meaningful subclasses so that the differences within
each subclass are minimized and the differences between
subclasses are maximized. It is one of the most widely studied
problems in the data mining field. Clustering techniques have
many application areas, such as spatial analysis, pattern
recognition, image processing, and business applications. In
the past decade, many efficient and effective clustering
algorithms have been proposed; well-known algorithms
contributed by the database community include CLARANS,
BIRCH, DBSCAN, CURE, STING, CLIQUE and WaveCluster.
All these algorithms address the clustering problem of
handling huge amounts of data in large-scale databases.
However, current clustering algorithms are designed to cluster a
given dataset once and for all. We refer to
this kind of clustering as global clustering. In practice, as in
exploratory data analysis, the user may first want
to see a global view of the dataset; her/his
interest may then shift to a smaller part of the dataset, and so on.
This process implies a series of consecutive clustering
operations: first on the whole dataset, then on a smaller part of
the dataset, and so on. The process can also run in
the opposite direction, that is, the user's focus shifts from
a smaller area to a larger area. We refer to this kind of clustering
operation as focus-changing clustering. The naive way to
implement focus-changing clustering is to cluster the
focused data from scratch each time, in the fashion of global
clustering. Such an approach is time-consuming and
inefficient. A better solution is to design a clustering
algorithm that carries out focus-changing clustering within an
integrated framework.
On the other hand, although visualization has been recognized
as an effective tool for exploratory data analysis and some
visual clustering approaches have been reported in the literature,
this research has focused on new methods of
visualizing clustering results so that users can view
the internal structure of the processed dataset more concretely and
directly. Little attention has been paid to the impact
of visualization on the clustering process itself. To make this idea
clear, consider 2-dimensional spatial data clustering as an
example. Clustering can be seen as a process of data
generalization on the basis of some lower data granularity. The
lowest clustering granularity of a dataset is the
individual data objects in the dataset. If we divide the dataset
space into rectangular cells of similar size, then a larger
clustering granularity is the set of data objects enclosed in each
rectangular cell.
More precisely, we define clustering granularity as the size of
a rectangular cell in the horizontal or vertical direction,
and relative clustering granularity as the ratio of
clustering granularity to the extent of the focused dataset in the
same direction. Visualization of clustering results is also based
on clustering granularity. However, the visualization effect depends
on the resolution of the display device. For comparison, we define
relative visualization resolution as the inverse of the size
(measured in pixels) of the visualization window for
clustering results, in the same direction used to define
clustering granularity. A reasonable choice is for
the relative clustering granularity to be close to, but not lower than,
the relative visualization resolution of the visualization window.
Otherwise, the clustering results cannot be visualized completely:
much computation is done, but only part of its effect shows up
in the visualization.
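The definitions above can be illustrated with a minimal Python sketch. This is not the CAESA implementation. It assumes a hypothetical multi-resolution grid whose cell size doubles at each level, starting from `base_cell_size`; the function names and the doubling scheme are assumptions introduced here for illustration only. The sketch picks the finest grid level whose relative clustering granularity is not lower than the relative visualization resolution, and counts the points per cell at a given cell size.

```python
import math
from collections import Counter


def relative_granularity(cell_size, extent):
    """Ratio of cell size to the extent of the focused data (same direction)."""
    return cell_size / extent


def relative_resolution(window_pixels):
    """Inverse of the visualization window size, measured in pixels."""
    return 1.0 / window_pixels


def choose_cell_size(extent, window_pixels, base_cell_size):
    """Return (level, cell_size) for the finest admissible grid level.

    Assumption (not from the paper): cell sizes are base_cell_size * 2**level.
    A level is admissible when cell_size / extent >= 1 / window_pixels,
    written below as cell_size * window_pixels >= extent to avoid
    comparing two rounded quotients.
    """
    level, cell_size = 0, float(base_cell_size)
    while cell_size * window_pixels < extent:
        cell_size *= 2.0
        level += 1
    return level, cell_size


def bin_points(points, origin, cell_size):
    """Count the points falling in each square cell of the given size."""
    ox, oy = origin
    counts = Counter()
    for x, y in points:
        cell = (math.floor((x - ox) / cell_size),
                math.floor((y - oy) / cell_size))
        counts[cell] += 1
    return counts


# Whole dataset 10000 units wide, shown in a 500-pixel-wide window.
print(choose_cell_size(10000.0, 500, 1.0))
# After the focus narrows to a region 100 units wide, the base cells suffice.
print(choose_cell_size(100.0, 500, 1.0))
```

Under these assumptions, a wide focus leads to coarser cells and a narrow focus to finer ones, so the number of cells along one direction never exceeds the number of pixels available to display them.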
 
	        