Full text: Proceedings, XXth congress (Part 2)

  
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol XXXV, Part B2. Istanbul 2004 
  
In this paper, we propose a scalable and visualization-oriented 
clustering algorithm for exploratory spatial analysis (CAESA), 
in which we adopt the idea of combining DBSCAN and STING. 
We built a prototype to demonstrate the feasibility of the 
proposed algorithm. Experimental results indicate that our 
algorithm is effective and efficient. CAESA needs less region 
queries than DBSCAN does, and is more flexible than STING. 
2. RELATED WORK 
In recent years, a number of clustering algorithms for large 
databases or data warehouses have been proposed. Basically, 
there are four types of clustering algorithms: partitioning 
algorithms, hierarchical algorithms, density based algorithms 
and grid based algorithms. Partitioning algorithms construct a 
partition of a database D of n objects into a set of & clusters, 
where & is an input parameter for these algorithms. Hierarchical 
algorithms create a hierarchical decomposition is represented 
by a dendrogram, a tree that iteratively splits D into smaller 
subsets until each subset consists of only one object. In such a 
hierarchy, each node of the tree represents a cluster of D. The 
dendrogram can either be created from the leaves up to the root 
(agglomerative approach) or from the root down to the leaves 
(divisive approach) by merging or dividing clusters at each step. 
Ester et al. (1996) developed a clustering algorithm DBSCAN 
based on a density-based notion of clusters. It is designed to 
discover clusters of arbitrary shape. The key idea in DBSCAN 
is that for each point of a cluster, the neighborhood of a given 
radius has to contain at least a minimum number of points. 
DBSCAN can effectively handle the noise points (outliers). 
CLIQUE clustering algorithm (Agrawal et al., 1998) identifies 
dense clusters in subspace of maximum dimensionality. It 
partitions the data space into cells. To approximate the density 
of the data points, it counts the number of points in each cell. 
The clusters are unions of connected high-density cells with in a 
subspace. CLIQUE generates cluster description in the form of 
DNF expressions. 
Sheikholeslami et al. (1998), using multi-resolution property of 
wavelets, proposed the WaveCluster method which partitions 
the data space into cells and app lies wavelet transform on them. 
Furthermore, WaveCluster can detect arbitrary shape clusters at 
different degrees of detail. 
Wang et al. (1997, 1999) proposed a statistical information 
grid-based method (STING, STING+) for spatial data mining. It 
divides the spatial area into rectangular cells using a 
hierarchical structure and stores the statistical parameters of all 
numerical attributes of objects with in cells. STING uses a 
multi-resolution to perform cluster analysis, and the quality of 
STING clustering depends on the granularity of the lowest level 
of the grid structure. If the granularity is very fine, the cost of 
processing will increase substantially. However, if the bottom 
level of the grid structure is too coarse, it may reduce the 
quality of cluster analysis. Moreover, STING does not consider 
the spatial relationship between the children and their 
neighboring cells for construction of a parent cell. As a result, 
the shapes of the resulting clusters are isothetic, that is, all of 
the cluster boundaries are either horizontal or vertical, and no 
diagonal boundary is detected. This may lower the quality and 
accuracy of the clusters despite of the fast processing time of 
the technique. 
336 
However, all algorithms described above have the common 
drawback that they are all designed to cluster a certain dataset 
in the fashion of once and for all. Although methods has been 
proposed to focus on visualizing clustering results so that the 
users can have a view of the processed dataset's internal 
structure more concretely and directly, however, seldom 
concern has been put on the impact of visualization on the 
clustering process. 
Different from the algorithms abovementioned, the proposed 
CAESA algorithm in this paper is a scalable and visualization- 
oriented clustering algorithm for exploratory spatial analysis. 
Here, "scalable" means our algorithm can run focus-changing 
clustering in an effective and efficient way, and “visualization- 
oriented" indicates that our algorithm is adaptable to 
visualization situation, that is, choosing appropriate relative 
clustering granularity automatically according to current 
relative visualization resolution. 
3. SCALABLE AND VISUALIZATION-ORIENTED 
CLUSTERING ALGORITHM 
3.1 CAESA OVERVIEW 
Like STING, CAESA is also a grid-based multi-resolution 
approach in which the spatial area is divided into rectangular 
cells. There usually several levels of such rectangular cells 
corresponding to different levels of resolution, and these cells 
form a hierarchical structure: each cell at a high level is 
partitioned to form a number of cells at the next lower level. 
Statistical information associated with the attributes of each 
grid cell (such as the mean, standard deviation, distribution) is 
calculated and stored beforehand and is used to answer the 
user's queries. 
  
  
  
  
  
  
  
  
  
  
" Level 2 
Level 3 
Level 1 
Figure 1: Hierarchical structure 
Figure 1 shows a hierarchical structure for CAESA clustering. 
Statistical parameters of higher-level cells can easily be 
computed form the parameters of the lower-level cells. For each 
cell, there are attributed-dependent and attribute-independent 
parameters. The attribute-independent parameter is: 
n—number of objects(points) in this cell 
cellNo—number of this cell, it is calculated when the tree 
structure is establishing 
layerNo—layer number of this cell 
The attribute-dependent parameters are: 
m—mean of all values in this cell, where value is the distance 
of two objects in this cell 
s—standard deviation of all values of the attribute in this cell 
d—density of all values in this cell 
min—4he minimum value of the attribute in this cell 
max—4he maximum value of the attribute in this cell 
Intei 
dis 
in tl 
NO? 
cel 
infor 
rel 
In oi 
calcı 
min; 
corre 
curr 
and. 
The 
adop 
their 
Acco 
calcu
	        
Waiting...

Note to user

Dear user,

In response to current developments in the web technology used by the Goobi viewer, the software no longer supports your browser.

Please use one of the following browsers to display this page correctly.

Thank you.