Proceedings, XXth congress (Part 2)

  
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol XXXV, Part B2. Istanbul 2004 
  
AND AREA (80, ∞)
AND WITH CONFIDENCE 0.85;
To complete this task, we first recalculate the clustering
granularity according to the distribution of data objects in the
associated dataset, then divide the data space into rectangular
cells whose size equals the minimum clustering granularity, and
finally count the density of data objects in each cell and store
the density value with that cell.
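A minimal sketch of the cell-density step, assuming square cells whose side equals the granularity, anchored at the lower-left corner of the dataset. The text does not say how the granularity is derived from the data distribution, so here it is simply a parameter; the function name `cell_densities` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def cell_densities(points: Iterable[tuple[float, float]],
                   min_x: float, min_y: float,
                   granularity: float) -> Counter:
    """Count the points falling into each square cell of side `granularity`.

    Cells are indexed by (column, row) relative to (min_x, min_y); only
    non-empty cells are stored, so sparse data stays cheap.
    """
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    density = Counter()
    for x, y in points:
        col = int((x - min_x) // granularity)
        row = int((y - min_y) // granularity)
        density[(col, row)] += 1
    return density
```

For example, `cell_densities([(1, 1), (2, 3), (9, 2), (10, 10), (11, 12)], 0, 0, 8)` puts two points in cell (0, 0), one in (1, 0) and two in (1, 1).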
3.3 Visualization of Clustering Process 
Although visualization has been recognized as an effective tool
for exploratory data analysis, and some visual clustering
approaches have been reported in the literature, this research has
focused on new methods of visualizing clustering results, so that
users can view the internal structure of the processed dataset
more concretely and directly. However, little attention has been
paid to the impact of visualization on the clustering process
itself. To make this idea clear, take 2-dimensional spatial data
clustering as an example. Clustering can be seen as a process of
data generalization on the basis of some lower data granularity.
The lowest clustering granularity of a dataset is that of the
individual data objects in it. If we divide the dataset space into
rectangular cells of similar size, then a larger clustering
granularity is given by the groups of data objects enclosed in
those cells.
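Generalizing from fine cells to a larger granularity can be illustrated by summing counts. Assuming the coarse cell side is an integer multiple (`factor`) of the fine one, each coarse cell's count is the total of the fine cells it encloses. This is an illustrative sketch; `coarsen` and the integer-factor assumption are not taken from the paper.

```python
from collections import Counter


def coarsen(density: Counter, factor: int = 2) -> Counter:
    """Generalize per-cell counts to a coarser granularity.

    Each coarse cell covers factor x factor fine cells, so its count is
    the sum of the counts of the fine cells it encloses.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    coarse = Counter()
    for (col, row), n in density.items():
        coarse[(col // factor, row // factor)] += n
    return coarse
```

Because only counts are merged, the individual objects never need to be revisited when moving to a larger granularity.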
  
   
  
  
  
  
Figure 2: Expected result
More formally, we define the clustering granularity as the size of
a divided rectangular cell in the horizontal or vertical direction,
and the relative clustering granularity as the ratio of the
clustering granularity to the extent of the focused dataset in the
same direction. Visualization of clustering results is also based
on the clustering granularity, but its effect depends on the
resolution of the display device. For comparison, we define the
relative visualization resolution as the reciprocal of the size
(in pixels) of the visualization window for clustering results,
measured in the same direction as the clustering granularity.
Clearly, a reasonable choice is a relative clustering granularity
that is close to, but not lower than, the relative visualization
resolution of the window. Otherwise the clustering results cannot
be visualized completely: a cell smaller than one pixel cannot be
displayed, so much of the clustering work has no visible effect.
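The selection rule can be checked numerically. For a dataset extent E, a candidate granularity g and a window of W pixels in one direction, the relative clustering granularity is g/E and the relative visualization resolution is 1/W; requiring g/E ≥ 1/W is equivalent to g·W ≥ E, which the sketch below tests to avoid floating-point division. The candidate list and function names are hypothetical, and the check would be applied separately to the horizontal and vertical directions.

```python
def relative_granularity(granularity: float, extent: float) -> float:
    """Cell size divided by the extent of the focused dataset (one direction)."""
    return granularity / extent


def relative_resolution(window_pixels: int) -> float:
    """Reciprocal of the visualization window size in pixels (same direction)."""
    return 1.0 / window_pixels


def finest_visible_granularity(extent: float, window_pixels: int,
                               candidates: list[float]) -> float:
    """Smallest candidate whose relative granularity is not lower than the
    relative visualization resolution, i.e. g / extent >= 1 / window_pixels.

    The comparison is written as g * window_pixels >= extent so that no
    rounding from division can flip the result.
    """
    visible = [g for g in candidates if g * window_pixels >= extent]
    if not visible:
        raise ValueError("no candidate granularity is coarse enough for this window")
    return min(visible)
```

For instance, `finest_visible_granularity(256, 64, [1, 2, 4, 8, 16])` returns 4: with 256 units shown in 64 pixels, each pixel covers 4 units, so finer cells could not all be drawn.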
  
  
Figure 3: CAESA's result
Noise is random disturbance that reduces the clarity of
clusters. In our algorithm, noise can easily be found and removed
by locating the cells with very low density and eliminating the
data points in them. This reduces the influence of noise on both
efficiency and running time. Unlike noise, outliers are not evenly
distributed. Outliers are data points that lie far from the
clusters and form groups much smaller than the clusters, so they
are not merged into any cluster. When the algorithm finishes, the
sub-clusters of very small size are reported as outliers.
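The noise and outlier handling described above can be sketched as two filters over the cell counts. The thresholds `min_density` and `min_points` are hypothetical parameters (no values are given in the text), and sub-clusters are assumed to be represented as sets of cell indices whose size is measured in points.

```python
from collections import Counter


def remove_noise(density: Counter, min_density: int) -> Counter:
    """Drop cells whose point count is below min_density (treated as noise)."""
    return Counter({cell: n for cell, n in density.items() if n >= min_density})


def split_outliers(subclusters: list[set], density: Counter,
                   min_points: int) -> tuple[list[set], list[set]]:
    """Separate real clusters from small sub-clusters reported as outliers.

    The size of a sub-cluster is the total number of points in its cells.
    """
    clusters, outliers = [], []
    for cells in subclusters:
        size = sum(density[c] for c in cells)
        (clusters if size >= min_points else outliers).append(cells)
    return clusters, outliers
```

Noise removal works cell by cell before clustering, whereas the outlier test needs the finished sub-clusters, which matches the order given in the text.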
Figure 4: CAESA's result after the user's focus changed
4. ALGORITHM 
In this section we introduce the CAESA algorithm. First, we
set the minimum clustering granularity according to the
distribution of data objects in the whole dataset concerned; then
we divide the data space into rectangular cells whose size equals
the minimum clustering granularity; after that, we count the
density of data objects in each cell and store the density value
with that cell. Here, we adopt a cluster definition similar to
that of grid-based clustering algorithms. CAESA completes a given
query in two steps: it calculates the statistical information and
stores it in the quad-tree, then it takes the user's query and
outputs the result.
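The two-step structure can be sketched as a pyramid of per-cell counts (one grid per quad-tree layer, each parent cell summing its 2 x 2 children), followed by a query that groups dense cells at a chosen layer. The density threshold and the edge-adjacency rule are assumptions borrowed from typical grid-based clustering rather than details given in the text, and the sketch assumes all points lie inside a square region spanning 2**(n_layers - 1) finest cells per row. The names `build_layers` and `query` are hypothetical.

```python
from collections import Counter, deque


def build_layers(points, min_x, min_y, cell_size, n_layers):
    """Step 1: precompute per-cell counts for every quad-tree layer.

    Layer n_layers is the finest grid (side cell_size); each layer above
    merges 2 x 2 child cells, so layer 1 is the single root cell.
    """
    finest = Counter()
    for x, y in points:
        finest[(int((x - min_x) // cell_size), int((y - min_y) // cell_size))] += 1
    layers = {n_layers: finest}
    for layer in range(n_layers - 1, 0, -1):
        parent = Counter()
        for (col, row), n in layers[layer + 1].items():
            parent[(col // 2, row // 2)] += n
        layers[layer] = parent
    return layers


def query(layers, layer, min_density):
    """Step 2: answer a query at one layer by grouping dense cells that
    share an edge into clusters (breadth-first search)."""
    dense = {c for c, n in layers[layer].items() if n >= min_density}
    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:
            col, row = queue.popleft()
            component.add((col, row))
            for nb in ((col + 1, row), (col - 1, row), (col, row + 1), (col, row - 1)):
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        clusters.append(component)
    return clusters
```

Since all statistics are computed once in step 1, a change of the user's focus only requires re-running step 2 at another layer, not rescanning the data.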
First, we create a tree structure to hold the information of the
dataset; two parameters must be input to set up the …