Two parameters are needed to build the quad-tree: one is the dataset to be placed in the hierarchical structure; the other is the number of desired cells at the lowest level. Building the quad-tree takes two steps: first, calculate the statistical information of the bottom-layer cells directly from the given dataset; then, calculate the statistical information of each higher-layer cell from the cells of the layer below. The dataset only needs to be scanned once to calculate the parameters associated with the grid cells at the bottom level, so the overall computation time is nearly proportional to the number of cells. That is, query processing complexity is O(k) rather than O(n), where k denotes the number of cells. We analyze performance in more detail in later sections.
The process of creating the tree structure is as follows:
Input:
D //Data to be placed in the hierarchical structure
k //Number of desired cells at the lowest level
Output:
T //Tree
CAESA BUILD Algorithm
// Create the empty tree from the top down
T = root node with data values initialized; // initially only the root node
i = 1;
repeat
    for each node in level i do
        create 4 children nodes with initial values;
    end for
    i = i + 1;
until 4^i = k;
// Populate the tree from the bottom up
for each item in D do
    determine the leaf node j associated with the position of the item;
    update values of j based on attribute values in the item;
end for
i = log4(k);
repeat
    i = i - 1;
    for each node j in level i do
        update values of j based on attribute values in its 4 children;
    end for
until i = 1;
End of CAESA BUILD Algorithm
Figure 5: CAESA BUILD algorithm
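For concreteness, the following is a minimal Python sketch of the build step under the same assumptions as the pseudocode: a complete quad-tree over the unit square, k a power of 4, and a single point count kept as the per-cell statistic. The names QuadNode, build_tree and _aggregate are chosen here for illustration; they are not the paper's code.

import math
import random

class QuadNode:
    """One cell of the quad-tree, holding simple per-cell statistics."""
    def __init__(self, x0, y0, size):
        self.x0, self.y0, self.size = x0, y0, size  # cell extent
        self.count = 0        # number of points in this cell
        self.children = []    # 4 children; empty for bottom-level cells

def _aggregate(node):
    """Compute each inner cell's statistics from its 4 children."""
    if node.children:
        node.count = sum(_aggregate(c) for c in node.children)
    return node.count

def build_tree(data, k):
    """Build a quad-tree with k bottom-level cells over [0,1) x [0,1)."""
    levels = round(math.log(k, 4))   # depth below the root
    root = QuadNode(0.0, 0.0, 1.0)
    # Top-down: create the empty tree, 4 children per node.
    frontier = [root]
    for _ in range(levels):
        nxt = []
        for node in frontier:
            half = node.size / 2
            node.children = [QuadNode(node.x0 + dx * half,
                                      node.y0 + dy * half, half)
                             for dy in (0, 1) for dx in (0, 1)]
            nxt.extend(node.children)
        frontier = nxt
    # One pass over the data updates the bottom-level cells only.
    for x, y in data:
        node = root
        while node.children:
            half = node.size / 2
            ix = 1 if x >= node.x0 + half else 0
            iy = 1 if y >= node.y0 + half else 0
            node = node.children[iy * 2 + ix]
        node.count += 1
    _aggregate(root)  # then roll the statistics up, bottom to top
    return root

# Example: 1000 random points, 64 = 4^3 bottom-level cells.
points = [(random.random(), random.random()) for _ in range(1000)]
tree = build_tree(points, 64)
print(tree.count)  # 1000: the root aggregates all leaf counts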
Once the user changes his or her focus scope, the clustering granularity is adjusted according to the criterion that the relative clustering granularity should be close to, but not lower than, the relative visualization resolution of the visualization window. With the new clustering granularity, the focused data space is re-divided, and the grid-based clustering algorithm is then applied to cluster the focused data. Note that the new clustering granularity must be n times the minimum clustering granularity, where n is a positive natural number. Because sufficient information (such as coordinate locations, relevant coefficients, etc.) has already been stored, the former result can easily be reused to answer the new query.
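One simple way to satisfy that criterion is to round the required granularity up to the next multiple of the minimum. The helper below is a sketch of that rule; the function name and parameters are illustrative assumptions, not the paper's code.

import math

def new_granularity(min_granularity, window_resolution):
    """Smallest multiple n * min_granularity that is not lower than the
    window's relative visualization resolution."""
    n = max(1, math.ceil(window_resolution / min_granularity))
    return n * min_granularity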
The process of query answering is as follows:
Input:
T //Tree
Q //Query
Output:
R //Regions of relevant cells
CAESA QUERY Algorithm
QUEUE q;
i = 1;
repeat
    for each node j in level i do
        if cell j is relevant to Q
            mark this cell as such;
            q.add( j ); // add cell j to the queue so as to examine its neighbors
        end if
    end for
    i = i + 1; // go on with the next level
until all layers in the tree have been visited;
for each neighbor cell i of the cells in q such that i is not in q do
    if density( i ) >= M_DENSITY
        identify neighboring cells of relevant cells to create regions of cells;
    end if
end for
D1 = R.data();
calculate the new clustering granularity k1;
call the CAESA BUILD Algorithm with parameters D1 and k1;
End of CAESA QUERY Algorithm
Figure 6: CAESA QUERY algorithm
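In the same spirit, here is a hedged sketch of the neighbor-expansion part of the query step, working on a flat grid of per-cell densities rather than the full tree. The breadth-first expansion, the 4-neighborhood, and the names query_regions and m_density are assumptions of this sketch, not the paper's exact procedure.

from collections import deque

def query_regions(density, relevant, m_density):
    """Grow regions of cells around the cells marked relevant to the query.

    density:   2D list of per-cell densities
    relevant:  set of (row, col) cells relevant to the query
    m_density: density threshold; neighbors below it are ignored
    """
    rows, cols = len(density), len(density[0])
    region = set(relevant)
    q = deque(relevant)
    while q:
        r, c = q.popleft()
        # 4-neighborhood; diagonal neighbors could be added the same way.
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and density[nr][nc] >= m_density):
                region.add((nr, nc))
                q.append((nr, nc))
    return region

# Example: 3x3 grid, one relevant cell in the centre.
grid = [[0, 2, 0],
        [2, 5, 2],
        [0, 2, 0]]
print(sorted(query_regions(grid, {(1, 1)}, m_density=2)))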
To handle very large databases, the algorithm operates on units instead of the original data objects. To obtain these units, the data space is first partitioned to allocate data points into cells, and statistical information is collected for each cell. Each cell is then tested to see whether it qualifies as a unit. A unit is defined as a cell whose density is greater than a certain predefined threshold; we therefore determine whether a cell is a unit by its density. If the density of a cell is M (M >= 1) times the average density of all data points, we regard such a cell as a unit. Some low-density cells may in fact be part of a final cluster yet fail to qualify as units; we treat the data points in such cells as separate sub-clusters. This method greatly reduces the time complexity, as will be shown in the experiments.
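As a minimal sketch of that test (the name is_unit and the cell statistics passed in are illustrative assumptions), a cell qualifies as a unit when its density reaches M times the average point density:

def is_unit(cell_count, cell_area, total_count, total_area, m=1.0):
    """A cell is a unit if its density is at least M (M >= 1) times the
    average density of all data points over the whole space."""
    cell_density = cell_count / cell_area
    average_density = total_count / total_area
    return cell_density >= m * average_density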
The clusters identified by this algorithm are denoted by their representative points. Usually, a user may want to know detailed information about the clusters and the data points they include. For each data point, we do not need to test which cluster it belongs to; instead, we only need to find the cluster its cell belongs to, and all data points in that cell then belong to that cluster. This also speeds up the whole clustering process.
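The resulting point labelling is then one lookup per point rather than a per-point cluster test; a sketch, where cell_of and the cell-to-cluster map are illustrative assumptions:

def label_points(points, cell_of, cluster_of_cell):
    """Assign each point the cluster of its cell.

    points:          iterable of (hashable) data points
    cell_of:         function mapping a point to its cell id
    cluster_of_cell: dict mapping cell id -> cluster id
    """
    return {p: cluster_of_cell.get(cell_of(p)) for p in points}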
5. PERFORMANCE EVALUATION
We conduct several experiments to evaluate the performance of the proposed algorithm. The following experiments are run on a PC with the Windows 2000 operating system and 256 MB of memory.