International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol XXXV, Part B2. Istanbul 2004
AND AREA (80, ∞)
AND WITH CONFIDENCE 0.85;
To complete this task, we first recalculate the clustering
granularity according to the distribution of data objects in
the associated dataset, then divide the data space into
rectangular cells whose size equals the minimum clustering
granularity; finally, we count the data-object density of each
cell and store the density value with the cell.
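The cell-division and density-counting step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `cell_densities`, its parameters, the sample coordinates, and the cell size of 8 are all assumptions made for the example.

```python
from collections import Counter

def cell_densities(points, min_x, min_y, cell_size):
    """Count the points falling into each square cell of side cell_size.

    Cells are indexed by (column, row), counted from the corner (min_x, min_y).
    All names here are hypothetical; the paper does not give an interface.
    """
    counts = Counter()
    for x, y in points:
        col = int((x - min_x) // cell_size)
        row = int((y - min_y) // cell_size)
        counts[(col, row)] += 1
    return counts

# Illustrative coordinates only (assumed, not results from the paper).
points = [(102, 101), (112, 100), (101, 102), (103, 106), (100, 110)]
densities = cell_densities(points, min_x=100, min_y=100, cell_size=8)
```

Storing only the non-empty cells in a dictionary keeps the sketch short; a dense array would serve equally well when most cells are occupied.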
3.3 Visualization of Clustering Process
Although visualization has been recognized as an effective tool
for exploratory data analysis, and some visual clustering
approaches have been reported in the literature, this research has
focused on proposing new methods of visualizing clustering
results so that users can view the internal structure of the
processed dataset more concretely and directly.
However, little attention has been paid to the impact of
visualization on the clustering process. To make this idea clear,
let us take 2-dimensional spatial data clustering as an example.
Clustering can be seen as a process of data generalization
on the basis of a certain lower data granularity. The lowest
clustering granularity of a dataset is the individual data objects in
the dataset. If we divide the dataset space into rectangular cells
of similar size, then a larger clustering granularity is the set of
data objects enclosed in each rectangular cell.
[Figure 2 screenshot: application window listing the dataset in the spatial DB, 2400 rows with columns No., X, Y]
Figure 2: Expected result
More precisely, we define clustering granularity as the size of
the divided rectangular cell in the horizontal or vertical direction,
and relative clustering granularity as the ratio of the
clustering granularity to the extent of the focused dataset in the
same direction. Visualization of clustering results is also based
on clustering granularity. However, the visualization effect depends
on the resolution of the display device. For comparison, we define
relative visualization resolution as the reciprocal of the size
(measured in pixels) of the visualization window for
clustering results, taken in the same direction as the
clustering granularity. Obviously, a reasonable choice is for the
relative clustering granularity to be close to, but not lower than, the
relative visualization resolution of the visualization window.
Otherwise, the clustering results cannot be visualized completely:
much of the computation is done, but only part of its effect shows up
in the visualization.
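The two ratios defined above, and the rule relating them, can be written out as a small sketch. The function names and the numbers (a data extent of 256 units, a 512-pixel window, a cell size of 8) are assumptions chosen for illustration and do not come from the paper.

```python
def relative_granularity(cell_size, extent):
    """Clustering granularity divided by the dataset extent in one direction."""
    return cell_size / extent

def relative_resolution(window_pixels):
    """Reciprocal of the visualization window size (in pixels) in one direction."""
    return 1.0 / window_pixels

def finest_useful_cell_size(extent, window_pixels):
    """Cell size at which the relative granularity equals the relative resolution.

    Any smaller cell would map to less than one pixel in the window.
    """
    return extent / window_pixels

# Hypothetical values: 256 data units shown in a 512-pixel-wide window.
extent, window_pixels, cell_size = 256, 512, 8
granularity_ok = (
    relative_granularity(cell_size, extent) >= relative_resolution(window_pixels)
)
finest = finest_useful_cell_size(extent, window_pixels)
```

With these assumed numbers the cell size of 8 satisfies the rule, and cells could be refined down to half a data unit before the window could no longer display the difference.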
[Figure 3 screenshot: application window reporting 32 cells in each row, xInterval = 8, yInterval = 8, 2400 objects, and 6 tree layers]
Figure 3: CAESA's result
Noise is random disturbance that reduces the clarity of
clusters. In our algorithm, noise can easily be found and removed
by locating the cells with very low density and
eliminating the data points in them. This method
reduces the influence of noise on both efficiency and
time. Unlike noise, outliers are not evenly distributed.
Outliers are data points that lie away from the clusters and form
groups of smaller scale than the clusters, so outliers will not be
merged into any cluster. When the algorithm finishes, the
sub-clusters of rather small scale are the outliers.
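A rough sketch of this noise and outlier handling is given below. It assumes per-cell point counts are already available, and it groups neighbouring dense cells with a simple 4-neighbour flood fill as a stand-in for the merging step, which this section does not spell out. The thresholds `noise_threshold` and `min_cluster_points`, like all function names, are hypothetical.

```python
from collections import deque

def drop_noise_cells(densities, noise_threshold):
    """Discard cells whose point count is below noise_threshold."""
    return {cell: n for cell, n in densities.items() if n >= noise_threshold}

def connected_groups(cells):
    """Group cells that touch horizontally or vertically (4-neighbourhood)."""
    remaining = set(cells)
    groups = []
    while remaining:
        start = remaining.pop()
        group = {start}
        queue = deque([start])
        while queue:
            col, row = queue.popleft()
            for nb in ((col + 1, row), (col - 1, row), (col, row + 1), (col, row - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    group.add(nb)
                    queue.append(nb)
        groups.append(group)
    return groups

def split_clusters_and_outliers(densities, noise_threshold, min_cluster_points):
    """Return (clusters, outliers) as lists of cell sets, after noise removal."""
    dense = drop_noise_cells(densities, noise_threshold)
    clusters, outliers = [], []
    for group in connected_groups(dense):
        size = sum(dense[cell] for cell in group)
        (clusters if size >= min_cluster_points else outliers).append(group)
    return clusters, outliers

# Illustrative densities: one large group, one noise cell, one small isolated group.
densities = {(0, 0): 9, (1, 0): 7, (0, 1): 8, (5, 5): 1, (9, 9): 4}
clusters, outliers = split_clusters_and_outliers(
    densities, noise_threshold=2, min_cluster_points=10
)
```

Here the cell at (5, 5) is dropped as noise, the three adjacent cells form one cluster, and the isolated cell at (9, 9) is reported as an outlier because its group is too small.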
[Figure 4 screenshot: application window listing Layer 6 cells, CellNo 1020 to 1023, with their bounding coordinates]
Figure 4: CAESA's result after user's focus changed
4. ALGORITHM
In this section we introduce the CAESA algorithm. First, we
set the minimum clustering granularity according to the
distribution of data objects in the whole dataset concerned; then we
divide the data space into rectangular cells whose size equals the
minimum clustering granularity; after that, we count the
data-object density of each cell and store the density value with
the cell. Here, we adopt a cluster definition similar to that of
grid-based clustering algorithms. CAESA completes a given query in
two steps: it calculates the statistical information and
stores it in the quad-tree, then takes the user's query and outputs
the result.
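The first step, storing per-cell statistics in a quad-tree, might look like the sketch below. It assumes the leaf grid is square with a power-of-two side and that each node stores only a point count; the actual node contents used by CAESA are not specified in this section, and `build_density_pyramid` is a hypothetical name.

```python
def build_density_pyramid(leaf_grid):
    """Aggregate a square grid of cell counts into quad-tree layers.

    leaf_grid is a list of rows whose side length is a power of two.
    Returns the layers from the root (1 x 1) down to the leaf grid;
    each parent cell holds the sum of its four children.
    """
    layers = [leaf_grid]
    grid = leaf_grid
    while len(grid) > 1:
        half = len(grid) // 2
        parent = [
            [
                grid[2 * r][2 * c] + grid[2 * r][2 * c + 1]
                + grid[2 * r + 1][2 * c] + grid[2 * r + 1][2 * c + 1]
                for c in range(half)
            ]
            for r in range(half)
        ]
        layers.append(parent)
        grid = parent
    layers.reverse()
    return layers

# Small illustrative grid of per-cell point counts (assumed values).
leaf = [
    [1, 0, 2, 0],
    [0, 3, 0, 0],
    [0, 0, 4, 1],
    [5, 0, 0, 0],
]
layers = build_density_pyramid(leaf)

# A 32 x 32 leaf grid yields 6 layers (1, 2, 4, 8, 16, 32 cells per side).
layers_32 = build_density_pyramid([[1] * 32 for _ in range(32)])
```

Once the layers exist, a query at a coarser granularity can read the counts from a higher layer directly instead of rescanning the data objects.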
First, we create a tree structure to hold the information about
the dataset. Two parameters must be input to set up the