al
VW
n
combined with a pre-defined weight value to get one distance
function. Choosing the pre-defined weight value is not trivial
and it is chosen intuitively in their research. Wang (2007) used
only spatial neighbourhood relations to cluster network data
without considering the temporal domain. As a result, the
dynamics in the network are not captured.
Spatial clustering is not sufficient to understand ‘events’ since
to describe an event, one needs to answer the questions of what,
when and where. In other words, thematic, temporal and spatial
domains should be combined in a consistent way to have a
better understanding of spatial phenomenon. Wei (2009)
divided the time line into fixed size intervals and calculated the
similarity based on the thematic domain. Spatial domain is used
by means of defining a spatial distance threshold. However,
how to choose the spatial distance threshold was not discussed.
In addition, clustering results depend on the size of the chosen
temporal interval. Neill (2005) emphasized on the significance
of temporal domain. They used a probabilistic approach to
detect emerging spatio-temporal clusters. However, the spatio-
temporal process is assumed to follow a Poisson distribution
which may not be the real case or time-consuming tests should
be done to verify this assumption. Chan et al. (2008) captured
the temporal dynamics of a graph by inspecting on the presence
or absence of an edge. Their main task is to detect the regions
where the change (absence/presence of an edge) is spatio-
temporally correlated.
3. SPATIO-TEMPORAL CLUSTERING ON
SPATIALLY EMBEDDED NETWORKS
Theoretically, one can represent the spatio-temporal objects as
either vertices or edges in an undirected graph. An example of
this is shown at figure 1. Figure 1 (a) represents the objects at
vertices and Figure 1 (b) represents the same objects at edges.
Figure 1(c) is the adjacency matrix for both of the graphs shown
at Figure 1(a-b). To be consistent with the case study, from now
on the representation shown at Figure 1 (b) will be used. Thus,
spatio-temporal objects are the edges of the graph and vertices
connect the objects coincident to them. In either case, the idea
behind the representation is to obtain the adjacency matrix.
te}
Figure 1: Different graph representations of same data
Although the algorithm is designed for network data, this
algorithm could be used for spatio-temporal clustering
whenever the spatio-temporal phenomenon (which can exhibit
in point, line or polygon) could be represented as a graph
structure (G — (V, E) where V represents the set of vertices and
E represents the set of edges).
75
Once the graph structure of the spatio-temporal phenomenon is
acquired, then a matrix showing the connectivity between
vertices (or edges); adjacency matrix; is created for the graph
structure. While creating the adjacency matrix (if exists), the
direction of the edges could be incorporated.
Up to now, the spatial domain is used to acquire the adjacency
matrix of the spatio-temporal phenomenon. Temporal and
thematic domains are exploited at this stage. Temporal domain
is divided into equal parts where each part will have only two
consecutive observations in the thematic domain. This is called
as the basic temporal interval. For example basic temporal
interval k of the object p consists of the two thematic attribute
observations of p" object at consecutive times of k-/ and k. At
each comparison step, basic temporal interval is shifted one
time step. Thus, if the time-series has a length of ¢, there will be
t - 1 similarity results for the two adjacent objects’ similarity
comparison. Since it consists of two consecutive (in temporal
domain) observations, it is possible to derive several different
similarity metrics (slope of change, difference/mean of the two
observations,..) to compare between an object and the objects
which are adjacent to it. Also, all of the possible
similarities/dissimilarities between the two compared time
series will be captured by this way (since it is not sound to have
a basic temporal interval of size one). This is the first novelty of
this research, since there is no need to specify a window size at
temporal domain and it is designed to be the simplest possible,
having two consecutive observations. In addition, this will
allow capturing all of the possible similarities between two time
series.
The similarity function is defined at basic temporal interval of À
for two adjacent objects p and q with at least four inputs (i.e. py.
» Pie dk1, dx) Where the thematic attribute value of p and q at
times k-/ and k are denoted as py; and p, and qy, and gy
respectively. Similarity function takes at least these four inputs,
because some other parameters (which should be defined using
background knowledge) may be needed to define the flexibility
of similarity comparison.
For the objects to be labelled as positively similar at basic
temporal interval k two requirements should be fulfilled:
Firstly, the direction of change in thematic attribute values (i.e.
slope) should be same and secondly, the thematic values of both
objects should be similar which is quantified by the parameter ó.
This requirement needs to be symmetric (e.g. if spatial object p
is found to be positively similar at basic temporal interval k with
the spatial object q, then ¢ should also be positively similar with
p at K" basic temporal interval) , thus has two parts separated by
a logical or operator. These two requirements for a positive
similarity are illustrated at equations 1 and 2 respectively. If
either of these conditions hasn't met, then the similarity
function will return a negative similarity result.
Pr Pr >0
(1)
Qy — Qa
(1-0)(g; +911) < Pr + Ppr «(0 t ÓY(q, + gr)
v (2)
(01—6Xp, t p,«q.*qui «(*-6Yp, t p.i)
These similarity criteria are one of the many possibilities,
however we tried to make it as generic as possible.
> Nc
eibi i AN
i
3
3
i
ÿ
$
i
1
1
j
i
7
Et