Data clustering is a difficult problem in unsupervised pattern 
recognition as the clusters in data may have different shapes and 
sizes. In the context of clustering techniques, the following
terms are used in this paper (Jain et al., 1999): 
• A pattern (or feature vector), z, is a single object or data 
point used by the clustering algorithm. 
• A feature (attribute) is an individual component of a 
pattern. 
• A cluster is a set of similar patterns, and patterns from 
different clusters are not similar. 
• A distance measure is a metric used to evaluate the 
similarity of patterns. 
The clustering problem can be formally defined as follows (Jain et al., 1999): Given a data set $Z = \{z_1, z_2, \ldots, z_p, \ldots, z_{N_p}\}$, where $z_p$ is a pattern in the $N_d$-dimensional feature space and $N_p$ is the number of patterns in $Z$, the clustering of $Z$ is the partitioning of $Z$ into $K$ clusters $\{C_1, C_2, \ldots, C_K\}$ satisfying the following conditions:
• Each pattern should be assigned to a cluster, i.e. $\bigcup_{k=1}^{K} C_k = Z$
• Each cluster has at least one pattern assigned to it, i.e. $C_k \neq \emptyset$, $k = 1, \ldots, K$
• Each pattern is assigned to one and only one cluster, i.e. $C_k \cap C_j = \emptyset$, where $k \neq j$
As previously mentioned, clustering is the process of identifying natural groupings or clusters in multidimensional data based on some similarity measure in the feature space. Hence, similarity measures are fundamental components of most clustering algorithms (Jain et al., 1999).
The most popular way to evaluate similarity is through the
use of distance measures. The most widely used distance 
measure is the Euclidean distance, defined as: 
$d(z_i, z_j) = \sqrt{\sum_{k=1}^{N_d} (z_{i,k} - z_{j,k})^2} = \lVert z_i - z_j \rVert$  (1)
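As a minimal illustration (an addition to this text, not from the paper), equation (1) can be computed directly, for instance with NumPy:

import numpy as np

def euclidean_distance(z_i, z_j):
    # Equation (1): square root of the summed squared component differences
    return float(np.sqrt(np.sum((np.asarray(z_i) - np.asarray(z_j)) ** 2)))

# Two patterns in a 3-dimensional feature space; the distance is 5.0
print(euclidean_distance([1.0, 2.0, 3.0], [4.0, 6.0, 3.0]))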
Generally, clustering algorithms can be categorized into 
partitioning methods, hierarchical methods, density-based 
methods, grid-based methods, and model-based methods. An 
excellent survey of clustering techniques can be found in 
(Kotsiantis and Pintelas, 2004). In this paper, the focus will be 
on partitional clustering algorithms. Partitional clustering
algorithms divide the data set into a specified number of 
clusters and then evaluate them by some criteria. These 
algorithms try to minimize certain criteria (e.g. a square error 
function) and can therefore be treated as optimization problems 
(Harvey et al., 2002; Omran et al., 2005; Wilson et al., 2002).
The most widely used partitional algorithm in clustering techniques is the iterative k-means approach (Kotsiantis and Pintelas, 2004). The objective function J that k-means optimizes is:
$J_{k\text{-means}} = \sum_{k=1}^{K} \sum_{\forall z_p \in C_k} d^2(z_p, m_k)$  (2)
where $m_k$ is the centroid of the $k$-th cluster. The membership function $u$ for k-means is defined as:

$u(m_k \mid z_p) = \begin{cases} 1 & \text{if } d^2(z_p, m_k) = \min_{j=1,\ldots,K} d^2(z_p, m_j) \\ 0 & \text{otherwise} \end{cases}$  (3)
Consequently, the k-means method minimizes the intra-cluster 
distance. The k-means algorithm starts with K centroids (initial
values are randomly selected or derived from a priori 
information). Then, each pattern z p in the data set is assigned to 
the closest cluster (i.e. closest centroid). Finally, the centroids 
are recalculated according to the associated patterns. This 
procedure is repeated until convergence is achieved. 
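For concreteness, a minimal sketch of this procedure in Python with NumPy follows; the names kmeans, n_clusters, and max_iter are illustrative choices, not from the paper:

import numpy as np

def kmeans(Z, n_clusters, max_iter=100, seed=0):
    # Minimal k-means sketch: Z holds one pattern per row.
    rng = np.random.default_rng(seed)
    # Initialise centroids with randomly selected patterns
    centroids = Z[rng.choice(len(Z), size=n_clusters, replace=False)]
    for _ in range(max_iter):
        # Assignment step: each pattern joins its closest centroid (eq. 3)
        d = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(d, axis=1)
        # Update step: recalculate each centroid from its assigned patterns
        new_centroids = np.array([
            Z[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(n_clusters)])
        if np.allclose(new_centroids, centroids):  # convergence reached
            return labels, new_centroids
        centroids = new_centroids
    return labels, centroids

Because the initial centroids are selected at random, different runs can converge to different partitions, which is exactly the local-optimum weakness discussed below.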
It is known that the k-means algorithm may converge to locally optimal solutions, depending on the choice of the initial cluster centres. Genetic algorithms have a potentially greater ability to avoid local optima than the localised search employed by most clustering techniques. Maulik and Bandyopadhyay (2004) proposed a genetic algorithm-based clustering technique, called GA-clustering, which has proven effective at finding optimal clusters. With this algorithm, solutions
(typically, cluster centroids) are represented by bit strings. The 
search for an appropriate solution begins with a population, or 
collection, of initial solutions. Members of the current 
population are used to create the next generation population 
by applying operations such as random mutation and 
crossover. At each step, the solutions in the current population 
are evaluated relative to some measure of fitness (which, typically, is inversely proportional to d), with the fittest
solutions selected probabilistically as seeds for producing the 
next generation. The process performs a generate-and-test 
beam search of the solution space, in which variants of the 
best current solutions are most likely to be considered next, as the sketch below illustrates.
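The following is a toy sketch of such a generate-and-test loop in Python with NumPy. For brevity it evolves real-valued centroid chromosomes rather than bit strings, and the fitness, selection, crossover, and mutation operators shown are plausible assumptions, not Maulik and Bandyopadhyay's exact design:

import numpy as np

def ga_clustering(Z, n_clusters, pop_size=20, generations=50, seed=0):
    # Toy GA for clustering: a chromosome is a flat vector holding
    # n_clusters centroids (real-valued here, bit strings in the paper).
    rng = np.random.default_rng(seed)
    n_dim = Z.shape[1]

    def fitness(chrom):
        # Assumed fitness: inverse of the total intra-cluster distance
        centroids = chrom.reshape(n_clusters, n_dim)
        d = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
        return 1.0 / (d.min(axis=1).sum() + 1e-12)

    # Initial population: centroids sampled from the data points themselves
    pop = np.array([Z[rng.choice(len(Z), n_clusters, replace=False)].ravel()
                    for _ in range(pop_size)])
    for _ in range(generations):
        fits = np.array([fitness(c) for c in pop])
        # Fitness-proportional (roulette-wheel) selection of parents
        parents = pop[rng.choice(pop_size, size=pop_size, p=fits / fits.sum())]
        # Single-point crossover on consecutive parent pairs
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            cut = rng.integers(1, children.shape[1])
            children[i, cut:] = parents[i + 1, cut:]
            children[i + 1, cut:] = parents[i, cut:]
        # Random Gaussian mutation, then replace the population
        pop = children + rng.normal(0.0, 0.01, children.shape)
    best = max(pop, key=fitness)
    return best.reshape(n_clusters, n_dim)

A practical implementation would normally add elitism, so that the fittest chromosome of each generation survives unchanged.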
In the next section, an alternative clustering method to solve the local optimum problem of the k-means algorithm is described.
The applied method adopts the artificial swarm bees algorithm 
as it has proved to give a more robust performance than other 
intelligent optimisation methods for a range of complex 
problems (Pham, 2006). 
3. CLUSTERING OF LIDAR DATA USING SWARM 
ARTIFICIAL BEE COLONY ALGORITHM 
Swarm Intelligence (SI) is an innovative distributed intelligent paradigm for solving optimization problems that originally took its inspiration from biological examples of swarming, flocking and herding phenomena. These techniques incorporate swarming behaviours observed in flocks of birds, schools of fish, swarms of bees, and even human social behaviour, from which the idea emerged (Omran et al., 2002, 2005; Paterlini and Krink, 2005; Pham et al., 2006; Wu and Shi, 2001). Data clustering and swarm intelligence may seem to have few properties in common. However, recent studies suggest that they can be combined for several real-world data clustering and mining problems, especially when other methods would be too expensive or difficult to implement.
Clustering approaches inspired by the collective behaviours of 
ants have been proposed by Wu and Shi (2001) and Labroche et al. (2001). The main idea of these approaches is that artificial
ants are used to pick up items and drop them near similar 
items resulting in the formation of clusters. Omran et al. 
(2002) proposed a particle swarm optimization (PSO) clustering algorithm. The results of Omran et al. (2002, 2005) show that PSO outperformed k-means, fuzzy c-means (FCM) and other
state-of-the-art clustering algorithms. More recently, Paterlini 
and Krink (2005) compared the performance of k-means,
genetic algorithm (GA), PSO and Differential Evolution (DE) 
for a representative point evaluation approach to partitional 
clustering. The results show that GAs, PSO and DE outperformed the k-means algorithm. Pham et al. (2006) used the artificial bee colony algorithm for clustering different data sets. Their results show that the proposed artificial bee colony algorithm performs better than both the standard k-means algorithm and the GA-based method. In general, the literature review of recent