(Freund and Schapire, 1996) for updating the weights and compared the performance of k-means and fuzzy c-means to their boosted versions, showing better clustering results on a variety of datasets. (Saffari and Bischof, 2007) provided a boosting-based clustering algorithm that builds forward stage-wise additive models for data partitioning and claimed that it overcomes some problems of the algorithm of Frossyniotis et al. (2004). It should be noted that the boost-clustering algorithm makes no assumption about the underlying clustering algorithm and is therefore applicable to any clustering method.
However, most of the above methods are presented and evaluated on small artificial or standard benchmark datasets, and the significance of the improvement they offer for object extraction in urban areas has not been evaluated. In this paper, the boost-clustering method is implemented and evaluated on two subsets of LiDAR data over an urban area. The results are presented in the form of error matrices and several quality factors used to analyse classification performance, and are compared to the results of the core algorithm used in the boosting procedure, simple k-means.
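For illustration, the following minimal sketch (in Python, assuming integer class labels; the helper names are ours and the specific quality factors reported later may differ) shows how an error matrix and simple quality factors such as overall accuracy, completeness and correctness can be derived from reference and predicted labels:

import numpy as np

def error_matrix(reference, predicted, n_classes):
    """Cross-tabulation of reference labels (rows) against predicted labels (columns)."""
    m = np.zeros((n_classes, n_classes), dtype=int)
    for r, p in zip(reference, predicted):
        m[r, p] += 1
    return m

def quality_factors(m):
    """Overall accuracy plus per-class completeness (producer's accuracy)
    and correctness (user's accuracy) derived from the error matrix."""
    overall = np.trace(m) / m.sum()
    completeness = np.diag(m) / m.sum(axis=1)  # correct / total reference per class
    correctness = np.diag(m) / m.sum(axis=0)   # correct / total predicted per class
    return overall, completeness, correctness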
2. BOOSTING ALGORITHM 
Boosting is a general method for improving the classification accuracy of any classification algorithm. The original idea of boosting was introduced by (Kearns and Valiant, 1998). Boosting directly converts a weak learning model, which performs only slightly better than random guessing, into a strong learning model that can be made arbitrarily accurate. In boosting, after each weak learning iteration, misclassified training samples are adaptively given higher weights in the next iteration. This forces the next weak learner to focus more on the misclassified training data. Because of its good classification performance, AdaBoost is widely used in many computer vision problems and some promising results have been obtained (Li et al., 2004). A few attempts have been made to bring the same idea to the clustering domain.
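As an illustration of this reweighting mechanism, the following sketch outlines a discrete AdaBoost loop in Python; the weak-learner interface (a scikit-learn-style fit/predict object passed in through make_weak_learner) is an assumption made for the example, not part of the cited work:

import numpy as np

def adaboost(X, y, make_weak_learner, T):
    """Discrete AdaBoost for labels y in {-1, +1}: samples misclassified in one
    round receive larger weights, so the next weak learner focuses on them."""
    n = len(y)
    w = np.full(n, 1.0 / n)                     # start from a uniform distribution
    learners, alphas = [], []
    for _ in range(T):
        h = make_weak_learner().fit(X, y, sample_weight=w)
        miss = h.predict(X) != y
        eps = np.dot(w, miss)                   # weighted training error
        if eps <= 0.0 or eps >= 0.5:            # perfect, or no better than chance
            break
        alpha = 0.5 * np.log((1.0 - eps) / eps) # vote weight of this weak learner
        w *= np.exp(np.where(miss, alpha, -alpha))  # up-weight the misclassified samples
        w /= w.sum()                            # renormalise to a distribution
        learners.append(h)
        alphas.append(alpha)
    return learners, alphas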
2.1 Boosting Clustering 
Boost-clustering is an ensemble clustering approach that iteratively recycles the training examples, providing multiple clusterings that are combined into a common partition (Frossyniotis et al., 2004). In ensemble approaches, the members of the ensemble are trained sequentially so that each one compensates for the drawbacks of the previously trained models, usually through the concept of sample weights. Boost-clustering can also be regarded as a classifier fusion method at the decision level. At each iteration, a
distribution over the training points is computed and a new 
training set is constructed using random sampling from the 
original dataset. Then a basic clustering algorithm is applied to 
partition the new training set. The final clustering solution is 
produced by aggregating the obtained partitions using weighted 
voting, where the weight of each partition is a measure of its 
quality (Frossyniotis et al., 2004). Another major advantage of boost-clustering is that its performance is not influenced by the randomness of initialization or by the specific type of basic clustering algorithm used. In addition, it can provide clustering solutions of arbitrary shape even though the underlying weak learning algorithm, such as k-means, produces spherical clusters. This is because the basic clustering method (k-means) is parametric, while the boost-clustering method is nonparametric in the sense that the final partitioning is specified in terms of the membership degrees $h_{ij}^t$ and not through the specification of model parameters. This gives the flexibility to define arbitrarily shaped data partitions (Frossyniotis et al., 2004).
The utilized algorithm is summarized below (Frossyniotis et al., 
2004): 
1. Input: dataset $X = \{x_1, \ldots, x_N\}$, number of clusters (C) and maximum number of iterations (T). Initialize the instance weights $w_i^1 = 1/N$ for $i = 1, \ldots, N$.
2. for t = 1 to T
a. produce a bootstrap replicate $X^t$ of the original dataset by sampling from X according to the distribution $w^t$
b. apply the k-means algorithm on dataset $X^t$ to produce the cluster hypothesis $h_i^t = (h_{i1}^t, h_{i2}^t, \ldots, h_{iC}^t)$, where $h_{ij}^t$ is the membership degree of instance i to cluster j
c. if t > 1, renumber the cluster indices of $H^t$ according to the results of the previous iteration
d. calculate the pseudo-loss
$\varepsilon_t = \frac{1}{2} \sum_{i=1}^{N} w_i^t \, CQ_i^t$ (1)
e. set $\beta_t = (1 - \varepsilon_t) / \varepsilon_t$
f. if $\varepsilon_t > 0.5$, go to step 3
g. update the distribution W:
$w_i^{t+1} = \frac{w_i^t \, \beta_t^{CQ_i^t}}{Z_t}$ (2)
where $Z_t$ is a normalization constant
h. compute the aggregate cluster hypothesis:
$H_{ag}^t = \arg\max_{j=1,\ldots,C} \sum_{\tau=1}^{t} \frac{\log \beta_\tau}{\sum_{s=1}^{t} \log \beta_s} \, h_{ij}^{\tau}$ (3)
3. Output the final cluster hypothesis $H_f = H_{ag}^T$.
In the above algorithm, a set X of N instances $x_i$, a basic clustering algorithm (k-means) and the desired number of clusters C are first assumed. At each iteration t, the clustering result is denoted as $H^t$, while $H_{ag}^t$ is the aggregate partitioning obtained from the clusterings of the previous iterations. Consequently, at the final step, $H_f$ will be equal to $H_{ag}^T$. In this algorithm, at each iteration t, a weight $w_i^t$ is computed for each instance $x_i$ such that the higher the weight, the more difficult it is for $x_i$ to be clustered. At each iteration t, first a dataset $X^t$ is constructed by sampling from X using the distribution $w^t$, and then a partitioning result $H^t$ is produced using the basic clustering algorithm. In this methodology an index $CQ_i^t$ is used to evaluate the clustering quality of an instance $x_i$ for the partition $H^t$. In our implementation, the index CQ is computed using equation 4.
$CQ_i^t = 1 - h_{i,good}^t - h_{i,bad}^t$ (4)
where
$h_{i,good}^t$ = the maximum membership degree of $x_i$ to a cluster,
$h_{i,bad}^t$ = the minimum membership degree of $x_i$ to a cluster.
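To make the procedure of Section 2.1 concrete, the following sketch implements the boost-clustering loop with scikit-learn's KMeans as the basic clustering algorithm, assuming X is an (N, d) NumPy array. The soft membership degrees (derived here from inverse distances to the cluster centres), the greedy renumbering of cluster indices, and all helper names are our own assumptions for illustration; they are not prescribed by Frossyniotis et al. (2004).

import numpy as np
from sklearn.cluster import KMeans

def memberships(X, centers):
    """Soft membership degrees h_ij from inverse distances to the cluster centres
    (one possible choice; the membership definition is not fixed by the algorithm)."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
    inv = 1.0 / d
    return inv / inv.sum(axis=1, keepdims=True)

def renumber(h, h_prev):
    """Step 2c: greedily align the cluster indices of h with the previous iteration."""
    C = h.shape[1]
    overlap = h_prev.T @ h                      # C x C agreement between old and new clusters
    perm = np.empty(C, dtype=int)
    for _ in range(C):
        i, j = np.unravel_index(np.argmax(overlap), overlap.shape)
        perm[j] = i                             # new cluster j plays the role of old cluster i
        overlap[i, :] = -np.inf
        overlap[:, j] = -np.inf
    return h[:, np.argsort(perm)]

def boost_clustering(X, C, T, seed=0):
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    w = np.full(N, 1.0 / N)                     # instance weights w_i^1 = 1/N
    H, log_betas, h_prev = [], [], None
    for t in range(T):
        # 2a. bootstrap replicate X^t drawn according to the current distribution
        idx = rng.choice(N, size=N, replace=True, p=w)
        # 2b. k-means on X^t, memberships evaluated for every instance of X
        km = KMeans(n_clusters=C, n_init=10, random_state=seed + t).fit(X[idx])
        h = memberships(X, km.cluster_centers_)
        # 2c. renumber cluster indices to match the previous iteration
        if h_prev is not None:
            h = renumber(h, h_prev)
        # 2d. clustering-quality index (eq. 4) and pseudo-loss (eq. 1)
        CQ = 1.0 - h.max(axis=1) - h.min(axis=1)
        eps = 0.5 * np.dot(w, CQ)
        if eps >= 0.5:                          # 2f. stop if no better than chance
            break
        beta = (1.0 - eps) / max(eps, 1e-12)    # 2e.
        # 2g. update the distribution (eq. 2): hard instances gain weight
        w = w * beta ** CQ
        w /= w.sum()
        H.append(h)
        log_betas.append(np.log(beta))
        h_prev = h
    # 2h./3. aggregate hypothesis (eq. 3): log(beta)-weighted vote over all iterations
    v = np.array(log_betas) / np.sum(log_betas)
    agg = sum(vt * ht for vt, ht in zip(v, H))
    return np.argmax(agg, axis=1)               # final cluster label per instance

Setting T = 1 reduces the procedure to a single k-means partition, which corresponds to the baseline used for comparison in this paper.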
	        