Efficient and Effortless Similarity Measures for Cluster Ensembles

R. J. Anandhi ¹, Natarajan Subramaniyam ²

Affiliations
1 Dept. of CSE, Dr. MGR University, Chennai, India
2 Department of Information Science and Engineering, PESIT, Bangalore, India

Spatial data mining deals with the discovery of implicit knowledge in spatial data. With the tremendous rise in the accumulation of spatial data, new approaches to spatial data mining have become a critical requirement. The availability of many clustering algorithms and their derivatives, together with the success of bagging and boosting in classification, has brought cluster ensembles into the limelight over the last decade. Several ensemble techniques are available, including voting, graph-based and information-theoretic approaches. In our work, we show that a guided approach to combining the outputs of the various clusterers can reduce the intensive computation and also generate robust clusters. Cluster ensembles provide a tool for consolidating results from a portfolio of individual clustering results. The major challenge in the fusion of ensembles is the generation of the voting matrix or proximity matrix, which is of the order of n², where n is the number of data points. This is very expensive in both time and space for spatial datasets. Instead, our method computes a symmetric clusterer compatibility matrix of order (m×m), where m is the number of clusterers and m << n, using the cumulative similarity between the clusters of the clusterers. This matrix is used to identify which two clusterers, if fused first, will provide the most information gain. This paper discusses the need for simple, elegant yet effective similarity measures for cluster mining. As the underlying data structure is already known in the case of cluster ensembles, we use that knowledge to find the similarity between the probable clusterer merge points. We use a set theory approach and the Shannon partition entropy as the basis for our calculation of multiparty merge entropy.
The correctness and efficiency of the proposed cluster ensemble algorithm are demonstrated using various cluster validity metrics such as accuracy, misclassification rate, Dunn indices, inter-cluster density and intra-cluster density, measured on real-world datasets available in the University of California Irvine's data repository.
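
The idea of an m×m clusterer compatibility matrix can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify the similarity measure, so Jaccard set overlap is assumed here, and the function names (clusters_of, cumulative_similarity, compatibility_matrix, best_pair, partition_entropy) are hypothetical. Each clusterer's output is taken to be a list of cluster labels, one per data point.

```python
import math
from collections import defaultdict


def clusters_of(labels):
    """Group point indices by cluster label and return a list of index sets."""
    groups = defaultdict(set)
    for idx, lab in enumerate(labels):
        groups[lab].add(idx)
    return list(groups.values())


def jaccard(a, b):
    """Set overlap |a & b| / |a | b| (assumed measure, not taken from the paper)."""
    return len(a & b) / len(a | b)


def cumulative_similarity(labels_a, labels_b):
    """Sum each cluster's best Jaccard match in the other clusterer,
    averaged over both directions so the result is symmetric."""
    ca, cb = clusters_of(labels_a), clusters_of(labels_b)
    a_to_b = sum(max(jaccard(x, y) for y in cb) for x in ca)
    b_to_a = sum(max(jaccard(y, x) for x in ca) for y in cb)
    return (a_to_b + b_to_a) / 2


def compatibility_matrix(partitions):
    """Build the symmetric m x m matrix, where m is the number of clusterers."""
    m = len(partitions)
    mat = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i, m):
            s = cumulative_similarity(partitions[i], partitions[j])
            mat[i][j] = mat[j][i] = s
    return mat


def best_pair(mat):
    """Return the off-diagonal pair (i, j), i < j, with the highest score."""
    m = len(mat)
    pairs = [(i, j) for i in range(m) for j in range(i + 1, m)]
    return max(pairs, key=lambda p: mat[p[0]][p[1]])


def partition_entropy(labels):
    """Shannon entropy (in bits) of the cluster-size distribution."""
    n = len(labels)
    return -sum((len(c) / n) * math.log2(len(c) / n) for c in clusters_of(labels))


# Three hypothetical clusterers labelling the same six points.
partitions = [
    [0, 0, 1, 1, 2, 2],
    [1, 1, 0, 0, 2, 2],  # same grouping as the first, different label names
    [0, 0, 0, 1, 1, 1],
]
matrix = compatibility_matrix(partitions)
first_merge = best_pair(matrix)
```

In this sketch the matrix is built from m(m+1)/2 clusterer comparisons rather than from n² point pairs, which is the saving the abstract describes when m << n. The raw sum is not normalised by the number of clusters, so a practical version would probably need some normalisation before comparing clusterers with very different cluster counts.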

Keywords

Clustering Ensembles, Cluster Compatibility Matrix, Cluster Validity Metrics, Partition Entropy, Degree of Over Shadow.

Artificial Intelligent Systems and Machine Learning
