Improved K-Means Clustering Using Constraints and Centroid Initialization


Affiliations
1 Department of Computer Science, Rathinam College of Arts Science, India
     



The rapid worldwide increase in available data makes those data difficult to analyze. Organizing data into meaningful groups is one of the most basic forms of understanding and learning, so a suitable data mining approach is required to organize the data for better understanding. Clustering is one of the standard approaches in the field of data mining. Its main aim is to organize a dataset into a set of clusters, each consisting of similar data items, where similarity is calculated by some distance function. K-Means is a widely used clustering algorithm because of its effectiveness and simplicity. On larger datasets, however, K-Means tends to misclassify data points. To overcome this problem, constraints are incorporated into the algorithm; the resulting algorithm is called Constrained K-Means Clustering. The constraints used in this paper are the must-link constraint, the cannot-link constraint, the δ-constraint and the ε-constraint. A Self Organizing Map (SOM) is used to generate the must-link and cannot-link constraints. Clustering accuracy can be improved further by choosing the initial centroids deliberately rather than generating them at random; for this purpose, this paper uses Ant Colony Optimization (ACO). Experimental results show that the proposed algorithm classifies data better than the standard K-Means clustering technique.
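
As a rough illustration of how must-link and cannot-link constraints can change the K-Means assignment step, the sketch below assigns each point to its nearest centroid that does not break a constraint with an already-assigned point. This is a generic constrained-assignment sketch in the style of COP-KMeans, not the paper's algorithm: the function names (`violates`, `constrained_assign`), the greedy point order, and the use of Euclidean distance are all assumptions, and the δ-constraint, ε-constraint, SOM-based constraint generation and ACO-based initialization are not covered.

```python
import math


def violates(i, cluster, labels, must_link, cannot_link):
    """Return True if placing point i in `cluster` breaks a constraint
    with a point that has already been assigned (hypothetical helper)."""
    for a, b in must_link:
        other = b if a == i else a if b == i else None
        # A must-link partner already sits in a different cluster.
        if other is not None and labels[other] not in (None, cluster):
            return True
    for a, b in cannot_link:
        other = b if a == i else a if b == i else None
        # A cannot-link partner already sits in this cluster.
        if other is not None and labels[other] == cluster:
            return True
    return False


def constrained_assign(points, centroids, must_link=(), cannot_link=()):
    """Greedy assignment step: nearest centroid that violates no constraint."""
    labels = [None] * len(points)
    for i, p in enumerate(points):
        # Try centroids from nearest to farthest (Euclidean distance).
        order = sorted(range(len(centroids)),
                       key=lambda c: math.dist(p, centroids[c]))
        for c in order:
            if not violates(i, c, labels, must_link, cannot_link):
                labels[i] = c
                break
        else:
            raise ValueError(f"no feasible cluster for point {i}")
    return labels


points = [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (2.4, 2.4)]
centroids = [(0, 0), (5, 5)]
plain = constrained_assign(points, centroids)
constrained = constrained_assign(points, centroids, cannot_link=[(0, 4)])
```

Here the last point is slightly closer to the first centroid, so the unconstrained call places it in cluster 0, while the cannot-link pair `(0, 4)` forces it into cluster 1. A full algorithm would alternate this step with centroid updates until the assignments stop changing.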

Keywords

K-Means, Self Organizing Map (SOM), Constrained K-Means, Ant Colony Optimization (ACO).

Authors

P. Boopathi
Department of Computer Science, Rathinam College of Arts Science, India
