
An Efficient K-Means Clustering Algorithm for Large Data


Affiliations
1 Department of Information Technology, Bapatla Engineering College, Bapatla, Andhra Pradesh, India
     



Cluster analysis is one of the major data analysis methods for large data sets. It deals with organizing a collection of data objects into clusters based on some measure of similarity. K-means is one of the most popular data partitioning algorithms for solving this well-known clustering problem. The performance of k-means clustering depends greatly on the choice of initial centroids. In the original k-means algorithm the initial centroids are typically chosen at random, so the clustering result may converge to a local optimum rather than the global optimum. Several improvements to the k-means algorithm have been proposed. This paper proposes an efficient k-means algorithm that finds better initial centroids and assigns data points to appropriate clusters efficiently. The proposed algorithm is tested on six benchmark datasets taken from the UCI Machine Learning Repository and is found to give better results than the existing algorithm.
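The dependence on initial centroids described above can be illustrated with a minimal sketch of the standard (Lloyd's) k-means iteration. This is not the algorithm proposed in the paper, whose details are not given in the abstract. The function names `sq_dist`, `mean`, and `kmeans`, the toy data set `POINTS`, and the two hand-picked initializations are all assumptions made for illustration.

```python
# Illustrative sketch of standard k-means (Lloyd's iteration), NOT the
# paper's proposed method. All names and data here are hypothetical.

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, init_centroids, max_iter=100):
    """Run k-means from the given initial centroids.

    Returns the final centroids and the sum of squared errors (SSE).
    """
    centroids = [tuple(c) for c in init_centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = [mean(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse


# Toy data (assumed): four corners of a 4 x 1 rectangle, k = 2.
POINTS = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Starting near the two short sides finds the left/right split.
_, sse_good = kmeans(POINTS, [(0, 0), (4, 0)])
# Starting on the long axis gets stuck in the top/bottom split.
_, sse_bad = kmeans(POINTS, [(2, 0), (2, 1)])

print(sse_good)  # 1.0
print(sse_bad)   # 16
```

Both runs converge, but the second stops at a local optimum whose SSE is sixteen times larger. This is the behavior that motivates choosing initial centroids more carefully than at random.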

Keywords

Clustering, Data Partitioning, Data Mining, Heuristic K-Means, K-Means Algorithm.

Authors

K. Srinivasa Rao
Department of Information Technology, Bapatla Engineering College, Bapatla, Andhra Pradesh, India
K. Kiran Kumar
Department of Information Technology, Bapatla Engineering College, Bapatla, Andhra Pradesh, India
P. Srinivasa Rao
Department of Information Technology, Bapatla Engineering College, Bapatla, Andhra Pradesh, India
