
Improved K-Means with Dimensionality Reduction Technique


Affiliations
1 Charotar Institute of Technology Changa, Nadiad, Gujarat, India


Clustering is the process of finding groups of objects such that the objects in a group are similar to one another and different from the objects in other groups. K-means is a well-known partitioning-based clustering technique that attempts to find a user-specified number of clusters, each represented by its centroid. The K-means algorithm often does not work well on high-dimensional data; hence, to improve efficiency, we apply PCA, a dimensionality reduction technique, to the data set and obtain a reduced data set of linearly uncorrelated variables. A challenging task for any clustering method is determining the number of clusters beforehand. To find the number of clusters, we apply the EM method, which suggests how many clusters the user should choose by determining a mixture of Gaussians that fits the given data set. Finally, the experimental results show that using techniques such as PCA and EM improves the efficiency of K-means clustering.

Keywords

Cluster, EM, K-Means, PCA.
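
The pipeline summarized in the abstract (reduce with PCA, estimate the number of clusters with EM, then run K-means) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a projection onto a single principal component, a one-dimensional Gaussian mixture fitted by EM, and the Bayesian information criterion (BIC) as the rule for choosing the number of clusters; the abstract does not state which selection rule was used. All function names and the synthetic data are hypothetical.

```python
import math
import random

def pca_first_component(rows, iters=100):
    """Project rows onto the top principal component (power iteration)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d  # assumed start vector; must not be orthogonal to the answer
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(d)) for c in centered]

def quantile_init(xs, k):
    """Deterministic starting centres: the middle value of k equal slices."""
    srt = sorted(xs)
    return [srt[(2 * i + 1) * len(xs) // (2 * k)] for i in range(k)]

def gmm_bic(xs, k, iters=100):
    """Fit a 1-D mixture of k Gaussians with EM and return its BIC."""
    n = len(xs)
    mean = sum(xs) / n
    total_var = sum((x - mean) ** 2 for x in xs) / n
    floor = 1e-3 * total_var  # assumed variance floor to avoid collapse
    mus = quantile_init(xs, k)
    variances = [total_var / k ** 2] * k
    weights = [1.0 / k] * k
    loglik = 0.0
    for _ in range(iters):
        # E-step: responsibilities of each component for each point.
        resp, loglik = [], 0.0
        for x in xs:
            p = [weights[j]
                 * math.exp(-(x - mus[j]) ** 2 / (2 * variances[j]))
                 / math.sqrt(2 * math.pi * variances[j]) for j in range(k)]
            s = sum(p) + 1e-300
            loglik += math.log(s)
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, floor)
    # Free parameters: k means, k variances, k - 1 mixing weights.
    # loglik is the value from the last E-step.
    return (3 * k - 1) * math.log(n) - 2 * loglik

def kmeans_1d(xs, k, iters=50):
    """Plain Lloyd iterations on 1-D data; returns (labels, centroids)."""
    cents = quantile_init(xs, k)
    labels = [0] * len(xs)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: (x - cents[j]) ** 2) for x in xs]
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                cents[j] = sum(members) / len(members)
    return labels, cents

# Synthetic data (an assumption): three 5-D blobs of 40 points each.
rng = random.Random(0)
data = [[rng.gauss(c, 1.0) for _ in range(5)]
        for c in (0.0, 8.0, 16.0) for _ in range(40)]

scores = pca_first_component(data)                      # 5-D -> 1-D
bic_by_k = {k: gmm_bic(scores, k) for k in range(1, 6)}
best_k = min(bic_by_k, key=bic_by_k.get)                # lowest BIC wins
labels, centroids = kmeans_1d(scores, best_k)
print(best_k, [round(c, 2) for c in centroids])
```

Keeping a single component keeps the sketch dependency-free; a practical version would retain enough components to cover most of the variance and would likely rely on a numerical library rather than hand-written loops.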

Authors

Amit Thakkar
Charotar Institute of Technology Changa, Nadiad, Gujarat, India
Nikita Bhatt
Charotar Institute of Technology Changa, Nadiad, Gujarat, India
Amit Ganatra
Charotar Institute of Technology Changa, Nadiad, Gujarat, India
Arpita Shah
Charotar Institute of Technology Changa, Nadiad, Gujarat, India
