
Analysis of Heuristic Measures for Cluster Split in Bisecting K-Means


Affiliations
1 Department of CSE, Gokaraju Rangaraju Institute of Engineering and Technology, India
2 Department of CSE, Jawaharlal Nehru University and Technology, Hyderabad, India
     



With the ever-increasing number of documents on the web and in other repositories, manually organizing and categorizing these documents to meet the diverse needs of users is a complicated job; hence the machine learning technique known as clustering is very useful. The work proposed in this paper is based on shared neighbors: two documents are said to be neighbors of each other when their similarity is greater than a threshold. We choose to work with bisecting k-means, in which cluster quality depends on the choice of the cluster to be split at each step until k clusters are formed. Automatically selecting the cluster to split is difficult and time-consuming for text documents because of their high dimensionality. This paper implements bisecting k-means, a text document clustering technique, to analyze the best criteria for selecting the cluster to be split. We compared our results with those proposed in the literature and observed that our experimental results were promising when tested on real-life data sets.
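As an illustration only, the following minimal Python sketch shows the general shape of the two ideas named in the abstract: counting neighbors with a similarity threshold, and running bisecting k-means with a pluggable rule for choosing the cluster to split. It is not the authors' implementation. The function names, the use of cosine similarity, the threshold parameter `theta`, and the two split criteria (`by_size` and `by_incoherence`) are all assumptions made for this example; the abstract does not state which heuristic measures the paper evaluates.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def neighbor_counts(docs, theta):
    """For each document, count the other documents whose similarity exceeds theta."""
    n = len(docs)
    return [sum(1 for j in range(n) if j != i and cosine(docs[i], docs[j]) > theta)
            for i in range(n)]

def centroid(cluster):
    """Component-wise mean of the vectors in a cluster."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]

def two_means(cluster, rng, iters=10):
    """Basic 2-means using cosine similarity; returns two non-empty sub-clusters."""
    seeds = rng.sample(cluster, 2)
    for _ in range(iters):
        parts = ([], [])
        for d in cluster:
            nearer = 0 if cosine(d, seeds[0]) >= cosine(d, seeds[1]) else 1
            parts[nearer].append(d)
        if not parts[0] or not parts[1]:
            # Degenerate assignment (e.g. duplicate seeds): fall back to halving.
            half = len(cluster) // 2
            return cluster[:half], cluster[half:]
        seeds = [centroid(parts[0]), centroid(parts[1])]
    return parts

# Hypothetical split criteria: a higher score means "split this cluster next".
def by_size(cluster):
    return len(cluster)

def by_incoherence(cluster):
    c = centroid(cluster)
    return 1.0 - sum(cosine(d, c) for d in cluster) / len(cluster)

def bisecting_kmeans(docs, k, score=by_size, seed=0):
    """Repeatedly bisect the cluster chosen by `score` until k clusters exist."""
    rng = random.Random(seed)
    clusters = [list(docs)]
    while len(clusters) < k:
        idx = max((i for i, c in enumerate(clusters) if len(c) > 1),
                  key=lambda i: score(clusters[i]), default=None)
        if idx is None:  # only singleton clusters remain
            break
        clusters.extend(two_means(clusters.pop(idx), rng))
    return clusters

if __name__ == "__main__":
    docs = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]  # toy term-weight vectors
    print(neighbor_counts(docs, theta=0.9))
    print(bisecting_kmeans(docs, k=2, score=by_incoherence))
```

Passing a different `score` function is the only change needed to try another selection heuristic, which is the kind of comparison the abstract describes.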

Keywords

Text Clustering, Similarity Measures, Coherent Clustering, Splitting Criteria.

Authors

Y. Sri Lalitha
Department of CSE, Gokaraju Rangaraju Institute of Engineering and Technology, India
A. Govardhan
Department of CSE, Jawaharlal Nehru University and Technology, Hyderabad, India
