
Document Clustering Using K-means and K-medoids


Affiliations
1 IIIT Bhubaneswar, Bhubaneswar, Odisha, India
2 Department of Information and Technology, Gauhati University, Guwahati, India
     



With the rapid growth of information in day-to-day life, it has become difficult to gather relevant information quickly. People are often short of time and need results fast. Clustering addresses this by grouping related information together. Of the several clustering algorithms available, this paper implements the K-means and K-medoids algorithms and compares them to determine which performs better for document clustering. On the best clusters formed, document summarization based on sentence weight is carried out to highlight the key points of the whole document, making it easier for readers to find the information they want and to read only those documents that are relevant to them.

Keywords

Clustering, K-means, K-medoids, WEKA3.9, Document Summarization
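
To illustrate the difference between the two algorithms named in the abstract, the following is a minimal sketch and not the authors' method (the keywords indicate the paper used WEKA 3.9). It assumes toy documents, raw term-frequency vectors, Euclidean distance, and fixed initial centers, all chosen here for illustration. The K-medoids routine shown is the simple alternating variant rather than the full PAM swap procedure. K-means represents each cluster by the mean of its members, while K-medoids represents it by an actual document from the cluster.

```python
import math
from collections import Counter

# Toy corpus (hypothetical, for illustration only).
docs = [
    "cricket match score team",
    "team wins cricket match",
    "stock market price falls",
    "market price stock rises",
]

# Term-frequency vectors over a shared, sorted vocabulary.
vocab = sorted({w for d in docs for w in d.split()})
vectors = [[float(Counter(d.split())[w]) for w in vocab] for d in docs]


def assign(points, centers):
    """Label each point with the index of its nearest center (Euclidean)."""
    return [
        min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        for p in points
    ]


def k_means(points, init_idx, iters=20):
    """K-means: each center is the mean of the points assigned to it."""
    centers = [list(points[i]) for i in init_idx]
    labels = assign(points, centers)
    for _ in range(iters):
        for c in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        new_labels = assign(points, centers)
        if new_labels == labels:  # converged
            break
        labels = new_labels
    return labels


def k_medoids(points, init_idx, iters=20):
    """Alternating K-medoids: each center is the member point with the
    smallest total distance to the other members of its cluster."""
    medoids = list(init_idx)
    labels = assign(points, [points[m] for m in medoids])
    for _ in range(iters):
        for c in range(len(medoids)):
            members = [i for i, l in enumerate(labels) if l == c]
            if members:
                medoids[c] = min(
                    members,
                    key=lambda i: sum(math.dist(points[i], points[j]) for j in members),
                )
        new_labels = assign(points, [points[m] for m in medoids])
        if new_labels == labels:  # converged
            break
        labels = new_labels
    return labels, medoids


kmeans_labels = k_means(vectors, init_idx=[0, 2])
kmedoids_labels, medoid_ids = k_medoids(vectors, init_idx=[0, 2])
print("K-means labels:  ", kmeans_labels)
print("K-medoids labels:", kmedoids_labels, "medoids:", medoid_ids)
```

On well-separated data such as this toy corpus, both routines produce the same grouping. They tend to diverge when outliers are present, because a medoid must be an existing document whereas a mean can be pulled toward extreme points.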








Authors

Rakesh Chandra Balabantaray
IIIT Bhubaneswar, Bhubaneswar, Odisha, India
Chandrali Sarma
Department of Information and Technology, Gauhati University, Guwahati, India
Monica Jha
Department of Information and Technology, Gauhati University, Guwahati, India

