An Improved Bisecting K-Means Algorithm for Text Document Clustering
Cluster analysis is an unsupervised learning approach that groups objects into clusters so that each cluster contains objects that are similar with respect to a predefined criterion. Text document clustering is an important text mining technique for efficiently organizing a large volume of documents into a small number of meaningful clusters. The main objective of this research is to cluster a collection of documents into related groups based on the contents of the documents. To perform this task, the work uses two existing algorithms, K-means and Bisecting K-means, and proposes a new clustering algorithm, Enhanced Bisecting K-means. The experimental results show that the proposed algorithm gives better clustering accuracy than the other two algorithms.
Keywords
Text Mining, Text Document Clustering, K-Means, Bisecting K-Means, Enhanced Bisecting K-Means.
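For orientation, the baseline Bisecting K-means procedure named in the abstract can be sketched as follows. This is a minimal illustration of the generic technique, not the proposed Enhanced Bisecting K-means, whose details the abstract does not give. The function names, the choice to split the cluster with the largest sum of squared errors (SSE), the use of Euclidean distance on small term-count vectors, and the number of trial splits are all assumptions made for the example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)

def two_means(points, rng, iters=20):
    """Basic K-means with k = 2; returns the two groups (one may be empty)."""
    centers = rng.sample(points, 2)
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            nearer = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            groups[nearer].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split; the caller discards it
        new_centers = [centroid(g) for g in groups]
        if new_centers == centers:
            break  # assignments have stabilised
        centers = new_centers
    return groups

def bisecting_kmeans(points, k, trials=5, seed=0):
    """Repeatedly bisect one cluster until k clusters exist (hypothetical helper)."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        splittable = [c for c in clusters if len(c) > 1]
        if not splittable:
            break
        # Assumption: split the cluster with the largest SSE.
        target = max(splittable, key=sse)
        best = None
        for _ in range(trials):
            left, right = two_means(target, rng)
            if not left or not right:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, left, right)
        if best is None:
            break  # no valid bisection found in the allotted trials
        clusters = [c for c in clusters if c is not target]
        clusters.extend([best[1], best[2]])
    return clusters

# Toy term-count vectors standing in for documents (illustrative data only).
docs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters = bisecting_kmeans(docs, k=2)
```

In practice, text clustering usually works on TF-IDF vectors with cosine similarity rather than raw counts with Euclidean distance; the simpler measure is used here only to keep the sketch short.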