Cosine Similarity with Centroid Implication for Text Clustering of Document Files

Anubhuti Singh; Chetna Dabas; J. P. Gupta

doi:10.17485/ijst/2016/v9i48/139835

Affiliations
1 Jaypee Institute of Information and Technology, Noida - 201301, Uttar Pradesh, India

Abstract

Objectives: To address pairwise text comparison over a large dataset using the cosine similarity metric and an adjacent method, and to develop a model for parallel processing of very large data using distributed algorithms on parallel clusters. Methods/Statistical Analysis: This work applies a K-means algorithm based on map-reduce to document files with an effective number of clusters. It presents an approach to classifying text documents that uses a feature selection method based on cosine similarity. An efficient number of clusters is obtained within a fixed number of iterations. The implementation was carried out in a Java environment. Findings: The proposed work presents an approach to classifying text documents using a feature selection method. Application/Improvements: With the cosine similarity method, the retrieved results are improved and acceptable.
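
The abstract does not give implementation details, so the following is only a minimal single-machine sketch of the step it names: compare a document with each cluster centroid by cosine similarity, and assign it to the most similar one. The class and method names (`CosineCentroidSketch`, `termFrequencies`, `cosine`, `nearestCentroid`) are hypothetical, the raw term-frequency weighting is an assumption, and the map-reduce distribution described by the authors is not reproduced here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CosineCentroidSketch {

    // Build a raw term-frequency vector from lower-cased, whitespace-separated tokens.
    // (Assumption: the paper's actual feature weighting is not stated in the abstract.)
    static Map<String, Double> termFrequencies(String text) {
        Map<String, Double> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1.0, Double::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a sparse vector.
    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|); defined as 0 when either vector is empty.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    // K-means-style assignment step: index of the centroid most similar to the document,
    // or -1 if the centroid list is empty.
    static int nearestCentroid(Map<String, Double> doc, List<Map<String, Double>> centroids) {
        int best = -1;
        double bestSim = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < centroids.size(); i++) {
            double sim = cosine(doc, centroids.get(i));
            if (sim > bestSim) {
                bestSim = sim;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Map<String, Double>> centroids = List.of(
                termFrequencies("image pixel"),
                termFrequencies("text cluster"));
        Map<String, Double> doc = termFrequencies("cluster text text");
        System.out.println("assigned cluster index: " + nearestCentroid(doc, centroids));
    }
}
```

In a map-reduce setting, an assignment of this kind would typically run in the map phase and the centroid recomputation in the reduce phase; that split is a common design, not something the abstract confirms.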

Keywords

Cosine Similarity, Document Files, Text Clustering

Indian Journal of Science and Technology

Cosine Similarity with Centroid Implication for Text Clustering of Document Files

Keywords

Cosine Similarity with Centroid Implication for Text Clustering of Document Files

Authors

Abstract

Keywords