Objectives: To address pairwise text comparison over a large dataset using the cosine similarity metric and the adjacent method, and to develop a model for parallel processing of very large data using distributed algorithms on parallel clusters. Methods/Statistical Analysis: This work applies a MapReduce-based K-means algorithm to document files, with an effective number of clusters, implemented in a Java environment. It presents an approach to classifying text documents with a feature selection method based on cosine similarity. An efficient number of clusters is obtained within a fixed number of iterations. Findings: The proposed work provides an approach to classifying text documents using a feature selection method. Application/Improvements: With cosine similarity, the retrieved results are improved and acceptable.
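The abstract combines cosine similarity with K-means in a map-reduce style. A minimal in-memory sketch of that combination follows. It assumes documents are already term-frequency vectors, and it uses a naive initialization (the first k documents as centroids) and a fixed iteration count. The class and method names (`CosineKMeansSketch`, `assign`, `update`) are hypothetical. The "map" and "reduce" phases are written as plain Java loops, not Hadoop `Mapper`/`Reducer` classes, and this is not the authors' implementation.

```java
import java.util.Arrays;

// Hypothetical sketch: K-means on term-frequency vectors, using cosine similarity
// to pick the nearest centroid. The map and reduce phases run in memory here.
public class CosineKMeansSketch {

    // Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 if either vector is all zeros.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        if (na == 0 || nb == 0) return 0.0;
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // "Map" phase: label each document with the index of its most similar centroid.
    static int[] assign(double[][] docs, double[][] centroids) {
        int[] labels = new int[docs.length];
        for (int d = 0; d < docs.length; d++) {
            int best = 0;
            double bestSim = Double.NEGATIVE_INFINITY;
            for (int c = 0; c < centroids.length; c++) {
                double s = cosine(docs[d], centroids[c]);
                if (s > bestSim) { bestSim = s; best = c; }
            }
            labels[d] = best;
        }
        return labels;
    }

    // "Reduce" phase: each new centroid is the mean of the vectors assigned to it.
    static double[][] update(double[][] docs, int[] labels, double[][] old) {
        int k = old.length, dim = old[0].length;
        double[][] sums = new double[k][dim];
        int[] counts = new int[k];
        for (int d = 0; d < docs.length; d++) {
            counts[labels[d]]++;
            for (int i = 0; i < dim; i++) sums[labels[d]][i] += docs[d][i];
        }
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0) { sums[c] = old[c].clone(); continue; } // empty cluster keeps its centroid
            for (int i = 0; i < dim; i++) sums[c][i] /= counts[c];
        }
        return sums;
    }

    // Fixed number of iterations; assumes k <= docs.length (first k docs seed the centroids).
    static int[] kMeans(double[][] docs, int k, int iterations) {
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = docs[c].clone();
        int[] labels = assign(docs, centroids);
        for (int it = 0; it < iterations; it++) {
            centroids = update(docs, labels, centroids);
            labels = assign(docs, centroids);
        }
        return labels;
    }

    public static void main(String[] args) {
        // Toy term-frequency vectors over a hypothetical vocabulary [java, cluster, cooking, recipe].
        double[][] docs = {
            {3, 2, 0, 0},
            {0, 0, 2, 3},
            {2, 3, 0, 0},
            {0, 1, 3, 2},
        };
        System.out.println(Arrays.toString(kMeans(docs, 2, 10))); // prints [0, 1, 0, 1]
    }
}
```

In a distributed setting, `assign` would typically run in parallel over partitions of the document set, and `update` would aggregate the per-cluster sums and counts. That split is why the two phases are kept separate in the sketch.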

Keywords

Cosine Similarity, Document Files, Text Clustering