
Text Document Clustering and Classification using K-Means Algorithm and Neural Networks


Affiliations
1 Department of CSE, Chandigarh University, Gharuan, Mohali - 140413, Punjab, India
 

This paper presents the results of a study of several general document clustering and classification methods. Objectives: This research aims to improve document clustering by building a system that reduces the time needed to retrieve text documents from clusters. Method: We propose a method that supports both clustering and classification, combining k-means with feed-forward neural networks and implemented in MATLAB. K-means is used to cluster the text documents, and a neural network is used to classify them. Findings: Earlier techniques include semi-supervised models for labelled text (namely Partially Labeled Dirichlet Allocation and the Partially Labeled Dirichlet Process), genetic algorithms, Gaussian distributions, hybrid genetic algorithms, fast global k-means, and k-means clustering. Each of these techniques has its merits and drawbacks, but they share one weakness: they are very time-consuming. The main aim of this work is therefore to develop a model based on both supervised and unsupervised techniques that captures the similarity between documents. Improvements: To address the time cost, we use neural networks for classification and k-means for clustering, so that the resulting model combines a supervised and an unsupervised technique.

Keywords

Artificial Neural Network, Cosine Similarity, Data Mining, K-means Algorithm, Similarity Measure Function, Text Document Clustering.
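
The pipeline outlined in the abstract (cluster documents with k-means, then train a feed-forward network to classify them) can be illustrated with a small sketch. This is not the authors' MATLAB implementation: it is a standard-library Python approximation, and every name in it (`tf_vector`, `kmeans_cosine`, `train_mlp`) is hypothetical. The term-frequency weighting, the farthest-first seeding, the single tanh hidden layer, the hyperparameters, and the six toy documents are all assumptions made for illustration; the abstract specifies none of them.

```python
import math
import random
from collections import Counter


def tf_vector(text, vocab):
    """Unit-length term-frequency vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def kmeans_cosine(vectors, k, max_iter=20):
    """k-means that assigns each vector to its most cosine-similar centroid."""
    # Assumption: deterministic farthest-first seeding, chosen for reproducibility.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        new = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        if new == labels:  # assignments stopped changing
            break
        labels = new
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def train_mlp(X, y, hidden=4, epochs=300, lr=0.5, seed=0):
    """One-hidden-layer feed-forward net for binary labels, trained by SGD."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(W1, b1)]
        z = sum(w * hj for w, hj in zip(W2, h)) + b2
        return h, 1.0 / (1.0 + math.exp(-z))

    for _ in range(epochs):
        for x, t in zip(X, y):
            h, out = forward(x)
            d_out = out - t  # cross-entropy gradient at the output pre-activation
            for j in range(hidden):
                d_h = d_out * W2[j] * (1.0 - h[j] ** 2)  # backprop through tanh
                W2[j] -= lr * d_out * h[j]
                b1[j] -= lr * d_h
                for i in range(n_in):
                    W1[j][i] -= lr * d_h * x[i]
            b2 -= lr * d_out

    return lambda x: int(forward(x)[1] >= 0.5)


# Toy corpus, invented for illustration only.
docs = [
    "cat dog pet animal",
    "dog pet puppy animal",
    "cat kitten pet animal",
    "stock market trade price",
    "market price bond trade",
    "stock bond invest market",
]
vocab = sorted({w for d in docs for w in d.split()})
X = [tf_vector(d, vocab) for d in docs]

labels = kmeans_cosine(X, k=2)    # unsupervised step: cluster ids per document
classify = train_mlp(X, labels)   # supervised step: learn to predict the cluster
print(labels)
print(classify(tf_vector("dog pet animal", vocab)))
```

In this arrangement the cluster ids produced by k-means serve as training targets for the network, so a new document can be routed to a cluster with a single forward pass instead of being compared against every stored document. Whether the paper wires the two stages together in exactly this way is not stated in the abstract.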

Authors

Ramanpreet Kaur
Department of CSE, Chandigarh University, Gharuan, Mohali - 140413, Punjab, India
Amandeep Kaur
Department of CSE, Chandigarh University, Gharuan, Mohali - 140413, Punjab, India

DOI: https://doi.org/10.17485/ijst%2F2016%2Fv9i40%2F126233