Large Document Set Clustering: An Integrated Approach

Krishna Kumar Mohbey; G. S. Thakur

Affiliations
1 National Institute of Technology, Bhopal, India

Document clustering is an important mining task used by different people for different purposes. It is generally used to find similar documents within a large collection. The document set may be a collection of blogs, website access patterns, or transaction files. Through document clustering one can discover similar habits among different people, which can play a large role in future trend analysis and decision making. Most clustering methods use distance calculations as the similarity measure. They scan the documents multiple times to determine the classes and then build the clusters. If the document set is large, these methods take more time for clustering. We propose an advanced environment for document clustering in which the documents are scanned only once and each document is immediately assigned to the appropriate cluster. Experiments are conducted on the 20 Newsgroups dataset using MATLAB. Experimental results show the effectiveness of the proposed environment for large document sets.

Keywords

Document Clustering, Similarity Measurements, Dendrogram, Term Extraction.
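
The abstract does not spell out the proposed procedure, so the following is only a minimal sketch of the general idea of scanning each document once and assigning it immediately to a cluster. It assumes a generic "leader"-style single-pass scheme with cosine similarity over raw term counts. The function names, the tokenizer, and the `threshold` value are illustrative assumptions, not the authors' method, and Python is used here instead of the MATLAB environment mentioned in the abstract.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercased word counts; a simple stand-in for a term extraction step."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass_cluster(documents, threshold=0.3):
    """Assign each document to a cluster while reading it exactly once.

    `threshold` is an assumed tuning parameter: a document joins the most
    similar existing cluster if the similarity reaches it, otherwise it
    starts a new cluster.
    """
    centroids = []  # summed term counts of the documents in each cluster
    labels = []
    for text in documents:
        vec = term_vector(text)
        best, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            centroids.append(Counter(vec))  # open a new cluster
            labels.append(len(centroids) - 1)
        else:
            centroids[best].update(vec)  # fold the document into the cluster
            labels.append(best)
    return labels


docs = [
    "the hockey team won the hockey game",
    "hockey game tonight",
    "new graphics card driver",
    "graphics driver update",
]
labels = single_pass_cluster(docs)
```

In this sketch every document is compared only against the current cluster summaries, never against earlier documents again, which is what keeps the work to a single scan. The result depends on document order and on the chosen threshold, so both would need tuning on real data.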

Data Mining and Knowledge Engineering
