Large Document Set Clustering: An Integrated Approach
Document clustering is an important data-mining task that different people use for many purposes. It is generally used to find similar documents within a large document collection. The document set may be a collection of blogs, website access patterns, or any kind of transaction file. Document clustering can reveal similar habits among different people, which can play a large role in future trend analysis and decision making. Most clustering methods use distance calculations as their similarity measure. They scan the documents multiple times to determine the classes and then build the clusters. When the document set is large, these methods take a long time to cluster. We propose an advanced environment for document clustering in which the documents are scanned only once and each document is immediately assigned to the appropriate cluster. Experiments are conducted on the 20 Newsgroups dataset using MATLAB. Experimental results show the effectiveness of the proposed environment for large document sets.
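The abstract does not specify how the proposed environment measures similarity or decides which cluster a document joins. To illustrate the general idea of one-scan clustering, here is a minimal sketch of a generic single-pass (leader-style) scheme. It is not the authors' method. Each document is read once, turned into a term-frequency vector, and either added to the most similar existing cluster or used to start a new one. The tokenizer, the cosine-similarity measure, the `threshold=0.3` value, and the names `term_vector`, `cosine`, and `single_pass_cluster` are all hypothetical choices. The sketch is written in Python rather than the MATLAB used in the experiments.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Hypothetical term extraction: lowercase alphabetic tokens -> term frequencies."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors (0.0 if either is empty)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass_cluster(docs, threshold=0.3):
    """Assign each document to a cluster after reading it exactly once.

    Each cluster keeps the summed term vector of its members. Cosine similarity
    is scale-invariant, so comparing against the sum is equivalent to comparing
    against the mean. The threshold is an assumed illustrative value.
    """
    centroids = []  # one summed term vector per cluster
    labels = []     # cluster index for each document, in input order
    for doc in docs:
        vec = term_vector(doc)
        best, best_sim = -1, 0.0
        for i, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= threshold:
            centroids[best].update(vec)  # Counter.update adds the term counts
            labels.append(best)
        else:
            centroids.append(Counter(vec))  # start a new cluster with this document
            labels.append(len(centroids) - 1)
    return labels


if __name__ == "__main__":
    docs = [
        "the goalie saved the hockey puck",
        "hockey puck and goalie on ice",
        "nasa launched a space shuttle",
        "the space shuttle orbit by nasa",
    ]
    print(single_pass_cluster(docs))  # → [0, 0, 1, 1]
```

Because each document is compared only against the current cluster summaries, the collection is scanned once. The trade-off is that the result depends on document order and on the chosen threshold.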
Keywords
Document Clustering, Similarity Measurements, Dendrogram, Term Extraction.