Automatic Tamil Document Categorization Based on the Naive Bayes Algorithm

S. Kohilavani; T. Mala; T. V. Geetha

Affiliations
1 Anna University, Chennai, India

Abstract

This paper deals with the automatic classification of Tamil documents. Documents are repositories of knowledge, but so many are available that searching them effectively is time-consuming. Document categorization simplifies document search and supports other applications such as event detection and tracking and document clustering and grouping. It is a challenging task that has recently become an active research topic in information retrieval. The objective of document categorization is to assign to a document one or more entries from a set of prespecified categories. Traditionally, this task is performed manually by domain experts: each incoming document is read and comprehended by an expert and then assigned to a number of categories chosen from the prespecified set, which inevitably requires a large amount of manual effort. A promising way to deal with this problem is to learn a categorization scheme automatically from training examples. In the training phase, a set of documents with class labels attached is given, and a classification system is built using a learning method. Once the categorization scheme is learned, it can be used to classify future documents. A document's category can be determined using various techniques. In this paper, Naive Bayes (NB), a statistical machine learning algorithm, is used to classify Tamil documents into one of a set of predefined categories. Experiments are conducted to evaluate the Naive Bayes categorizer on a data set of 50 documents per category. The experimental results show that the Naive Bayes classifier performs well, achieving 89.8% accuracy.
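
The training and classification phases described in the abstract can be illustrated with a minimal sketch of a multinomial Naive Bayes categorizer with add-one (Laplace) smoothing. This is not the authors' implementation: the abstract does not specify the event model, the smoothing, the tokenizer, or the stopword list, so all of these are assumptions made here. The class name `NaiveBayesCategorizer`, the `tokenize` helper, the two-word `STOPWORDS` set, the category names, and the four toy training documents are hypothetical and are not taken from the paper's data set.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stopword list for illustration; the paper's list is not given.
STOPWORDS = {"மற்றும்", "ஒரு"}


def tokenize(text):
    """Assumed preprocessing: whitespace tokenization plus stopword removal."""
    return [tok for tok in text.split() if tok not in STOPWORDS]


class NaiveBayesCategorizer:
    """Multinomial Naive Bayes with add-one smoothing (an assumed variant)."""

    def fit(self, documents, labels):
        # Training phase: count documents per category and words per category.
        self.doc_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(documents, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        # Classification phase: pick the category with the highest
        # log prior plus summed log likelihoods of the known tokens.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(n_docs / self.total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy labelled documents (hypothetical, not from the paper's data set).
train_docs = [
    "கிரிக்கெட் அணி போட்டி",
    "அணி மற்றும் போட்டி",
    "திரைப்படம் நடிகர் பாடல்",
    "ஒரு திரைப்படம் மற்றும் பாடல்",
]
train_labels = ["sports", "sports", "cinema", "cinema"]

model = NaiveBayesCategorizer().fit(train_docs, train_labels)
print(model.predict("அணி போட்டி"))
print(model.predict("நடிகர் பாடல்"))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and the add-one term keeps a word that never occurred in a category from driving that category's probability to zero.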

Keywords

Document Categorization, Naive Bayes, Stopwords, Preprocessing, Classifier.

Artificial Intelligent Systems and Machine Learning
