Automatic Induction of Rule Based Text Categorization

D. Maghesh Kumar

Affiliations
1 Department of Information Technology, AVC Polytechnic College, Mayiladuthurai, India

Abstract

The automated categorization of texts into predefined categories has attracted rapidly growing interest in the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community, the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. This paper describes a novel method for the automatic induction of rule-based text classifiers. The method supports a hypothesis language of the form "if T_1 or ... or T_n occurs in document d, and none of T_(n+1), ..., T_(n+m) occurs in d, then classify d under category c," where each T_i is a conjunction of terms. The paper also discusses the main approaches to text categorization that fall within the machine learning paradigm. Issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation, are discussed in detail.
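
To make the hypothesis language concrete, the following minimal Python sketch shows how a single rule of that form could be applied to a document. It illustrates only rule evaluation, not the induction method the paper proposes. The names `Rule`, `tokenize`, and `classify`, the whitespace tokenizer, and the example terms are all assumptions introduced for illustration; none of them come from the paper.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set

# A conjunction of terms, modeled as a set: it "occurs" in a document
# when every one of its terms appears in that document.
Conjunction = FrozenSet[str]


@dataclass
class Rule:
    """Hypothetical container for one rule of the hypothesis language."""
    category: str
    positive: List[Conjunction]  # at least one of these must occur
    negative: List[Conjunction]  # none of these may occur

    def matches(self, doc_terms: Set[str]) -> bool:
        # A conjunction occurs when it is a subset of the document's terms.
        any_positive = any(t <= doc_terms for t in self.positive)
        any_negative = any(t <= doc_terms for t in self.negative)
        return any_positive and not any_negative


def tokenize(text: str) -> Set[str]:
    # Assumption: naive lowercase whitespace tokenization, for illustration only.
    return set(text.lower().split())


def classify(text: str, rules: List[Rule]) -> List[str]:
    """Return the categories of all rules that fire on the given text."""
    terms = tokenize(text)
    return [rule.category for rule in rules if rule.matches(terms)]


# Illustrative rule with made-up terms: classify under "sports" if "football"
# occurs, or "match" and "goal" occur together, unless "stock" and "market"
# occur together.
sports_rule = Rule(
    category="sports",
    positive=[frozenset({"football"}), frozenset({"match", "goal"})],
    negative=[frozenset({"stock", "market"})],
)
```

Under these assumptions, `classify("a late goal decided the match", [sports_rule])` returns the category of `sports_rule`, while a text containing "football" together with both "stock" and "market" is rejected. Modeling each conjunction as a set keeps the occurrence check a simple subset test.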