
Automatic Induction of Rule Based Text Categorization


Affiliations
1 Department of Information Technology, AVC Polytechnic College, Mayiladuthurai, India
 

The automated categorization of texts into predefined categories has attracted growing interest over the last 10 years, owing to the increased availability of documents in digital form and the ensuing need to organize them. In the research community, the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. This paper describes a novel method for the automatic induction of rule-based text classifiers. The method supports a hypothesis language of the form "if T1, …, or Tn occurs in document d, and none of Tn+1, …, Tn+m occurs in d, then classify d under category c," where each Ti is a conjunction of terms. The paper also surveys the main approaches to text categorization that fall within the machine learning paradigm. Three issues are discussed in detail: document representation, classifier construction, and classifier evaluation.
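The hypothesis language quoted in the abstract can be made concrete with a small sketch. The Python below only shows how a single rule of that form might be evaluated against a document. It is not the paper's induction method, which the abstract does not detail. The names Rule, matches, and tokenize, the whitespace tokenizer, and the example terms are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One rule of the form described in the abstract (hypothetical class).

    positive holds the conjunctions T1 ... Tn and negative holds the
    remaining conjunctions; each conjunction is a frozenset of terms.
    """
    category: str
    positive: tuple
    negative: tuple

    def matches(self, document_terms):
        # A conjunction "occurs" in the document when all of its terms are present.
        has_positive = any(conj <= document_terms for conj in self.positive)
        has_negative = any(conj <= document_terms for conj in self.negative)
        # Classify under the category only if some positive conjunction
        # occurs and no negative conjunction occurs.
        return has_positive and not has_negative


def tokenize(text):
    # Naive lowercase whitespace tokenizer, assumed here for illustration only.
    return set(text.lower().split())


# Illustrative rule: the terms are invented, not taken from the paper.
rule = Rule(
    category="grain",
    positive=(frozenset({"wheat"}), frozenset({"corn", "export"})),
    negative=(frozenset({"stock", "market"}),),
)

print(rule.matches(tokenize("Wheat prices rose")))
print(rule.matches(tokenize("wheat stock market report")))
```

Representing each Ti as a set makes the "conjunction of terms" test a subset check, while the "or" over positive conjunctions and the "none of" over negative ones map onto two any() calls.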

Keywords

Data Mining, Text Mining, Clustering, Classification, and Association Rules, Mining Methods and Algorithms.

Authors

D. Maghesh Kumar
Department of Information Technology, AVC Polytechnic College, Mayiladuthurai, India
