Using Genetic Approach for Learning from Imbalanced Text Corpora

Lincy Mathews; Hari Seetha

Abstract

Addressing the persistent problem of imbalanced data in text classification, this paper applies a Genetic Algorithm approach to class imbalance in binary-class text data. An inherent characteristic of imbalanced data is a highly uneven distribution of samples among the classes. Consequently, traditional classifiers such as Nearest Neighbor perform worse because the class of interest is under-represented. A hybrid approach combines the oversampling technique with the advantages of the Genetic Algorithm to generate artificial patterns for the minority class. The approach uses avoidance of overfitting as the fitness function, which decides the stopping criterion for generating synthetic samples. Appropriate evaluation measures are used to analyze the improvement in performance of the proposed hybrid learning model.
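The abstract does not specify the genetic operators or the exact fitness formula. The following is therefore only a minimal sketch of GA-style minority oversampling for a Nearest Neighbor classifier, not the authors' method. It makes several assumptions:

- Documents are already vectorised into non-negative features such as TF-IDF, and label 1 marks the minority class.
- Offspring are produced by arithmetic crossover of two minority parents plus Gaussian mutation.
- A margin-based fitness (distance to the nearest majority sample minus distance to the nearest minority sample) ranks the offspring.
- "Avoidance of overfitting" is approximated by stopping as soon as the 1-NN G-mean on a held-out validation set drops.

All function names and parameters (`ga_oversample`, `pool_size`, `mutation_rate`, and so on) are hypothetical.

```python
import math
import random

# Hypothetical sketch: GA-style oversampling for a 1-NN text classifier.
# Assumes dense, non-negative feature vectors (e.g. TF-IDF) and label 1 = minority class.


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_1nn(train_X, train_y, x):
    """Label of the single nearest training vector (Nearest Neighbor, k = 1)."""
    best = min(range(len(train_X)), key=lambda i: euclidean(train_X[i], x))
    return train_y[best]


def g_mean(y_true, y_pred):
    """Geometric mean of minority recall (sensitivity) and majority recall (specificity)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sens * spec)


def evaluate(train_X, train_y, val_X, val_y):
    """G-mean of a 1-NN classifier on (train_X, train_y), scored on validation data."""
    preds = [predict_1nn(train_X, train_y, x) for x in val_X]
    return g_mean(val_y, preds)


def crossover(a, b, rng):
    """Arithmetic crossover: each gene is a random point between the parents' genes."""
    return [x + rng.random() * (y - x) for x, y in zip(a, b)]


def mutate(v, rate, scale, rng):
    """Gaussian mutation on a fraction of genes, clipped at 0 to keep TF-IDF-like values valid."""
    return [max(0.0, g + rng.gauss(0.0, scale)) if rng.random() < rate else g for g in v]


def fitness(child, train_X, train_y):
    """Margin: distance to nearest majority sample minus distance to nearest minority sample."""
    d_min = min(euclidean(child, x) for x, lab in zip(train_X, train_y) if lab == 1)
    d_maj = min(euclidean(child, x) for x, lab in zip(train_X, train_y) if lab == 0)
    return d_maj - d_min


def ga_oversample(X, y, val_X, val_y, pool_size=20, batch=5, max_generations=20,
                  mutation_rate=0.1, mutation_scale=0.05, seed=0):
    """Add synthetic minority samples generation by generation until validation G-mean drops."""
    rng = random.Random(seed)
    train_X, train_y = [list(v) for v in X], list(y)
    minority = [list(v) for v, lab in zip(X, y) if lab == 1]
    best_score = evaluate(train_X, train_y, val_X, val_y)
    for _ in range(max_generations):
        # Stop if there are too few parents or the classes are already balanced.
        if len(minority) < 2 or train_y.count(1) >= train_y.count(0):
            break
        # Breed a pool of candidates from random minority parent pairs.
        pool = []
        for _ in range(pool_size):
            a, b = rng.sample(minority, 2)
            pool.append(mutate(crossover(a, b, rng), mutation_rate, mutation_scale, rng))
        # Truncation selection: keep the candidates lying deepest in the minority region.
        pool.sort(key=lambda c: fitness(c, train_X, train_y), reverse=True)
        offspring = pool[:batch]
        cand_X = train_X + offspring
        cand_y = train_y + [1] * len(offspring)
        score = evaluate(cand_X, cand_y, val_X, val_y)
        if score < best_score:
            break  # validation G-mean fell: treat as the onset of overfitting and stop
        best_score = score
        train_X, train_y = cand_X, cand_y
        minority.extend(offspring)  # accepted offspring may become parents later
    return train_X, train_y, best_score


if __name__ == "__main__":
    X = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.1, 0.2], [0.0, 0.0], [0.2, 0.2],
         [0.9, 1.0], [1.0, 0.9]]
    y = [0, 0, 0, 0, 0, 0, 1, 1]
    val_X = [[0.1, 0.1], [0.95, 0.95], [0.85, 0.9]]
    val_y = [0, 1, 1]
    new_X, new_y, score = ga_oversample(X, y, val_X, val_y)
    print(f"minority samples: {y.count(1)} -> {new_y.count(1)}, validation G-mean: {score:.3f}")
```

G-mean is used in this sketch because plain accuracy can look high on imbalanced data even when every minority document is misclassified. The evaluation measures used in the paper itself may differ.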

Keywords

Genetic Algorithm, Imbalanced Data, Nearest Neighbor, Oversampling, Synthetic Data, Text Data.

Indian Journal of Science and Technology
