
Using Genetic Approach for Learning from Imbalanced Text Corpora


Affiliations
1 School of Information Technology and Engineering, VIT University, Vellore - 632014, Tamil Nadu, India
2 SCOPE, Vellore Institute of Technology University, Vellore - 632014, Tamil Nadu, India
 

Abstract

To address the persistent problem of imbalanced data in text classification, this paper applies a Genetic Algorithm to the imbalance problem in binary-class text data. An inherent characteristic of imbalanced data is the highly uneven distribution of samples among the classes. Consequently, traditional classifiers such as Nearest Neighbor show decreased performance because the class of interest is underrepresented. A hybrid approach combines oversampling with the strengths of the Genetic Algorithm to generate artificial patterns for the minority class. The approach uses avoidance of overfitting as the fitness function, which determines the stopping criterion for generating synthetic samples. Appropriate evaluation measures are used to analyze the performance gain of the proposed hybrid learning model.

Keywords

Genetic Algorithm, Imbalance Data, Nearest Neighbor, Oversampling, Synthetic Data, Text Data.
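
The abstract does not give the details of the hybrid method, so the following is only a minimal sketch of the general idea of genetic oversampling, not the authors' algorithm. It assumes documents are already represented as non-negative numeric feature vectors (for example term weights). The function names, the mutation settings, and the fitness rule are all hypothetical: the fitness here simply checks that a synthetic sample lies closer to the minority class than to the majority class, and generation stops once the classes are balanced. The paper's overfitting-based fitness and stopping criterion are not reproduced.

```python
import random


def distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def crossover(p1, p2, rng):
    """Uniform crossover: each feature is copied from one of the two parents."""
    return [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]


def mutate(child, rng, rate=0.1, scale=0.05):
    """Add small Gaussian noise to some features, clipped at 0.

    rate and scale are illustrative values, not taken from the paper.
    """
    return [
        max(0.0, v + rng.gauss(0, scale)) if rng.random() < rate else v
        for v in child
    ]


def fitness(candidate, minority, majority):
    """Hypothetical fitness (an assumption, not the paper's function).

    Positive when the candidate is nearer to its closest minority sample
    than to its closest majority sample.
    """
    d_min = min(distance(candidate, m) for m in minority)
    d_maj = min(distance(candidate, m) for m in majority)
    return d_maj - d_min


def ga_oversample(minority, majority, seed=0, max_tries=10000):
    """Generate synthetic minority vectors until the classes are balanced.

    Needs at least two minority samples to pick distinct parents.
    Stops early if max_tries candidates have been evaluated.
    """
    rng = random.Random(seed)
    synthetic = []
    tries = 0
    while len(minority) + len(synthetic) < len(majority) and tries < max_tries:
        tries += 1
        p1, p2 = rng.sample(minority, 2)
        child = mutate(crossover(p1, p2, rng), rng)
        if fitness(child, minority, majority) > 0:
            synthetic.append(child)
    return synthetic
```

In a full pipeline, the synthetic vectors would be appended to the minority class before training a classifier such as Nearest Neighbor, and the result would be scored with imbalance-aware measures on held-out data.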

Authors

Lincy Mathews
School of Information Technology and Engineering, VIT University, Vellore - 632014, Tamil Nadu, India
Hari Seetha
SCOPE, Vellore Institute of Technology University, Vellore - 632014, Tamil Nadu, India




DOI: https://doi.org/10.17485/ijst%2F2016%2Fv9i48%2F140320