
Ensemble Approaches for Class Imbalance Problem: A Review


Affiliations
1 Department of Information Technology, USICT, GGSIP University, Dwarka, Delhi, India
     



Abstract

In data mining, classifying data with a skewed class distribution is a challenging problem. Traditional Classification Techniques (TCT) work efficiently on data with a symmetric class distribution, as their internal design favors balanced datasets. The Class Imbalance Problem (CIP) takes place when the number of instances of one class far exceeds the number of instances of the other classes. Factors that contribute to this imbalance problem include noisy data, borderline samples, the degree of class overlap, and small disjuncts. In machine learning, ensembles are built to improve on the performance and correctness of a single classifier by training multiple classifiers and combining their outputs into a single class label. In this paper, our aim is to review ensemble learning methods for the two-class problem. We propose grouping ensemble learning methods at different levels: the data level, the algorithm level, and according to the base classifier.

Keywords

Bagging, Boosting, Classification, Class Imbalance Problem, Oversampling, Skewed Data Distribution, Undersampling.
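
As an illustration of the data-level idea named in the abstract and keywords (undersampling combined with bagging), the following is a minimal sketch rather than a method taken from the paper. It assumes a two-class problem, uses a deliberately simple nearest-centroid base classifier, and combines the ensemble by majority vote. All function names (`under_bagging_fit`, `under_bagging_predict`, `make_toy_data`) and the toy data are hypothetical and chosen only for this example.

```python
import random
from collections import Counter


def nearest_centroid_fit(X, y):
    """Fit a toy base classifier: one mean vector per class (an assumption,
    not a base learner prescribed by the paper)."""
    counts = Counter(y)
    sums = {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def nearest_centroid_predict(centroids, x):
    """Return the class whose centroid is closest to x (squared Euclidean)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(centroids[c], x))
    return min(centroids, key=sqdist)


def under_bagging_fit(X, y, n_estimators=11, seed=0):
    """Train n_estimators base classifiers, each on a balanced subset in
    which every class is randomly undersampled to the minority-class size."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority_size = min(counts.values())
    by_class = {c: [i for i, yi in enumerate(y) if yi == c] for c in counts}
    models = []
    for _ in range(n_estimators):
        idx = []
        for members in by_class.values():
            idx += rng.sample(members, minority_size)
        models.append(
            nearest_centroid_fit([X[i] for i in idx], [y[i] for i in idx])
        )
    return models


def under_bagging_predict(models, x):
    """Combine the base classifiers' outputs into one label by majority vote."""
    votes = Counter(nearest_centroid_predict(m, x) for m in models)
    return votes.most_common(1)[0][0]


def make_toy_data(n_majority=95, n_minority=5, seed=1):
    """Hypothetical skewed dataset: class 0 near (0, 0), class 1 near (3, 3)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)] for _ in range(n_majority)]
    X += [[rng.gauss(3.0, 0.5), rng.gauss(3.0, 0.5)] for _ in range(n_minority)]
    y = [0] * n_majority + [1] * n_minority
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    models = under_bagging_fit(X, y)
    print(under_bagging_predict(models, [3.0, 3.0]))
    print(under_bagging_predict(models, [0.0, 0.0]))
```

An odd number of estimators is used so that a two-class majority vote cannot tie. Swapping the base classifier, or replacing undersampling with oversampling of the minority class, would give other members of the same data-level family.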




Authors

Anjana Gosain
Department of Information Technology, USICT, GGSIP University, Dwarka, Delhi, India
Arushi Gupta
Department of Information Technology, USICT, GGSIP University, Dwarka, Delhi, India

