
A Framework for Class Imbalance Problem Using Hybrid Sampling


Affiliations
1 University of Kashmir, India
2 Dept. of Computer Science, University of Kashmir, India
     



Skewed class distributions occur naturally in most datasets generated by real-world applications; such datasets are commonly known as class-imbalanced datasets. In them, the examples of one class are far fewer than the examples of the other class(es). Multi-class classification with imbalanced datasets has attracted considerable attention from the data mining and machine learning research communities in recent years. The main aim of this paper is to improve the classification performance on the minority class without reducing the classification performance on the majority class(es). We study this as a single problem, whereas researchers have usually treated it as two separate problems: the multi-class problem and the imbalance problem.

This paper addresses the problem by devising a novel framework based on a data-level solution (Random Hybrid Sampling) and a well-known binarization algorithm (OVO-Binarization). The performance improvement of our framework is then shown using several performance measures, such as Precision, Recall, F1-score and G-Mean, on benchmark datasets imported from the UCI Machine Learning Repository.
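
The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of the general idea, not the framework itself. It assumes that "random hybrid sampling" means randomly under-sampling the larger classes and randomly over-sampling the smaller ones toward a common target size (here the mean class size, which is an assumption), that OVO binarization trains one binary classifier per pair of classes, and that G-Mean is the geometric mean of the per-class recalls. All function names are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import combinations


def random_hybrid_sample(X, y, seed=0):
    """Resample every class to the mean class size (assumed target).

    Classes at or above the target are randomly under-sampled without
    replacement; smaller classes are randomly over-sampled by adding
    duplicates drawn with replacement.
    """
    rng = random.Random(seed)
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    target = round(len(y) / len(by_class))
    X_out, y_out = [], []
    for label, rows in by_class.items():
        if len(rows) >= target:
            chosen = rng.sample(rows, target)  # random under-sampling
        else:
            extra = [rng.choice(rows) for _ in range(target - len(rows))]
            chosen = rows + extra  # random over-sampling
        X_out.extend(chosen)
        y_out.extend([label] * target)
    return X_out, y_out


def ovo_pairs(y):
    """Class pairs for which OVO binarization trains one binary classifier."""
    return list(combinations(sorted(set(y)), 2))


def per_class_recall(y_true, y_pred):
    """Recall of each class: correctly predicted examples / true examples."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / support[label] for label in support}


def g_mean(y_true, y_pred):
    """Geometric mean of the per-class recalls."""
    recalls = per_class_recall(y_true, y_pred).values()
    return math.prod(recalls) ** (1 / len(recalls))
```

For instance, with 80, 15 and 5 examples in three classes, `random_hybrid_sample` would return 33 examples per class, and `ovo_pairs` would list three binary sub-problems. Whether resampling is applied once to the whole dataset or separately inside each OVO pair is a design choice the abstract leaves open. Because G-Mean multiplies the per-class recalls, a classifier that ignores the minority class scores zero, which is why it is commonly reported alongside Precision, Recall and F1-score for imbalanced data.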


Keywords

Class Imbalance, Classification, Over-Sampling, Under-Sampling, OVO-Binarization, Performance Metrics.




Authors

S. Jahangeer Sidiq
University of Kashmir, India
Majid Zaman
University of Kashmir, India
Muheet Butt
Dept. of Computer Science, University of Kashmir, India


References