
Evaluation of Cost Sensitive Learning for Imbalanced Bank Direct Marketing Data


Affiliations
1 Faculty of Computing and Informatics, Multimedia University, 63100, Cyberjaya, Selangor, Malaysia
 

Objectives: The imbalanced bank direct marketing data set used in this study poses a two-class data mining problem, in which a customer may or may not subscribe to a product from a bank. Methods/Statistical Analysis: The data set exhibits the rare class problem, in which the classification rate attained for the rare class is low. In this study, we applied cost sensitive learning to mitigate the problem and to account for the differing costs incurred when misclassification occurs. Three learning algorithms, namely Naive Bayes (NB), C4.5 and Naive Bayes Tree (NBT), were used in the cost sensitive learning, and their results were empirically evaluated. Findings: The results were also compared with two previous studies that used cost insensitive SVM and over-sampling, respectively. Although cost sensitive learning is claimed to be able to handle imbalanced data sets, we found it to be less effective overall for the bank direct marketing data set. Cost sensitive learning provides a way of "wrapping" learning algorithms that are not designed to handle imbalanced class distributions. Therefore, it may not work well for certain imbalanced data sets. Over-sampling, on the other hand, worked well for the data set. Improvements/Applications: Over-sampling helped to generalize the decision region of the rare class more clearly, which in turn improved the classification result.
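
The two approaches contrasted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not state which tool, cost values, or over-sampling method were used. The sketch assumes one common form of cost sensitive "wrapping" (choosing the class with the minimum expected misclassification cost from any base classifier's class probabilities) and plain random over-sampling with replacement. The function names, the cost matrix `COST`, and the example probabilities are hypothetical.

```python
import random
from collections import Counter


def min_expected_cost_class(probs, cost):
    """Pick the class with the lowest expected misclassification cost.

    probs[i]   : P(class i | x) from any base classifier (assumed given).
    cost[i][j] : cost of predicting class j when the true class is i.
    """
    n = len(probs)
    expected = [sum(probs[i] * cost[i][j] for i in range(n)) for j in range(n)]
    return min(range(n), key=expected.__getitem__)


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest one (sampling with replacement)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical setup: class 0 = "does not subscribe", class 1 = "subscribes" (rare).
# Assumed costs: missing a subscriber is five times as costly as a wasted contact.
COST = [[0, 1],
        [5, 0]]
UNIFORM = [[0, 1],
           [1, 0]]

probs = [0.8, 0.2]  # hypothetical base classifier output for one customer
plain_choice = min_expected_cost_class(probs, UNIFORM)  # same as argmax of probs
costly_choice = min_expected_cost_class(probs, COST)    # shifted toward rare class

rows, labels = random_oversample([1, 2, 3, 4, 5], [0, 0, 0, 0, 1])
```

In this sketch the wrapper changes only the decision made from the base classifier's probabilities, whereas over-sampling changes the training data the classifier sees, which is one way to read the abstract's remark that over-sampling helped to generalize the decision region of the rare class.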

Keywords

Bank Direct Marketing, Cost Sensitive Learning, Imbalanced Data Set, Rare Class Problem, Over-Sampling.

Authors

Khor Kok-Chin
Faculty of Computing and Informatics, Multimedia University, 63100, Cyberjaya, Selangor, Malaysia
Ng Keng-Hoong
Faculty of Computing and Informatics, Multimedia University, 63100, Cyberjaya, Selangor, Malaysia


DOI: https://doi.org/10.17485/ijst%2F2016%2Fv9i42%2F123949