
Feature Sub-Spacing Based Stacking for Effective Imbalance Handling in Sensitive Data


Affiliations
1 Department of Computer Science, Bharathidasan University, India
     



Many real-world classification applications suffer from data imbalance, and handling it is crucial to building an effective classification system. This work presents a classifier ensemble model, the Feature Sub-spacing Stacking Model (FSSM), designed to operate on highly imbalanced, complex and sensitive data. FSSM creates subspaces of features, both to reduce data complexity and to handle data imbalance. At the first level, base models are trained on these feature subspaces. At the second level, a stacking model is trained on the predictions of the first-level base models, which improves prediction quality. Experiments were conducted on bank data and on the NSL-KDD data. The results show that FSSM performs better than the existing models it was compared with.

Keywords

Classification, Data Imbalance, Ensemble, Stacking, Feature Sub-Spacing.
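
The two-level design described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the feature subspaces are formed, which base learners or meta-learner are used, or how imbalance is handled inside each model. The sketch assumes disjoint random subspaces, class-weighted decision trees as base models, a class-weighted logistic regression as the second-level model, and out-of-fold predictions as its training input. The function names and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.tree import DecisionTreeClassifier


def make_subspaces(n_features, n_subspaces, rng):
    """Split feature indices into disjoint random subspaces (assumed scheme)."""
    return np.array_split(rng.permutation(n_features), n_subspaces)


def fit_fssm_sketch(X, y, n_subspaces=4, seed=0):
    """Level 1: one base model per feature subspace. Level 2: a stacking model."""
    rng = np.random.default_rng(seed)
    subspaces = make_subspaces(X.shape[1], n_subspaces, rng)
    base_models, meta_columns = [], []
    for cols in subspaces:
        # Assumed base learner; class_weight="balanced" is one way to offset imbalance.
        model = DecisionTreeClassifier(
            max_depth=5, class_weight="balanced", random_state=seed
        )
        # Out-of-fold probabilities, so the second level is not trained on
        # predictions made for rows the base model has already seen.
        oof = cross_val_predict(
            model, X[:, cols], y, cv=5, method="predict_proba"
        )[:, 1]
        meta_columns.append(oof)
        base_models.append(model.fit(X[:, cols], y))
    # Assumed second-level learner, trained only on first-level predictions.
    meta = LogisticRegression(class_weight="balanced")
    meta.fit(np.column_stack(meta_columns), y)
    return subspaces, base_models, meta


def predict_fssm_sketch(fitted, X):
    """Run each base model on its own subspace, then stack the predictions."""
    subspaces, base_models, meta = fitted
    meta_X = np.column_stack(
        [m.predict_proba(X[:, cols])[:, 1] for m, cols in zip(base_models, subspaces)]
    )
    return meta.predict(meta_X)


# Synthetic imbalanced data (about 5% minority class) standing in for real data.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=8,
    weights=[0.95, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)
fitted = fit_fssm_sketch(X_train, y_train)
y_pred = predict_fssm_sketch(fitted, X_test)
```

Each base model sees only a fraction of the features, which is one way to read the abstract's claim about reduced data complexity. On imbalanced data, the predictions in `y_pred` would normally be scored with minority-class recall, precision or F1 rather than accuracy.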



Authors

S. Josephine Theresa
Department of Computer Science, Bharathidasan University, India
D. J. Evanjaline
Department of Computer Science, Bharathidasan University, India

