Entropy Based Greedy Unsupervised Feature Selection Method Using Rough Set Theory for Classification


Affiliations
1 Department of Computer Application, North-Eastern Hill University, India
2 Department of Information Technology, Gauhati University, India
     


Feature selection techniques attempt to identify and remove irrelevant features while ensuring that an informative subset of features remains in the dataset. The performance of a classifier often depends on the feature subset used for the classification task. In the medical and healthcare domain, classification accuracy plays a vital role: a high rate of false negatives in a medical diagnosis system raises the risk that patients do not receive the treatment they need. In this article, we propose an unsupervised feature selection method, built on the concepts of rough set theory, for the classification of high-dimensional datasets. Experiments are carried out on seven public-domain healthcare and life-science datasets. The experimental results support the significance of the proposed method over five other state-of-the-art feature selection methods.

Keywords

Feature Selection, Rough Set, Unsupervised, Entropy
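
The abstract does not spell out the algorithm, so the following is only a generic illustration of how an entropy-driven, greedy, unsupervised selection over rough-set indiscernibility classes can work; it is not the authors' method. It assumes discrete (already discretized) feature values, and the function names `partition_entropy` and `greedy_entropy_selection` are hypothetical. Objects with identical values on a feature subset fall into the same indiscernibility block, and the Shannon entropy of that partition measures how well the subset tells objects apart without using any class label.

```python
from collections import Counter
from math import log2


def partition_entropy(rows, features):
    """Shannon entropy of the partition induced by the indiscernibility
    relation on `features` (rows with equal values share one block)."""
    if not rows:
        return 0.0
    blocks = Counter(tuple(row[f] for f in features) for row in rows)
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in blocks.values())


def greedy_entropy_selection(rows, n_features, tol=1e-12):
    """Forward greedy search: repeatedly add the feature that raises the
    partition entropy the most, and stop once the selected subset
    discerns the objects as well as the full feature set does."""
    all_features = list(range(n_features))
    target = partition_entropy(rows, all_features)
    selected = []
    current = 0.0  # entropy of the single block given by the empty subset
    while target - current > tol:
        remaining = [f for f in all_features if f not in selected]
        best = max(remaining,
                   key=lambda f: partition_entropy(rows, selected + [f]))
        selected.append(best)
        current = partition_entropy(rows, selected)
    return selected


# Toy data (made up for illustration): feature 2 is constant and
# feature 3 duplicates feature 0, so neither adds discerning power.
toy_rows = [
    (0, 0, 1, 0),
    (0, 1, 1, 0),
    (1, 0, 1, 1),
    (1, 1, 1, 1),
]
selected = greedy_entropy_selection(toy_rows, 4)  # -> [0, 1]
```

Because adding a feature can only refine the partition, the entropy never decreases, so the loop is guaranteed to reach the full-set entropy. No class label appears anywhere in the computation, which is what makes the selection unsupervised.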

Authors

Kumar Bania
Department of Computer Application, North-Eastern Hill University, India
Satyajit Sarmah
Department of Information Technology, Gauhati University, India

