
A Comparative Evaluation of Machine Learning Based Software Fault Prediction Models Using Principle Component Analysis


Affiliations
1 Research Scholar, Department of Computer Science, GNDU, Amritsar, India
2 Professor, Department of Computer Science, GNDU, Amritsar, India
3 Assistant Professor, Department of Computer Engineering and Technology, GNDU RC, Gurdaspur, India
 

Software fault prediction helps identify faults in the early stages of software development and makes software testing more convenient and reliable. This study investigates the effect of Principal Component Analysis (PCA) on software fault prediction models. It empirically compares the performance of six machine learning classifiers (Naïve Bayes, Random Forest, Logistic Regression, Support Vector Machine, Gradient Boosting, and Decision Tree) with and without PCA. The predictive capability of each model is measured in terms of accuracy, precision, F1 score, and AUC, the area under the Receiver Operating Characteristic (ROC) curve, which is also used to check the validity of the models. The experiments use the CAMEL dataset from the PROMISE repository, which contains twenty-one object-oriented software metrics. The comparative evaluation indicates that all classifiers performed well with PCA, and that Random Forest and Decision Tree outperformed the other classifiers.
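The with-and-without-PCA comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration assuming scikit-learn, not the authors' experimental setup: the data is a synthetic stand-in with twenty-one features (the PROMISE dataset itself is not loaded), and the 70/30 split, the standardization step, the 95% retained-variance threshold for PCA, and the default hyperparameters are all assumptions. The helper names `synthetic_fault_data`, `make_classifiers`, `evaluate`, and `compare` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def synthetic_fault_data():
    """Synthetic stand-in (assumption): 21 numeric metrics, ~20% 'faulty' modules."""
    return make_classification(
        n_samples=600, n_features=21, n_informative=8, n_redundant=6,
        weights=[0.8, 0.2], random_state=0,
    )


def make_classifiers():
    """Fresh instances of the six classifier families named in the abstract."""
    return {
        "Naive Bayes": GaussianNB(),
        "Random Forest": RandomForestClassifier(random_state=0),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Support Vector Machine": SVC(probability=True, random_state=0),
        "Gradient Boosting": GradientBoostingClassifier(random_state=0),
        "Decision Tree": DecisionTreeClassifier(random_state=0),
    }


def evaluate(model, X_train, X_test, y_train, y_test):
    """Fit the model and return the four metrics listed in the abstract."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]  # probability of the faulty class
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
        "auc": roc_auc_score(y_test, y_score),
    }


def compare(X, y, variance_kept=0.95):
    """Score every classifier twice: on all features and on PCA-reduced features."""
    split = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
    results = {}
    for name, clf in make_classifiers().items():
        plain = make_pipeline(StandardScaler(), clf)
        results[(name, "without PCA")] = evaluate(plain, *split)
    for name, clf in make_classifiers().items():
        # Keep enough components to explain `variance_kept` of the variance (assumed threshold).
        pca = PCA(n_components=variance_kept, svd_solver="full")
        reduced = make_pipeline(StandardScaler(), pca, clf)
        results[(name, "with PCA")] = evaluate(reduced, *split)
    return results


if __name__ == "__main__":
    X, y = synthetic_fault_data()
    for (name, variant), scores in compare(X, y).items():
        line = ", ".join(f"{metric}={value:.3f}" for metric, value in scores.items())
        print(f"{name:24s} {variant:12s} {line}")
```

Fitting the scaler and PCA inside each pipeline means both are learned from the training split only, which avoids leaking test data into the projection. Scores from this synthetic data say nothing about the results reported in the paper.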

Keywords

Machine Learning, Fault Prediction, Principal Component Analysis, AUC.


Authors

Harsimran Kaur
Research Scholar, Department of Computer Science, GNDU, Amritsar, India
Hardeep Singh
Professor, Department of Computer Science, GNDU, Amritsar, India
Amitpal Singh Sohal
Assistant Professor, Department of Computer Engineering and Technology, GNDU RC, Gurdaspur, India

