
Extrapolation of Loan Default using Predictive Analytics: A Case of Business Analysis


Affiliations
1 City University College of Ajman, Ajman, United Arab Emirates
 

The research assesses a customer's suitability for a loan using a machine learning approach called predictive modeling. Banks and Non-Banking Financial Companies (NBFCs) are at risk of significant Non-Performing Assets (NPAs) when customers fail to repay their loans. The data for this study came from Kaggle, and eight prediction models were used to determine whether a borrower would be able to repay the loan: AdaBoost, k-Nearest Neighbors (k-NN), Logistic Regression, Support Vector Machines (SVM), Decision Tree, Naïve Bayes, Neural Networks, and Random Forest (RF). The aim is to support lending decisions with factual evidence rather than subjective reasoning. Four performance measures are used to evaluate the results: Classification Accuracy, Precision, Recall, and F1 score. The dataset is split into training and test sets of 70% and 30%, respectively. The analysis is carried out in two phases: the models are first trained on the 70% training set and then evaluated on the 30% test set. The purpose of this study is to see how objective characteristics influence borrowers to default on loans, to identify the most common reasons for default, and to predict which customers will default. We carried out two evaluations. First, we used the full training set to make predictions with each model. The AdaBoost model delivers the best results, with a recall rate of 0.384, a classification accuracy of 59.2%, and a true-positive rate of 69.74%. Second, we performed feature selection and found that Credit History, at 31%, had the greatest impact on loan default detection.
By partitioning the dataset into Credit_History 1 and 0, we found that Credit History 1 produces better results, with a rate of 0.444, a classification accuracy of 60.5%, and a true-positive rate of 68.7%.
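
The evaluation steps described in the abstract (a 70/30 split, four performance measures, and a partition on Credit_History) can be illustrated with a minimal Python sketch. It is not the study's code: the function names, the random seed, the toy labels, and the convention that label 1 marks a default are all assumptions made for illustration, and the metrics follow the standard textbook definitions, which may differ from how the reported figures were computed.

```python
import random
from typing import Dict, List, NamedTuple, Sequence, Tuple


class Scores(NamedTuple):
    accuracy: float
    precision: float
    recall: float
    f1: float


def split_70_30(rows: Sequence, seed: int = 0) -> Tuple[list, list]:
    """Shuffle rows reproducibly and return (train, test) in a 70/30 ratio."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seed is an illustrative choice
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Scores:
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Scores(accuracy, precision, recall, f1)


def partition_by(rows: Sequence[Dict], key: str) -> Dict[object, List[Dict]]:
    """Group records by the value of one field, e.g. Credit_History 1 vs 0."""
    groups: Dict[object, List[Dict]] = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    return groups


if __name__ == "__main__":
    # Toy records; the real study used a Kaggle loan dataset.
    records = [{"Credit_History": i % 2, "id": i} for i in range(10)]
    train, test = split_70_30(records)
    print(len(train), len(test))

    # Toy labels standing in for one model's predictions on a test set.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
    print(score(y_true, y_pred))

    by_history = partition_by(records, "Credit_History")
    print({k: len(v) for k, v in by_history.items()})
```

In a comparison of several classifiers, `score` would be called once per model on the same test labels, and again on each Credit_History subset, so that the figures are directly comparable.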

Keywords

AdaBoost, Decision Tree, k-Nearest Neighbors (k-NN), Logistic Regression, Naïve Bayes, Neural Network, Non-Banking Financial Companies (NBFC), Support Vector Machine (SVM), Random Forest.



Authors

Riktesh Srivastava
City University College of Ajman, Ajman, United Arab Emirates







DOI: https://doi.org/10.53739/samvad%2F2021%2Fv23%2F166261