
A Novel Credit Scoring Prediction Model based on Feature Selection Approach and Parallel Random Forest


Affiliations
1 Department of Economic Information System, Academy of Finance, Hanoi, Viet Nam
2 Department of Information Technology, VNU-University of Engineering and Technology, Hanoi, Viet Nam
3 Department, Faculty of Telecommunications, Posts and Telecommunications Institute of Technology, Hanoi, Viet Nam
 

Background/Objectives: This article presents a feature selection method that improves the accuracy and computation speed of credit scoring models. Methods/Analysis: We propose a credit scoring model that combines a parallel Random Forest classifier with feature selection to evaluate the credit risk of applicants. Integrating Random Forest into the feature selection process allows the importance of each feature to be evaluated accurately, so that irrelevant and redundant features can be removed. Findings: We developed an algorithm that selects the best features using the best average score, the best median score, and the lowest standard deviation as the feature scoring rules. As a result, the number of features can be reduced to the smallest possible set, which yields a substantial reduction in runtime. The proposed model therefore performs feature selection and model parameter optimization at the same time, which improves its efficiency. The performance of the proposed model was assessed experimentally on two public datasets, the Australian and German credit datasets. The results showed that the proposed model achieves higher accuracy than other commonly used feature selection methods. In particular, our method attains an average accuracy of 76.2% with a significantly reduced running time of 72 minutes on the German credit dataset, and its highest average accuracy of 89.4% with a running time of only 50 minutes on the Australian credit dataset. Applications/Improvements: This method can be applied in credit scoring models to improve accuracy while significantly reducing runtime.
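The scoring rules named in the Findings (best average, best median, lowest standard deviation) can be illustrated with a minimal sketch. The abstract does not say how the three rules are combined, so the lexicographic ordering below is an assumption, as are the function names, the input format, and the made-up importance values; this is not the authors' algorithm.

```python
from statistics import mean, median, pstdev


def rank_features(importances):
    """Rank features from repeated importance measurements.

    `importances` maps a feature name to the importance scores it
    received over several Random Forest runs (hypothetical format).
    Assumed rule: higher mean first, then higher median, then lower
    standard deviation as the final tie-breaker.
    """
    def key(name):
        runs = importances[name]
        return (-mean(runs), -median(runs), pstdev(runs))

    return sorted(importances, key=key)


def select_top(importances, k):
    """Keep the k best-ranked features."""
    return rank_features(importances)[:k]


# Made-up importance scores, for illustration only.
scores = {
    "duration": [0.30, 0.32, 0.31],
    "amount": [0.34, 0.36, 0.29],
    "age": [0.10, 0.12, 0.11],
    "phone": [0.01, 0.00, 0.02],
}
selected = select_top(scores, 2)
print(selected)
```

In practice the per-run scores would come from the Random Forest's own importance estimates over repeated or cross-validated runs, and the cut-off `k` would be tuned together with the model parameters.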

Keywords

Credit Scoring, Feature Selection, Machine Learning, Parallel Random Forest

Authors

Ha Van Sang
Department of Economic Information System, Academy of Finance, Hanoi, Viet Nam
Nguyen Ha Nam
Department of Information Technology, VNU-University of Engineering and Technology, Hanoi, Viet Nam
Nguyen Duc Nhan
Department, Faculty of Telecommunications, Posts and Telecommunications Institute of Technology, Hanoi, Viet Nam




DOI: https://doi.org/10.17485/ijst%2F2016%2Fv9i20%2F133289