Predicting Bad Debt Risk in Banks using Machine Learning
Abstract
This study builds a credit risk prediction model to improve the effectiveness of risk management at credit institutions. The work is motivated by conditions in the Vietnamese banking system: by the end of 2023 the internal bad-debt ratio had risen nearly 3.4-fold, and the cost of credit risk provisioning was 40% higher than in 2022. The key challenge is severe class imbalance, with bad-debt cases making up only 1-5% of the data. Data preprocessing includes imputing missing values with the miceforest library and selecting features using Mutual Information combined with correlation analysis. The main experimental approach is a Mixture of Experts (MoE) model in which Stratified K-Fold is used to train the experts on data balanced 1:1. The MoE model achieves the best performance, with a Recall of 0.87 and an F1-score of 0.79, outperforming classical machine learning models. In application, the model reaches 85-90% forecasting accuracy, optimises the credit process, reduces appraisal time by 25-30%, and supports the sustainable development of the financial system.
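To make the 1:1 balancing idea concrete, the sketch below shows one plausible way to build balanced training subsets for the experts, and how Recall and F1 are computed from predictions. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the folds are formed, so plain chunking of the shuffled majority class stands in for the Stratified K-Fold step. The function names `balanced_expert_splits` and `recall_f1` are hypothetical, and the experts and gating network are omitted.

```python
import random


def balanced_expert_splits(labels, seed=0):
    """Return one list of row indices per expert, each with a 1:1 class ratio.

    Assumption (not stated in the abstract): every expert sees all minority
    (bad-debt, label 1) rows plus a disjoint, equally sized chunk of the
    majority (label 0) rows.
    """
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    if not minority or len(majority) < len(minority):
        raise ValueError("expected label 1 to be a non-empty minority class")

    rng = random.Random(seed)
    rng.shuffle(majority)

    size = len(minority)
    n_experts = len(majority) // size  # leftover majority rows are dropped
    return [
        minority + majority[k * size:(k + 1) * size]
        for k in range(n_experts)
    ]


def recall_f1(y_true, y_pred):
    """Compute Recall and F1 for the positive (bad-debt) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, f1


# Toy data with a 5% bad-debt rate, within the 1-5% range cited above.
labels = [1] * 5 + [0] * 95
splits = balanced_expert_splits(labels)
```

With this toy data each subset holds 5 bad-debt and 5 good rows, so an expert trained on it never sees the original imbalance. Recall is reported alongside F1 because missing a bad-debt case (a false negative) is usually the costlier error in this setting.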
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
References
R. Bhandary, B.K. Ghosh. “Credit Card Default Prediction: An Empirical Analysis on Predictive Performance Using Statistical and Machine Learning Methods”. J. Risk Financial Manag. vol 18, 23, (2025), DOI: https://doi.org/10.3390/jrfm18010023
V. Charles, T. Gherman, J.C. Paliza. “The Gini Index: A Modern Measure of Inequality”. In Charles, V., Emrouznejad, A. (eds.), Modern Indices for International Economic Diplomacy. Palgrave Macmillan, Cham. (2022), DOI: https://doi.org/10.1007/978-3-030-84535-3_3
D. Elreedy, A.F. Atiya. “A Comprehensive Analysis of Synthetic Minority Oversampling Technique (SMOTE) for handling class imbalance”. Information Sciences, vol 505, pp. 32-64, (2019), DOI: https://doi.org/10.1016/j.ins.2019.07.070
B. Krawczyk. “Learning from imbalanced data: open challenges and future directions”. Prog Artif Intell, vol 5, pp. 221–232 (2016), DOI: https://doi.org/10.1007/s13748-016-0094-0
P. Koulafetis. “Modern credit risk management: Theory and practice”. Publisher Palgrave Macmillan London, (2017), DOI: https://doi.org/10.1057/978-1-137-52407-2
M. Mujahid, E. Kına, F. Rustam et al. “Data oversampling and imbalanced datasets: an investigation of performance for machine learning and feature engineering”. Journal of Big Data 11, 87 (2024), DOI: https://doi.org/10.1186/s40537-024-00943-4
A. Noriega, A. Rivera, and A. Herrera. “Machine Learning Models for Credit Risk Prediction: A Systematic Review”. Journal of Financial Engineering, 35(1), pp. 76-89, (2023), DOI: https://doi.org/10.3390/data8110169
B. Siddharth, L. Mohan, and Y.R. Reddy. "Machine learning techniques for credit risk evaluation: a systematic literature review". Journal of Banking and Financial Technology 4, no. 1, pp. 111-138, (2020), DOI: https://doi.org/10.1007/s42786-020-00020-3
Z. Sun, W. Ying, W. Zhang, S. Gong. “Undersampling method based on minority class density for imbalanced data”. Expert Systems with Applications, Vol 249, Part A, 123328, (2024), DOI: https://doi.org/10.1016/j.eswa.2024.123328