Identifying Optimal Innovative Machine Learning Models for Predicting U.S. Mortgage Loan Defaults: A Comparative Analysis

Chi Ton Cong Nguyen
Dr. Victoria N. Dean

Abstract

Mortgage loan defaults pose substantial risks to the financial industry, and predicting them accurately remains a challenge. Traditional credit scoring models often lack accuracy and computational efficiency because they are not data-driven and fail to capture the data patterns that drive borrower default behavior. This limitation is especially significant in volatile housing markets, where detecting defaults early can substantially reduce credit losses compared with identifying them at a later stage. Timely and accurate prediction of mortgage loan defaults is therefore vital to formulating effective credit risk management strategies. This research implemented a framework that addresses this challenge by applying Artificial Intelligence / Machine Learning (AI/ML) algorithms and comparing model performance to identify the best model for predicting mortgage loan defaults. The framework systematically trained and evaluated each model on a dataset of over 100,000 loans, combining loan, borrower, and property characteristics from Freddie Mac with macroeconomic factors. Model performance was evaluated using accuracy, AUC, F1 scores, and ROC curves. The study found that Extreme Gradient Boosting (XGBoost) was the top performer, offering better performance and greater robustness to overfitting than the other ML models considered: Logistic Regression, Neural Network, Decision Tree, Gradient Boosting, and Random Forest. XGBoost achieved the best performance across all evaluation metrics, with 99% accuracy on the training data, 98% on the testing data, and more than 90% on all other metrics. The predictive power of XGBoost is attributed mainly to its ensemble and regularization techniques, which minimize errors and overfitting simultaneously.
These findings provide a benchmark for mortgage default modeling practice and establish XGBoost as an effective financial ML technique for accurately classifying “good” and “bad” loans. Given XGBoost’s strong performance in predicting mortgage loan defaults, the study offers an AI solution for credit risk assessment and lending decisions for banks, mortgage lenders, and financial institutions in the FinTech industry. The results also suggest that XGBoost is a strong candidate for classification problems in other fields.
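
To make the evaluation metrics named in the abstract concrete, the following is a minimal sketch of how accuracy, F1 score, and AUC can be computed for a binary default classifier. It is not the authors' code. The labels and scores are invented toy values (1 = default, 0 = no default), the 0.5 decision threshold is an assumption, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of all loans classified correctly."""
    tp, _, _, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive receives a higher score than a randomly chosen negative
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not taken from the study).
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4, 0.3, 0.05]
# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("accuracy:", accuracy(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))
print("AUC:", roc_auc(y_true, scores))
```

Accuracy and F1 depend on the chosen threshold, while AUC is computed from the raw scores and summarizes the ROC curve across all thresholds, which is one reason such metrics are usually reported together.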

How to Cite

[1]
Chi Ton Cong Nguyen and Dr. Victoria N. Dean, “Identifying Optimal Innovative Machine Learning Models for Predicting U.S. Mortgage Loan Defaults: A Comparative Analysis”, IJITEE, vol. 14, no. 10, pp. 1–8, Sep. 2025, doi: 10.35940/ijitee.I1127.14100925.