Application of Machine Learning Models for Patients Health Insurance Cost Prediction

Dr. Annwesha Banerjee Majumder
Dr. Sumit Das
Aniruddha Biswas
Trisita Ghosh
Raj Poddar
Suchetana Chakraborty

Abstract

This study examines the use of machine learning models to predict health insurance costs from personal characteristics. The dataset included demographic and lifestyle variables such as age, sex, BMI, number of children, smoking status, and region. Several machine learning methods, including Random Forest, Gradient Boosting, and Linear Regression, were compared on how accurately they estimated insurance costs. The dataset was preprocessed by scaling numerical features and encoding categorical variables, and the regression models were then trained and evaluated with k-fold cross-validation. Performance was measured using the coefficient of determination (R²), mean absolute error (MAE), and root mean squared error (RMSE). In the experiments, Gradient Boosting outperformed both Random Forest and Linear Regression.
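
The evaluation protocol named in the abstract (k-fold cross-validation scored by MAE, RMSE, and R²) can be sketched in plain Python. This is an illustrative sketch only, not the authors' code: the data are synthetic, a one-feature least-squares line stands in for the regression models, and the function names (`k_fold_indices`, `fit_line`, `cross_validate`) and the choice of k = 5 are assumptions made for the example.

```python
import math
import random


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k shuffled, disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f_i, f in enumerate(folds) if f_i != i for j in f]
        yield train, folds[i]


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def cross_validate(xs, ys, k=5):
    """Fit on each training split, score on the held-out fold, average the metrics."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [slope * xs[i] + intercept for i in test]
        truth = [ys[i] for i in test]
        scores.append({"mae": mae(truth, preds),
                       "rmse": rmse(truth, preds),
                       "r2": r2(truth, preds)})
    return {m: sum(s[m] for s in scores) / k for m in ("mae", "rmse", "r2")}


# Synthetic stand-in data (NOT the study's dataset): cost grows with age plus noise.
rng = random.Random(42)
ages = [rng.uniform(18, 64) for _ in range(200)]
charges = [250.0 * a + 3000.0 + rng.gauss(0, 1000.0) for a in ages]

results = cross_validate(ages, charges, k=5)
print(results)
```

Averaging the per-fold scores gives one MAE, RMSE, and R² per model, which is the kind of summary a comparison between regressors would rest on; swapping `fit_line` for another regressor leaves the rest of the loop unchanged.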

How to Cite

[1] Dr. Annwesha Banerjee Majumder, Dr. Sumit Das, Aniruddha Biswas, Trisita Ghosh, Raj Poddar, and Suchetana Chakraborty, “Application of Machine Learning Models for Patients Health Insurance Cost Prediction”, IJSCE, vol. 15, no. 4, pp. 11–16, Sep. 2025, doi: 10.35940/ijsce.D3685.15040925.
