A Comparative Analysis of Techniques, Datasets, Feature Selection Methods, and Evaluation Metrics in Software Fault Prediction
Abstract
This study presents a systematic literature review (SLR) of recent advances in Software Fault Prediction (SFP). The review covers five dimensions: techniques, datasets, feature selection methods, software metrics, and evaluation criteria. Five research questions were defined to guide the assessment of current trends in SFP research, and significant studies were drawn from the ACM, IEEE, SpringerLink, and ScienceDirect digital libraries. The findings show that machine learning approaches, particularly neural networks, deep learning, and ensemble methods, are increasingly used because they can handle the complexity of software fault data. Public datasets, notably those from the PROMISE and NASA MDP repositories, are widely used, which underlines the importance of dataset diversity for improving model performance. Feature selection methods, particularly wrapper techniques, are often applied to improve predictive accuracy. Model evaluation relies predominantly on confusion-matrix-based metrics such as Accuracy, Precision, Recall, and F1-Score. Despite these advances, challenges remain in addressing class imbalance, adapting to rapidly evolving software environments, and achieving real-time fault prediction. The study highlights the need for greater classifier diversity and continued methodological improvement to make SFP models more robust and generalizable.
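For readers less familiar with the evaluation metrics named above, the following is a minimal sketch of how Accuracy, Precision, Recall, and F1-Score are derived from the four cells of a binary confusion matrix. It is an illustration only and is not taken from any reviewed study; the function name `confusion_metrics` and the example counts are hypothetical.

```python
def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Derive common metrics from binary confusion-matrix counts.

    tp: faulty modules predicted faulty      fp: clean modules predicted faulty
    fn: faulty modules predicted clean       tn: clean modules predicted clean
    """
    def ratio(num: float, den: float) -> float:
        # Return 0.0 for an undefined ratio (zero denominator) instead of raising.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Hypothetical imbalanced example: 100 modules, only 10 of them faulty.
    for name, value in confusion_metrics(tp=6, fp=5, fn=4, tn=85).items():
        print(f"{name}: {value:.3f}")
```

With these hypothetical counts, accuracy is 0.91 while recall is only 0.60. This illustrates why accuracy alone can be misleading on imbalanced fault data, where most modules are fault-free.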
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.