Generate High-Coverage Unit Test Data Using the LLM Tool
Abstract
Unit testing is a critical phase in the software development lifecycle, essential for ensuring the quality and reliability of code. However, manually writing unit test scripts and preparing the corresponding test data is time-consuming and labor-intensive. To address these challenges, several automated approaches have been explored, including search-based, constraint-based, random, and symbolic-execution techniques for generating unit tests. In recent years, the rapid advancement of large language models (LLMs) has opened new avenues for automating such tasks, including the generation of unit test scripts and test data. Despite their potential, applying LLMs in a straightforward manner to generate unit tests can yield low test coverage: a significant portion of the source code, including particular statements or branches, may remain untested, which reduces the effectiveness of the tests. To overcome this limitation, this paper presents an approach that not only automates the generation of unit test scripts and test data but also improves test coverage. The proposed solution begins by using an LLM tool (such as ChatGPT) to generate initial unit test scripts and data from the source code. To enhance coverage, the specification document of the source code is also provided to the LLM to generate additional test data. A coverage checking tool is then used to measure test coverage and identify untested statements or branches, and the LLM is applied again to generate new test data aimed specifically at these gaps. Initial experimental results indicate that this method significantly improves test coverage, demonstrating its potential to enhance automated unit testing.
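The abstract describes a coverage-guided refinement loop. The sketch below is one possible rendering of that loop, not the paper's actual implementation: it assumes a hypothetical query_llm() helper standing in for the ChatGPT call, pytest as the test runner, and coverage.py as the coverage checking tool; the prompts, file names, and iteration cap are illustrative placeholders.

```python
"""Sketch of the coverage-guided test-generation loop (illustrative assumptions:
query_llm() is a hypothetical LLM client, tests run under pytest, and coverage
is measured with coverage.py)."""
import json
import subprocess
from pathlib import Path

MAX_ITERATIONS = 3  # stop after a few refinement rounds


def query_llm(prompt: str) -> str:
    """Hypothetical placeholder for a call to an LLM API (e.g. ChatGPT)."""
    raise NotImplementedError("wire this to your LLM provider")


def run_tests_with_coverage(test_dir: str, source_pkg: str) -> dict:
    """Run pytest under coverage.py and return the parsed JSON report."""
    subprocess.run(
        ["coverage", "run", "--branch", f"--source={source_pkg}",
         "-m", "pytest", test_dir],
        check=False,  # failing tests should not abort the loop
    )
    subprocess.run(["coverage", "json", "-o", "coverage.json"], check=True)
    return json.loads(Path("coverage.json").read_text())


def missing_lines(report: dict) -> dict:
    """Map each source file to the statement lines that were never executed."""
    return {
        filename: data["missing_lines"]
        for filename, data in report["files"].items()
        if data["missing_lines"]
    }


def refine_tests(source_code: str, spec: str, source_pkg: str, test_dir: str) -> None:
    # Step 1: initial tests generated from the source code and its specification.
    tests = query_llm(
        "Write pytest unit tests (scripts and test data) for this code.\n"
        f"Source:\n{source_code}\n\nSpecification:\n{spec}"
    )
    Path(test_dir, "test_generated.py").write_text(tests)

    # Steps 2-3: measure coverage, then ask the LLM to target the remaining gaps.
    for _ in range(MAX_ITERATIONS):
        report = run_tests_with_coverage(test_dir, source_pkg)
        gaps = missing_lines(report)
        if not gaps:
            break  # full statement coverage reached
        extra = query_llm(
            "These lines are still uncovered; add test cases that execute them:\n"
            f"{json.dumps(gaps, indent=2)}\n\nSource:\n{source_code}"
        )
        Path(test_dir, "test_generated_extra.py").write_text(extra)
```

Capping the loop at a fixed number of iterations keeps the process bounded when the LLM cannot reach the remaining statements or branches; the same report could also be filtered on missing_branches to target branch coverage specifically.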
Article Details
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.