Self-Healing Infrastructure: Autonomous LLM Agents for Real-Time Remediation of Configuration Drift and Security Misconfigurations in IaC Deployments
Abstract
Infrastructure as Code (IaC) has improved the scalability and automation of cloud environments, but configuration drift and security misconfigurations remain critical operational and security issues. Current drift detection and remediation solutions are largely reactive and rule-based and depend on human intervention, which makes them ineffective in dynamic, multi-cloud environments. This research aims to develop and deploy a self-healing infrastructure architecture that autonomously identifies and recovers from configuration drift and security misconfigurations in real time. To this end, the paper proposes a new multi-agent architecture based on Large Language Models (LLMs), in which drift detectors, security reasoners, root-cause analysers, remediation generators, and post-remediation validators operate within a closed-loop pipeline. The framework is evaluated on a publicly available IaC dataset, written in Terraform, of simulated drift scenarios. According to the experimental results, the proposed LLM-agent system outperforms rule-based and semi-automated systems, with a drift detection rate of 96.8%, a security misconfiguration detection rate of 95.2%, and a mean time to remediation of 6.9 minutes. The framework is also effective in reducing false positives and manual intervention while achieving high policy compliance. These findings affirm the usefulness of autonomous LLM agents in enabling proactive, intelligent, and scalable self-healing infrastructure management in contemporary cloud systems.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.