Design and Empirical Evaluation of a Four-Layer AI Agent Architecture for Automated Web Application Security Testing
Corresponding author: Saken Mambetov
Abstract
This study proposes a four-layer AI agent architecture for automating routine web security operations. The architecture integrates Large Language Model (LLM) reasoning with a hybrid Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) detection engine and implements a Reasoning-Acting (ReAct) loop for autonomous testing with human-in-the-loop validation. It was empirically evaluated over a six-month period on 50 web applications drawn from OWASP WebGoat, DVWA, and custom-developed test environments. The AI agent achieved an overall detection accuracy of 89.2% (95% CI: 86.4-92.0%), significantly outperforming traditional automated methods (67.4% accuracy, p < 0.001). Mean Time to Remediation (MTTR) decreased from 74.3 days to 28.5 days (a 61.6% reduction), and the false positive rate decreased from 24.3% to 4.8%. These findings indicate that AI agent-driven automation can substantially improve the efficiency and reliability of web security testing. However, human expertise remains important for assessing complex vulnerabilities and detecting zero-day threats.
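The headline figures in the abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only: the helper name `relative_reduction` and the variable names are assumptions introduced here, not part of the paper. It recomputes the stated 61.6% MTTR reduction and derives, from the reported values, the relative drop in false positive rate, the accuracy gap in percentage points, and the half-width of the reported confidence interval.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the percentage decrease from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Values taken from the abstract.
mttr_reduction = relative_reduction(74.3, 28.5)  # MTTR, in days
fpr_reduction = relative_reduction(24.3, 4.8)    # false positive rate, in %

# Derived quantities (not stated in the abstract).
accuracy_gain_pp = 89.2 - 67.4                   # percentage points
ci_half_width = (92.0 - 86.4) / 2                # percentage points

print(f"MTTR reduction: {mttr_reduction:.1f}%")
print(f"False positive rate reduction: {fpr_reduction:.1f}%")
print(f"Accuracy gain: {accuracy_gain_pp:.1f} percentage points")
print(f"95% CI half-width: {ci_half_width:.1f} percentage points")
```

The first value matches the 61.6% reduction reported above. The other three are derived here from the reported numbers and are not claims made by the authors.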
Keywords:
AI agent, web application security, machine learning, penetration testing, OWASP, large language model, autonomous security testing, deep learning
License
Copyright (c) 2026 Bakytzhan Kulambayev, Gulnar Astaubayeva, Zhanna Mukanova, Kuralay Makhmetova, Saken Mambetov, Serik Joldasbayev

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
