Design and Empirical Evaluation of a Four-Layer AI Agent Architecture for Automated Web Application Security Testing

Bakhytzhan Kulambayev; Gulnar Astaubayeva; Zhanna Mukanova; Kuralay Makhmetova; Saken Mambetov; Serik Joldasbayev

doi:10.48084/etasr.16879

Authors

Bakhytzhan Kulambayev: Higher School of Telecommunication, Turan University, Almaty, Kazakhstan | School of Digital Technology, Narxoz University, Almaty, Kazakhstan. ORCID: https://orcid.org/0009-0002-9279-6239
Gulnar Astaubayeva: School of Digital Technology, Narxoz University, Almaty, Kazakhstan. ORCID: https://orcid.org/0000-0002-0286-3518
Zhanna Mukanova: Higher School of Computer Engineering, Turan University, Almaty, Kazakhstan. ORCID: https://orcid.org/0000-0001-6506-9007
Kuralay Makhmetova: School of Software Engineering, Astana IT University, Astana, Kazakhstan. ORCID: https://orcid.org/0009-0009-0987-1041
Saken Mambetov: Department of Cybersecurity and Cryptology, Al-Farabi Kazakh National University, Almaty, Kazakhstan | Higher School of Information Technology, Turan University, Almaty, Kazakhstan. ORCID: https://orcid.org/0000-0002-7249-5378
Serik Joldasbayev: Department of Computer Engineering, International Information Technology University, Almaty, Kazakhstan. ORCID: https://orcid.org/0000-0002-8689-1822

Volume: 16 | Issue: 2 | Pages: 33582-33588 | April 2026 | https://doi.org/10.48084/etasr.16879

Received: 11 December 2025 | Revised: 19 January 2026 and 10 February 2026 | Accepted: 11 February 2026 | Online: 16 February 2026
Corresponding author: Saken Mambetov

Abstract

This study proposes a four-layer AI agent architecture for automating routine web security operations. The architecture integrates Large Language Model (LLM) reasoning with a hybrid Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) detection engine and implements a Reasoning-Acting (ReAct) loop for autonomous testing with human-in-the-loop validation. It was empirically evaluated on 50 web applications drawn from OWASP WebGoat, DVWA, and custom-developed test environments over a six-month period. The AI agent achieved an overall detection accuracy of 89.2% (95% CI: 86.4-92.0%), significantly outperforming traditional automated methods (67.4% accuracy, p < 0.001). Mean Time to Remediation (MTTR) decreased from 74.3 days to 28.5 days (a 61.6% reduction), and the false positive rate decreased from 24.3% to 4.8%. These findings indicate that AI agent-driven automation can substantially improve the efficiency and reliability of web security testing. However, human expertise remains important for assessing complex vulnerabilities and detecting zero-day threats.
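
The headline figures in the abstract can be cross-checked with simple arithmetic. The following is a minimal sketch, not the authors' evaluation code: the helper name `relative_reduction` is hypothetical, and the only inputs are the values quoted in the abstract. The relative false positive reduction and the accuracy gap in percentage points are derived here for illustration and are not stated in the source.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the percentage decrease from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Values quoted in the abstract.
mttr_reduction = relative_reduction(74.3, 28.5)  # MTTR in days
fpr_reduction = relative_reduction(24.3, 4.8)    # false positive rate in %
accuracy_gain = 89.2 - 67.4                      # percentage points

print(f"MTTR reduction: {mttr_reduction:.1f}%")
print(f"False positive rate reduction (derived): {fpr_reduction:.1f}%")
print(f"Accuracy gain (derived): {accuracy_gain:.1f} percentage points")
```

The first value reproduces the 61.6% MTTR reduction reported in the abstract. The other two restate the reported accuracy and false positive figures in relative terms.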

Keywords:

AI agent, web application security, machine learning, penetration testing, OWASP, large language model, autonomous security testing, deep learning

