Enhancing Vulnerability Detection Precision through Ensemble Learning with Large Language Models
Received: 27 February 2026 | Revised: 19 May 2026 | Accepted: 27 May 2026 | Online: 28 June 2026
Corresponding author: Muhammad Rehan Faheem
Abstract
This study investigates the use of ensemble learning with Large Language Models (LLMs) to improve the accuracy of software vulnerability prediction. It follows a structured experimental approach to assess whether combining multiple models can enhance performance. Three baseline models, CodeBERT, GraphCodeBERT, and CodeT5, were trained and evaluated on the Devign dataset, which provides a large collection of labeled source code snippets. Their outputs were then combined using three ensemble techniques: Majority Voting, Weighted Voting, and Stacking. Performance was measured with precision, recall, and F1-score. The ensemble approaches outperformed all standalone models. Notably, Majority Voting increased precision from 0.601 (CodeBERT) to 0.690, a relative improvement of 14.81%. While keeping overall detection accuracy in view, the study focused on reducing false positives. The results show that ensemble techniques are a practical way to boost the precision of LLM-based vulnerability detection. Ensemble learning can address the limitations of standalone models by reducing false positives and improving the overall trade-off between accuracy and reliability. The study suggests that ensemble methods offer considerable potential for advancing software security analysis.
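The two voting schemes and the precision metric named above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The toy labels, the weights, and the function names are hypothetical, and Weighted Voting is shown over hard 0/1 labels, although the study may have weighted predicted probabilities instead. Stacking, which trains a meta-classifier on the base models' outputs, is omitted for brevity. The last line checks that the reported rise from 0.601 to 0.690 corresponds to a relative gain of about 14.81%.

```python
from typing import Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Label a sample vulnerable (1) when more than half of the models say so.

    predictions[m][i] is model m's hard 0/1 label for sample i.
    With an odd number of models, binary votes cannot tie.
    """
    n_models = len(predictions)
    return [1 if 2 * sum(labels) > n_models else 0 for labels in zip(*predictions)]


def weighted_vote(predictions: Sequence[Sequence[int]],
                  weights: Sequence[float]) -> list[int]:
    """Label a sample vulnerable when the weight voting for 1 exceeds half the total weight."""
    total = sum(weights)
    return [
        1 if 2 * sum(w * y for w, y in zip(weights, labels)) > total else 0
        for labels in zip(*predictions)
    ]


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """TP / (TP + FP): fewer false positives means higher precision."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


def relative_improvement(baseline: float, new: float) -> float:
    """Percentage change of a score relative to its baseline value."""
    return (new - baseline) / baseline * 100


# Hypothetical toy labels (1 = vulnerable), NOT taken from Devign.
y_true        = [1, 0, 1, 0, 0, 1, 0, 1]
codebert      = [1, 1, 1, 0, 1, 1, 0, 0]
graphcodebert = [1, 0, 0, 1, 1, 1, 0, 1]
codet5        = [1, 0, 1, 0, 0, 0, 1, 1]
models = [codebert, graphcodebert, codet5]

for name, preds in zip(["CodeBERT", "GraphCodeBERT", "CodeT5"], models):
    print(f"{name}: precision = {precision(y_true, preds):.3f}")

mv = majority_vote(models)
print(f"Majority Voting: precision = {precision(y_true, mv):.3f}")

# Hypothetical weights; how the study chose its weights is not stated here.
wv = weighted_vote(models, weights=[1.0, 1.0, 1.5])
print(f"Weighted Voting: precision = {precision(y_true, wv):.3f}")

# Arithmetic check of the relative gain reported in the abstract.
print(f"Relative gain: {relative_improvement(0.601, 0.690):.2f}%")
```

On this toy data, majority voting drops one of CodeBERT's false positives and recovers a vulnerable sample that CodeBERT missed, so precision rises from 0.600 to 0.800. The numbers only illustrate the mechanism and say nothing about the study's actual results.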
Keywords:
ensemble learning, large language models, vulnerability detection, precision, majority voting, weighted voting, stacking
License
Copyright (c) 2026 Hussein Al-Ofeishat, Azhar Hussain, Muhammad Rehan Faheem, Syed Asim Ali Shah, Hannan Adeel, Muzammil Hussain

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
