Toward Robust Malware Detection: A Survey of Datasets, Techniques, and Practical Challenges
Received: 12 January 2026 | Revised: 14 February 2026, 6 March 2026, 19 March 2026, and 21 March 2026 | Accepted: 23 March 2026 | Online: 8 April 2026
Corresponding author: Van-Quynh Trinh
Abstract
The increasing sophistication of malware has diminished the effectiveness of traditional signature-based detection. Although Machine Learning (ML), Deep Learning (DL), and Large Language Models (LLMs) have improved malware classification, real-world systems continue to struggle with evasion attacks, temporal drift, and class imbalance. This survey reviews advances in robust malware detection, focusing on benchmark datasets, detection methods, and operational constraints. Five public datasets (EMBER2018, SOREL-20M, MalDICT, MOTIF, and EMBER2024) are assessed for scale, label quality, and reproducibility. The paper makes two contributions: (i) a Robust Malware Evaluation Protocol (RMEP) for consistent benchmarking at low False-Positive Rates (FPR ≤ 0.1%) with temporal splits, and (ii) a Dataset-Task-Robustness (DTR) matrix for systematic comparison. Together, these offer practical guidance for reproducible malware-detection research. Future efforts should focus on broader multi-platform benchmark coverage, explicit analysis of robustness–accuracy trade-offs, interpretable language-assisted detection pipelines, and privacy-preserving collaborative learning frameworks.
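The two ingredients the abstract names for RMEP, a temporal split and scoring at FPR ≤ 0.1%, can be illustrated with a minimal sketch. This is not the authors' protocol: the function names, the `first_seen` field, and the tie-handling rule are assumptions made for illustration only.

```python
from datetime import date


def temporal_split(samples, cutoff):
    """Train on samples first seen before `cutoff`, test on the rest.

    `samples` is a list of dicts with a hypothetical `first_seen` date field.
    """
    train = [s for s in samples if s["first_seen"] < cutoff]
    test = [s for s in samples if s["first_seen"] >= cutoff]
    return train, test


def tpr_at_fpr(scores, labels, max_fpr=0.001):
    """Best true-positive rate reachable while keeping FPR <= max_fpr.

    A sample is flagged malicious when its score is >= the threshold.
    Labels are 1 (malicious) or 0 (benign).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both benign and malicious samples")

    # Walk thresholds from the highest score downward.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    best_tpr, tp, fp = 0.0, 0, 0
    i = 0
    while i < len(ranked):
        # Samples tied at one score cannot be separated by a threshold,
        # so they are counted together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        if fp / n_neg > max_fpr:
            break  # lowering the threshold further only adds false positives
        best_tpr = tp / n_pos
        i = j
    return best_tpr
```

In this sketch, a model would be fitted on the `train` portion only and `tpr_at_fpr` would be computed on the later `test` portion, so that no sample from the future leaks into training. With 0.1% as the budget, a test set needs at least a thousand benign samples before even one false positive is allowed.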
Keywords:
malware detection, robustness, benchmark datasets, Machine Learning (ML), Deep Learning (DL), Large Language Models (LLMs)
License
Copyright (c) 2026 Trong-Thua Huynh, De-Thu Huynh, Van-Quynh Trinh

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
