Toward Robust Malware Detection: A Survey of Datasets, Techniques, and Practical Challenges

Trong-Thua Huynh; De-Thu Huynh; Van-Quynh Trinh

doi:10.48084/etasr.17500

Authors

Trong-Thua Huynh, Information Security Technology Lab, Posts and Telecommunications Institute of Technology, Vietnam
De-Thu Huynh, School of Computer Science & Engineering, The Saigon International University, Vietnam
Van-Quynh Trinh, Information Security Technology Lab, Posts and Telecommunications Institute of Technology, Vietnam

Volume: 16 | Issue: 3 | Pages: 35064-35070 | June 2026 | https://doi.org/10.48084/etasr.17500

Received: 12 January 2026 | Revised: 14 February 2026, 6 March 2026, 19 March 2026, and 21 March 2026 | Accepted: 23 March 2026 | Online: 8 April 2026

Corresponding author: Van-Quynh Trinh

Abstract

The increasing sophistication of malware has diminished the effectiveness of traditional signature-based detection. While Machine Learning (ML), Deep Learning (DL), and Large Language Models (LLMs) have improved malware classification, real-world systems continue to struggle with evasion attacks, temporal drift, and class imbalance. This study reviews advances in robust malware detection, focusing on benchmark datasets, detection methods, and operational constraints. Five public datasets (EMBER2018, SOREL-20M, MalDICT, MOTIF, and EMBER2024) are assessed for scale, label quality, and reproducibility. This paper contributes (i) a Robust Malware Evaluation Protocol (RMEP) for consistent benchmarking at low False-Positive Rates (FPR ≤ 0.1%) with temporal splits, and (ii) a Dataset-Task-Robustness (DTR) matrix for systematic comparison; together, they offer practical guidance for reproducible malware-detection research. Future efforts should focus on broader multi-platform benchmark coverage, explicit analysis of robustness–accuracy trade-offs, interpretable language-assisted detection pipelines, and privacy-preserving collaborative learning frameworks.

Keywords:

malware detection, robustness, benchmark datasets, Machine Learning (ML), Deep Learning (DL), Large Language Models (LLMs)
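The abstract's requirement of benchmarking at a very low false-positive rate (FPR ≤ 0.1%) on temporally split data can be made concrete with a small calculation. The Python sketch below is not the RMEP specification from this paper; it is a hypothetical illustration that assumes each sample carries a first-seen time, a ground-truth label, and a detector score, and that the decision threshold is calibrated only on benign samples first seen before a cutoff and then frozen for later samples. The names Sample, temporal_split, threshold_at_fpr, rates, and evaluate_low_fpr, the "score > threshold means malicious" rule, and the 0.1% default are all illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Sample:
    first_seen: int  # hypothetical time key, e.g. YYYYMM or a Unix timestamp
    label: int       # 1 = malicious, 0 = benign
    score: float     # detector output; higher means "more likely malicious"


def temporal_split(samples: List[Sample], cutoff: int) -> Tuple[List[Sample], List[Sample]]:
    """Samples first seen before `cutoff` are 'past'; the rest are 'future'."""
    past = [s for s in samples if s.first_seen < cutoff]
    future = [s for s in samples if s.first_seen >= cutoff]
    return past, future


def threshold_at_fpr(benign_scores: List[float], max_fpr: float) -> float:
    """Smallest t such that at most max_fpr of the benign scores are > t."""
    if not benign_scores:
        raise ValueError("need at least one benign score to calibrate")
    ranked = sorted(benign_scores, reverse=True)
    # Number of benign samples we may flag; the small epsilon stops a float
    # product that lands just below an integer from losing a false positive.
    allowed = math.floor(max_fpr * len(ranked) + 1e-9)
    if allowed >= len(ranked):
        return -math.inf  # the budget covers every benign sample
    # Flagging only scores strictly above the (allowed + 1)-th highest benign
    # score flags at most `allowed` benign samples, even when scores tie.
    return ranked[allowed]


def rates(samples: List[Sample], threshold: float) -> Tuple[float, float]:
    """Return (TPR, FPR) when score > threshold is reported as malicious."""
    mal = [s for s in samples if s.label == 1]
    ben = [s for s in samples if s.label == 0]
    tpr = sum(s.score > threshold for s in mal) / len(mal) if mal else math.nan
    fpr = sum(s.score > threshold for s in ben) / len(ben) if ben else math.nan
    return tpr, fpr


def evaluate_low_fpr(samples: List[Sample], cutoff: int, max_fpr: float = 0.001) -> Dict[str, float]:
    """Calibrate on past benign scores, then freeze the threshold for the future."""
    past, future = temporal_split(samples, cutoff)
    threshold = threshold_at_fpr([s.score for s in past if s.label == 0], max_fpr)
    tpr, fpr = rates(future, threshold)
    return {"threshold": threshold, "future_tpr": tpr, "future_fpr": fpr}


if __name__ == "__main__":
    # Toy data: 1,000 past benign scores, then a later month in which two
    # benign files score unusually high (simulated drift).
    past = [Sample(202301, 0, i / 1000) for i in range(1000)]
    future = [Sample(202401, 0, 0.5) for _ in range(100)]
    future += [Sample(202401, 0, 0.9995), Sample(202401, 0, 0.9995)]
    future += [Sample(202401, 1, x) for x in (0.9999, 0.99, 0.999, 0.9985)]
    print(evaluate_low_fpr(past + future, cutoff=202401))
```

Because the threshold is fixed on past data, a shift in the benign score distribution after the cutoff appears directly as a future FPR above the 0.1% target, an effect that a random train/test split can hide. Reporting TPR and FPR separately also keeps the result independent of the malicious-to-benign ratio in the test set, which matters under class imbalance.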

References

J. Ferdous, R. Islam, A. Mahboubi, and M. Z. Islam, "A Survey on ML Techniques for Multi-Platform Malware Detection: Securing PC, Mobile Devices, IoT, and Cloud Environments," Sensors, vol. 25, no. 4, Feb. 2025, Art. no. 1153.

R. J. Joyce et al., "EMBER2024 - A Benchmark Dataset for Holistic Evaluation of Malware Classifiers," in 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2, Toronto, Canada, Aug. 2025.

S. Altamimi and M. Ababneh, "Detecting Spam and Malware Using BERT and LLMs," in 2024 25th International Arab Conference on Information Technology (ACIT), Zarqa, Jordan, Dec. 2024.

P. M. Sánchez Sánchez, A. H. Celdrán, G. Bovet, and G. M. Pérez, "Transfer Learning in Pre-Trained Large Language Models for Malware Detection Based on System Calls," in MILCOM 2024 - 2024 IEEE Military Communications Conference (MILCOM), Washington, DC, USA, Oct. 2024.

H. Almajed, A. Alsaqer, and M. Frikha, "Imbalance Datasets in Malware Detection: A Review of Current Solutions and Future Directions," International Journal of Advanced Computer Science & Applications, vol. 16, no. 1, 2025, Art. no. 1323.

M. G. Gaber, M. Ahmed, and H. Janicke, "Malware Detection with Artificial Intelligence: A Systematic Literature Review," ACM Computing Surveys, vol. 56, no. 6, pp. 1–33, Jan. 2024.

C. Rondanini, B. Carminati, E. Ferrari, A. Kundu, and A. Gaudiano, "Malware Detection at the Edge with Lightweight LLMs: A Performance Evaluation," ACM Transactions on Internet Technology, vol. 26, no. 1, Jan. 2026, Art. no. 15.

J. Al-Karaki, M. A.-Z. Khan, and M. Omar, "Exploring LLMs for Malware Detection: Review, Framework Design, and Countermeasure Approaches." arXiv, Sept. 11, 2024.

H. S. Anderson and P. Roth, "EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models." arXiv, Apr. 2018.

R. Harang and E. M. Rudd, "SOREL-20M: A Large Scale Benchmark Dataset for Malicious PE Detection." arXiv, Dec. 14, 2020.

R. J. Joyce, E. Raff, C. Nicholas, and J. Holt, "MalDICT: Benchmark Datasets on Malware Behaviors, Platforms, Exploitation, and Packers." arXiv, Oct. 18, 2023.

R. J. Joyce, D. Amlani, C. Nicholas, and E. Raff, "MOTIF: A Malware Reference Dataset with Ground Truth Family Labels," Computers & Security, vol. 124, Jan. 2023, Art. no. 102921.

D. Gibert, C. Mateu, and J. Planes, "The rise of machine learning for detection and classification of malware: Research developments, trends and challenges," Journal of Network and Computer Applications, vol. 153, Mar. 2020, Art. no. 102526.

D. Arp, M. Spreitzenbarth, M. Hubner, H. Gascon, K. Rieck, and C. Siemens, "Drebin: Effective and explainable detection of Android malware in your pocket," in Network and Distributed System Security Symposium (NDSS 2014), San Diego, CA, USA, Feb. 2014.

R. Vinayakumar, M. Alazab, K. P. Soman, P. Poornachandran, and S. Venkatraman, "Robust Intelligent Malware Detection Using Deep Learning," IEEE Access, vol. 7, pp. 46717–46738, Apr. 2019.

M. Ganesamoorthi, K. Subramanian, and B. D, "A Comprehensive Review on Machine Learning and Deep Learning Based Malware Detection Methods," in 2024 International Conference on Emerging Research in Computational Science (ICERCS), Coimbatore, India, Dec. 2024.

L. Nataraj, S. Karthikeyan, G. Jacob, and B. S. Manjunath, "Malware images: visualization and automatic classification," in Proceedings of the 8th International Symposium on Visualization for Cyber Security, Pittsburgh, PA, USA, July 2011.

A. Al-Marghilani, "Comprehensive Analysis of IoT Malware Evasion Techniques," Engineering, Technology & Applied Science Research, vol. 11, no. 4, pp. 7495–7500, Aug. 2021.

G. M. and S. C. Sethuraman, "A comprehensive survey on deep learning based malware detection techniques," Computer Science Review, vol. 47, Feb. 2023, Art. no. 100529.

X. Liu, Y. Lin, H. Li, and J. Zhang, "A novel method for malware detection on ML-based visualization technique," Computers & Security, vol. 89, Feb. 2020, Art. no. 101682.

F. Pendlebury, F. Pierazzi, R. Jordaney, J. Kinder, and L. Cavallaro, "TESSERACT: Eliminating Experimental Bias in Malware Classification across Space and Time," in 28th USENIX Security Symposium (USENIX Security 19), Santa Clara, CA, USA, 2019.

D. Arp et al., "Dos and Don’ts of Machine Learning in Computer Security," in 31st USENIX Security Symposium (USENIX Security 22), Boston, MA, USA, 2022.

R. Jordaney, K. Sharad, S. K. Dash, Z. Wang, D. Papini, I. Nouretdinov, and L. Cavallaro, "Transcend: Detecting Concept Drift in Malware Classification Models," in 26th USENIX Security Symposium (USENIX Security 17), Vancouver, BC, Canada, 2017.

S. M. Lundberg and S.-I. Lee, "A Unified Approach to Interpreting Model Predictions," in Advances in Neural Information Processing Systems, 2017, vol. 30, pp. 4765–4774.

M. T. Ribeiro, S. Singh, and C. Guestrin, "'Why Should I Trust You?': Explaining the Predictions of Any Classifier," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, Aug. 2016.

J. Kumar and G. Ranganathan, "Malware Attack Detection in Large Scale Networks using the Ensemble Deep Restricted Boltzmann Machine," Engineering, Technology & Applied Science Research, vol. 13, no. 5, pp. 11773–11778, Oct. 2023.