Explainable AI-based Framework for Efficient Detection of Spam from Text using an Enhanced Ensemble Technique
Received: 21 May 2024 | Revised: 30 May 2024 | Accepted: 6 June 2024 | Online: 14 June 2024
Corresponding author: Ahmed Alzahrani
Abstract
Identifying and preventing spam has become increasingly challenging, particularly given the abundance of text-based content in emails, social media platforms, and websites. Although traditional spam filters are somewhat effective, they often fail to keep pace with new spamming techniques. Machine Learning (ML) and Deep Learning (DL) models have greatly improved the capabilities of spam detection systems. However, the black-box nature of these models undermines user trust because their decisions lack transparency. Explainable AI (XAI) has emerged to address this issue by making AI decisions more understandable to humans. This study combines XAI with ensemble learning, which uses multiple learning algorithms to improve predictive performance, and proposes a robust and interpretable system for effective spam detection. Four classifiers were used for training and testing: Support Vector Machine (SVM), Logistic Regression (LR), Gradient Boosting (GB), and Decision Tree (DT). To reduce overfitting, two independent spam email datasets were blended and balanced. The stacking ensemble based on Random Forest (RF) outperformed the individual classifiers, achieving 98% recall, 96% precision, and 97% F1-score. By leveraging XAI's interpretability, the model explains the reasoning behind its classifications and reveals hidden patterns associated with spam.
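The abstract names the pipeline's components but not their configuration. The following Python sketch shows one way such a pipeline could be assembled; it is not the author's implementation. It assumes scikit-learn, TF-IDF features, random undersampling as the balancing step, and Random Forest as the final (meta) estimator of the stacking ensemble. The two data sources, the held-out messages, and the helpers `undersample` and `word_contributions` are hypothetical, and the occlusion-style word attribution is a simple stand-in, not the XAI method used in the study.

```python
# A sketch under stated assumptions, not the paper's pipeline: scikit-learn,
# TF-IDF features, random undersampling, and RF as the stacking meta-learner.
import random

from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Two hypothetical labelled sources (1 = spam, 0 = ham); ham outnumbers spam.
source_a = [
    ("win a free prize now", 1), ("claim your cash reward today", 1),
    ("free entry win money", 1), ("are we still meeting for lunch", 0),
    ("please review the attached report", 0),
    ("see you at the seminar tomorrow", 0), ("can you send me the slides", 0),
]
source_b = [
    ("urgent offer click to claim prize", 1),
    ("you have won a free holiday", 1), ("cheap loans apply now win", 1),
    ("the project deadline moved to friday", 0),
    ("thanks for the notes today", 0), ("lunch at noon works for me", 0),
    ("i will call you after the meeting", 0),
]


def undersample(rows, seed=0):
    """Hypothetical helper: randomly drop majority-class rows until classes are equal."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in rows:
        by_class.setdefault(label, []).append(text)
    n = min(len(items) for items in by_class.values())
    balanced = [(t, y) for y, items in sorted(by_class.items())
                for t in rng.sample(items, n)]
    rng.shuffle(balanced)
    return [t for t, _ in balanced], [y for _, y in balanced]


# Blend the two sources, then balance them.
train_texts, train_labels = undersample(source_a + source_b)

# The four base learners named in the abstract.
base_learners = [
    ("svm", SVC(kernel="linear", random_state=0)),
    ("lr", LogisticRegression(max_iter=1000)),
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("dt", DecisionTreeClassifier(random_state=0)),
]
# Assumption: "based on RF" is read as RF being the final (meta) estimator.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    cv=3,
)
model = make_pipeline(TfidfVectorizer(), stack)
model.fit(train_texts, train_labels)

# Hypothetical held-out messages.
test_texts = ["win a free cash prize", "please send the report tomorrow",
              "claim your free reward now", "see you at lunch tomorrow"]
test_labels = [1, 0, 1, 0]
preds = [int(p) for p in model.predict(test_texts)]
precision = precision_score(test_labels, preds, zero_division=0)
recall = recall_score(test_labels, preds, zero_division=0)
f1 = f1_score(test_labels, preds, zero_division=0)


def word_contributions(model, text):
    """Hypothetical occlusion-style attribution (not the paper's XAI method):
    drop in predicted spam probability when each word is removed."""
    words = text.split()
    base = model.predict_proba([text])[0][1]
    return {w: base - model.predict_proba([" ".join(words[:i] + words[i + 1:])])[0][1]
            for i, w in enumerate(words)}


contribs = word_contributions(model, "win a free cash prize")

# F1 is the harmonic mean of precision and recall: with the reported
# P = 0.96 and R = 0.98, 2PR / (P + R) = 0.9699..., i.e. about 97%.
reported_f1 = round(2 * 0.96 * 0.98 / (0.96 + 0.98), 2)
```

Stacking trains the meta-learner on out-of-fold predictions of the base learners (here via `cv=3`), so it learns how to combine the base models rather than fitting to their in-sample outputs. On a corpus this small the resulting scores carry no meaning; the last line only confirms that the reported precision and recall are consistent with the reported F1-score.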
Keywords:
machine learning, weak learner, strong learner, spam prediction, ensemble techniques, stacking classifier, explainable AI
License
Copyright (c) 2024 Ahmed Alzahrani
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.