Advancing Email Spam Classification using Machine Learning and Deep Learning Techniques
Received: 25 April 2024 | Revised: 13 May 2024 | Accepted: 17 May 2024 | Online: 2 August 2024
Corresponding author: Mohammed A. Aleisa
Abstract
Email communication has become integral to many industries, but the pervasive volume of spam poses significant challenges for service providers. This study applies Machine Learning (ML) and Deep Learning (DL) techniques to the classification of spam emails. Logistic Regression (LR), Naïve Bayes (NB), Random Forest (RF), and Artificial Neural Network (ANN) models are constructed for spam detection. By combining these techniques, the aim is to improve the efficiency and precision of spam detection and to help email and IoT service providers mitigate the harmful effects of spam. In the evaluation, LR, RF, and NB each achieved an accuracy of 97% and an F1-score of 97.5%. The ANN model achieved a slightly higher accuracy of 98% with the same F1-score of 97.5%, suggesting potential for improved accuracy and robustness in spam filtering systems. These findings indicate that both traditional ML algorithms and DL approaches are viable for email spam classification and can support more effective spam detection in electronic communication platforms.
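The abstract reports both accuracy and F1-score, and the two metrics need not move together. The following is a minimal sketch of how each can be computed for a binary spam/ham classifier. It is not the authors' evaluation code: the function name `accuracy_and_f1` and the toy labels are hypothetical and are not taken from the study.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, treating `positive` as spam."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with spam as the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Accuracy counts every correct decision, spam or ham.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0

    # F1 is the harmonic mean of precision and recall on the spam class.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Hypothetical toy labels (1 = spam, 0 = ham); not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}  f1={f1:.3f}")
```

Because accuracy also rewards correctly classified ham while F1 depends only on spam-class precision and recall, a model can gain accuracy without any change in F1.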
Keywords:
spam, ML, DL, spam classification, email
License
Copyright (c) 2024 Meaad Hamad Alsuwit, Mohd Anul Haq, Mohammed A. Aleisa
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.