A Comparative Analysis of Machine Learning Techniques for URL Phishing Detection
Received: 5 September 2024 | Revised: 9 October 2024 and 14 October 2024 | Accepted: 16 October 2024 | Online: 29 October 2024
Corresponding author: Mohamed M. Dessouky
Abstract
The growing threat of URL phishing attacks increases the need for advanced detection systems to protect digital environments. This paper explores the effectiveness of various machine learning models in classifying URLs as phishing or benign, focusing on the random forest model. Using ensemble learning, the random forest demonstrated superior accuracy and reliability compared to traditional methods, with accuracy consistently between 99.93% and 99.98%. The model's performance was evaluated daily over eight days, highlighting its robustness in real-world scenarios. The study used GridSearchCV to optimize the model's hyperparameters, enhancing robustness and minimizing overfitting. Future research directions include advanced feature engineering, deep learning techniques, and multimodal data integration to further improve phishing detection systems.
Keywords:
phishing attacks, phishing detection, ensemble learning, random forest
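The abstract names GridSearchCV and a random forest but gives no configuration, so the following is only a minimal sketch of that tuning step using scikit-learn. The synthetic data from `make_classification`, the `param_grid` values, the 3-fold cross-validation, and the 75/25 split are all illustrative assumptions, not the authors' dataset or settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for numeric URL feature vectors (assumption: NOT the paper's data).
# Label 1 plays the role of "phishing", label 0 the role of "benign".
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical search grid; the grid actually used in the paper is not stated here.
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [None, 5],
    "min_samples_leaf": [1, 3],
}

# Exhaustive search over the grid, scoring each combination by cross-validated accuracy.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# GridSearchCV refits the best combination on the full training split by default.
best_model = search.best_estimator_
test_accuracy = accuracy_score(y_test, best_model.predict(X_test))
print(search.best_params_, round(test_accuracy, 4))
```

Scoring on a held-out split that the search never saw is what makes the reported accuracy a check against overfitting rather than a by-product of the tuning itself.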
License
Copyright (c) 2024 Adel Ataih Albishri, Mohamed M. Dessouky
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.