A Semantic-Aware Approach to Phishing URL Detection
Received: 7 July 2025 | Revised: 23 August 2025 | Accepted: 2 September 2025 | Online: 8 December 2025
Corresponding author: Hoang Thanh Nguyen
Abstract
With the rapid growth of online transactions, phishing attacks have become increasingly unpredictable, particularly those involving malicious Uniform Resource Locators (URLs). Few studies to date have effectively combined manual and semantic features within a unified framework that both fully exploits a URL's structural information and captures the deep contextual dependencies of its character string. To address this gap, this study proposes a deep attention-based approach for phishing URL detection. The approach first draws on important manual features and on the Bidirectional Encoder Representations from Transformers (BERT) model, which extracts semantic feature vectors. A hybrid deep learning architecture comprising Bidirectional Long Short-Term Memory (BiLSTM) layers, an attention mechanism, and fully connected dense layers then classifies URLs as either phishing or legitimate. Experimental results show that the proposed model achieves a classification accuracy of 96.77%, while an ablation analysis highlights the individual contributions of key components, including the BERT embeddings, the attention mechanism, the LSTM layers, and the feature types, to overall model performance. Finally, by training on a Kaggle benchmark dataset and testing on real-world phishing samples, the study confirms the model's strong generalization capability in detecting emerging phishing threats.
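To make the notion of "manual features" concrete, the following is a minimal sketch of how a few structural URL features might be computed with the Python standard library. The abstract does not list the features the study uses, so the function name `manual_url_features` and every feature below are hypothetical choices for illustration. In a pipeline like the one described, such a vector would presumably be combined with the BERT embedding before the BiLSTM and attention layers.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse

# Matches a leading scheme such as "http://" or "ftp://".
_SCHEME_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9+.\-]*://")
_IPV4_RE = re.compile(r"(\d{1,3}\.){3}\d{1,3}")


def manual_url_features(url: str) -> dict:
    """Compute a few hand-crafted structural features for one URL.

    Illustrative feature set only (an assumption, not the paper's list).
    """
    # urlparse only fills in the host when a scheme or "//" is present,
    # so prepend "//" to scheme-less inputs such as "example.com/login".
    target = url if _SCHEME_RE.match(url) else "//" + url
    try:
        parsed = urlparse(target)
        host = parsed.hostname or ""
        scheme = parsed.scheme
    except ValueError:
        # Malformed input (e.g. a broken IPv6 literal): fall back to empty parts.
        host, scheme = "", ""

    # Shannon entropy of the URL's character distribution, in bits.
    counts = Counter(url)
    entropy = 0.0
    if url:
        entropy = -sum(
            (n / len(url)) * math.log2(n / len(url)) for n in counts.values()
        )

    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": url.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_hyphens": url.count("-"),
        "has_at_symbol": int("@" in url),
        "host_is_ipv4": int(_IPV4_RE.fullmatch(host) is not None),
        "uses_https": int(scheme == "https"),
        "entropy": entropy,
    }


if __name__ == "__main__":
    for sample in ("https://example.com/account", "http://192.168.0.1/login-update"):
        print(sample, manual_url_features(sample))
```

Simple counts like these capture structural cues that a character-level semantic model may not represent explicitly, which is the motivation the abstract gives for using both feature types.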
Keywords:
phishing Uniform Resource Locators (URL), Bidirectional Long Short-Term Memory (BiLSTM), attention mechanism, Bidirectional Encoder Representations from Transformers (BERT), semantic feature
License
Copyright (c) 2025 Trong Thua Huynh, Hoang Thanh Nguyen, Nhat Huynh Tran

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
