A Semantic-Aware Approach to Phishing URL Detection

Trong Thua Huynh; Hoang Thanh Nguyen; Nhat Huynh Tran

doi:10.48084/etasr.13187

Authors

Trong Thua Huynh, Information Security Technology Lab, Posts and Telecommunications Institute of Technology, Vietnam (ORCID: https://orcid.org/0000-0003-3934-1067)
Hoang Thanh Nguyen, Information Security Technology Lab, Posts and Telecommunications Institute of Technology, Vietnam
Nhat Huynh Tran, Faculty of Information Technology, Posts and Telecommunications Institute of Technology, Vietnam

Volume: 15 | Issue: 6 | Pages: 29866-29871 | December 2025 | https://doi.org/10.48084/etasr.13187

Received: 7 July 2025 | Revised: 23 August 2025 | Accepted: 2 September 2025 | Online: 8 December 2025

Corresponding author: Hoang Thanh Nguyen

Abstract

With the rapid growth of online transactions, phishing attacks have become increasingly unpredictable, particularly those that rely on malicious Uniform Resource Locators (URLs). Few studies have effectively combined manual and semantic features in a unified framework that both exploits the structural information of a URL and captures the deep contextual dependencies within its character string. To address this gap, this study proposes a deep attention-based approach to phishing URL detection. The approach first draws on the importance of manual features and uses the Bidirectional Encoder Representations from Transformers (BERT) model to extract semantic feature vectors. A hybrid deep learning architecture comprising Bidirectional Long Short-Term Memory (BiLSTM) layers, an attention mechanism, and fully connected dense layers then classifies each URL as phishing or legitimate. Experimental results show that the proposed model achieves a classification accuracy of 96.77%, and an ablation analysis highlights the individual contributions of key components, including the BERT embeddings, the attention mechanism, the LSTM layers, and the feature types, to overall model performance. Finally, by training on a Kaggle benchmark dataset and testing on real-world phishing samples, the study confirms the model's strong generalization capability in detecting emerging phishing threats.
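
To make two of the ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a small, hypothetical set of hand-crafted structural features (the abstract does not list the actual feature set) and a generic softmax attention pooling over per-step hidden states such as a BiLSTM might produce. The function names `manual_features` and `attention_pool`, the chosen features, and the example URL are all illustrative assumptions.

```python
import math
from urllib.parse import urlparse


def manual_features(url: str) -> dict:
    """Hand-crafted structural features (hypothetical set, not the paper's)."""
    # Prepend a scheme when missing so urlparse can find the host.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        "path_depth": len([p for p in parsed.path.split("/") if p]),
    }


def attention_pool(states, scores):
    """Generic attention pooling: softmax the scores, then take the
    weighted sum of the per-step hidden states."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(states[0])
    context = [sum(w * h[i] for w, h in zip(weights, states)) for i in range(dim)]
    return context, weights


if __name__ == "__main__":
    print(manual_features("http://login.example.com/secure/update?id=42"))
    context, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
    print(context, weights)
```

In a full pipeline along the lines the abstract describes, the attention scores would be learned rather than supplied by hand, and the pooled context vector would be combined with the manual features before the dense classification layers. How the paper performs that combination is not stated in the abstract.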

Keywords:

phishing Uniform Resource Locators (URLs), Bidirectional Long Short-Term Memory (BiLSTM), attention mechanism, Bidirectional Encoder Representations from Transformers (BERT), semantic features


