A Semantic-Aware Approach to Phishing URL Detection

Authors

  • Trong Thua Huynh, Information Security Technology Lab, Posts and Telecommunications Institute of Technology, Vietnam, https://orcid.org/0000-0003-3934-1067
  • Hoang Thanh Nguyen, Information Security Technology Lab, Posts and Telecommunications Institute of Technology, Vietnam
  • Nhat Huynh Tran, Faculty of Information Technology, Posts and Telecommunications Institute of Technology, Vietnam
Volume: 15 | Issue: 6 | Pages: 29866-29871 | December 2025 | https://doi.org/10.48084/etasr.13187

Abstract

With the rapid growth of online transactions, phishing attacks have become increasingly unpredictable, particularly those involving malicious Uniform Resource Locators (URLs). Few studies to date have effectively combined manual and semantic features within a unified framework that both fully exploits a URL's structural information and captures the deep contextual dependencies within its character string. To address this gap, this study proposes a deep attention-based approach for phishing URL detection. The approach first draws on important manual features and uses the Bidirectional Encoder Representations from Transformers (BERT) model to extract semantic feature vectors. A hybrid deep learning architecture comprising Bidirectional Long Short-Term Memory (BiLSTM) layers, an attention mechanism, and fully connected dense layers then classifies each URL as either phishing or legitimate. Experimental results show that the proposed model achieves a classification accuracy of 96.77%, and an ablation analysis highlights the individual contributions of key components (BERT embeddings, the attention mechanism, LSTM layers, and feature types) to overall model performance. Finally, by training on a Kaggle benchmark dataset and testing on real-world phishing samples, the study confirms the model's strong generalization capability in detecting emerging phishing threats.
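As a rough illustration of two ingredients named in the abstract, the sketch below computes a few hand-crafted lexical features from a URL and applies softmax attention pooling over a sequence of per-position vectors. It is a minimal sketch under stated assumptions, not the authors' implementation: the feature set, the function names manual_features and attention_pool, and the idea of passing attention scores in as plain numbers are all hypothetical. In a trained model the scores would be learned and the vectors would come from BiLSTM layers fed with BERT embeddings.

```python
import math
from urllib.parse import urlsplit


def manual_features(url: str) -> list[float]:
    """Hypothetical lexical features; the paper's actual feature set is not listed here."""
    # Prepend a scheme when missing so that urlsplit can locate the host.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    return [
        float(len(url)),                         # total URL length
        float(url.count(".")),                   # number of dots
        float(sum(ch.isdigit() for ch in url)),  # number of digit characters
        float("@" in url),                       # '@' present in the URL
        float(parts.scheme == "https"),          # explicit HTTPS scheme
        float(host.replace(".", "").isdigit()),  # host made only of digits and dots
    ]


def attention_pool(states: list[list[float]], scores: list[float]):
    """Softmax the scores, then return the weighted sum of the state vectors."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, states)) for d in range(dim)]
    return pooled, weights


if __name__ == "__main__":
    print(manual_features("http://192.168.0.1/login"))
    # Toy stand-ins for BiLSTM hidden states and their attention scores.
    pooled, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
    print(pooled, weights)
```

One plausible way to use these pieces together, again as an assumption, is to concatenate the pooled vector with the manual feature vector before the dense classification layers.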

Keywords:

phishing Uniform Resource Locators (URL), Bidirectional Long Short-Term Memory (BiLSTM), attention mechanism, Bidirectional Encoder Representations from Transformers (BERT), semantic feature

How to Cite

[1]
T. T. Huynh, H. T. Nguyen, and N. H. Tran, “A Semantic-Aware Approach to Phishing URL Detection”, Eng. Technol. Appl. Sci. Res., vol. 15, no. 6, pp. 29866–29871, Dec. 2025.
