Enhancing Arabic Speaker Recognition with ECAPA-TDNN
Received: 6 August 2025 | Revised: 15 September 2025, 27 October 2025, 8 March 2026, 29 March 2026, 30 March 2026, and 2 April 2026 | Accepted: 3 April 2026 | Online: 6 June 2026
Corresponding author: Mahmoud Ayman
Abstract
This paper presents a fine-tuned Emphasized Channel Attention, Propagation and Aggregation - Time Delay Neural Network (ECAPA-TDNN) model for Arabic speaker recognition, with a focus on enhancing performance in noisy environments. The model was trained on the Voice of Celebrities 1 (VoxCeleb1) and VoxCeleb2 corpora combined with Arabic data from the Qatar Computing Research Institute (QCRI) Aljazeera Speech Resource (QASR), and was evaluated on the VoxCeleb1 test protocol (Vox1-O), the Arab Celebrity (ArabCeleb) dataset, a held-out QASR test split, and an in-house Arabic dataset of authentic recordings. Through targeted fine-tuning and data augmentation techniques, the proposed approach reduces the Equal Error Rate (EER) on Arabic datasets and improves robustness to noise, while maintaining satisfactory performance on English datasets. These findings indicate that careful adaptation can support the development of more balanced multilingual speaker verification systems, particularly for underrepresented languages such as Arabic.
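For readers unfamiliar with the metric, the EER is the operating point at which the false acceptance rate (impostor trials accepted) equals the false rejection rate (genuine trials rejected). The following is a minimal sketch of how it can be estimated from trial scores. It is not the evaluation code used in this work: the function name `equal_error_rate` and the example scores are hypothetical, and higher scores are assumed to indicate greater speaker similarity.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER and its threshold from verification trial scores.

    target_scores: scores of same-speaker trials (hypothetical input).
    nontarget_scores: scores of different-speaker trials (hypothetical input).
    Assumes higher scores mean "more likely the same speaker".
    """
    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None
    for t in thresholds:
        # False rejection rate: same-speaker trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: different-speaker trials at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Illustrative scores only; these are not results from the paper.
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
print(f"EER = {eer:.2%} at threshold {threshold}")  # EER = 25.00% at threshold 0.6
```

With a finite trial list the two error curves rarely cross exactly, so this sketch reports the average of the two rates at the closest threshold. Standard toolkits typically interpolate between neighbouring thresholds instead.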
Keywords:
ECAPA-TDNN, speaker verification, speaker embeddings, noise
License
Copyright (c) 2026 Mahmoud Ayman, Fahad A. Aloufi

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
