Development of a Deep Learning-based Arabic Speech Recognition System for Automatons
Received: 7 August 2024 | Revised: 12 September 2024, 28 September 2024, and 12 October 2024 | Accepted: 16 October 2024 | Online: 2 December 2024
Corresponding author: Waseem Alromema
Abstract
Recent advances in speech recognition have produced results on par with those of human transcribers. However, this performance does not carry over to all languages, Arabic among them. Arabic is the native language of 22 countries and is spoken by approximately 400 million people. Speech impairments have become a growing problem in recent decades, especially among children, and data samples for Arabic phonetic recognition are limited. For Arabic pronunciation, Artificial Intelligence (AI) techniques show encouraging results. Some devices, such as the Servox Digital Electro-Larynx (EL), can produce voice for individuals with such impairments. This study presents a deep learning-based Arabic speech recognition system for automatons that recognizes sounds captured from the Servox Digital EL. The proposed system employs an autoencoder that combines Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) models. The proposed approach has three main stages: de-noising, feature extraction, and Arabic pronunciation. The experimental findings show that the proposed model achieved an accuracy of 95.31% for Arabic speech recognition. The evaluation shows that using GRU in both the encoder and the decoder improves efficiency. The proposed model had a Word Error Rate (WER) of 4.69%. The test results indicate that the proposed model can be used to build a real-time application that recognizes commonly spoken Arabic words.
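The abstract does not say how the WER was computed. As a minimal sketch, assuming the standard definition (word-level edit distance divided by the number of reference words), it could be calculated as below. The function name `word_error_rate` and the transliterated example phrase are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transliterated example: one of five reference words is dropped.
reference = "iftah al-bab al-an min fadlak"
hypothesis = "iftah al-bab min fadlak"
print(word_error_rate(reference, hypothesis))  # 0.2
```

When every utterance is a single isolated word, this definition reduces to one minus the classification accuracy. That would be consistent with the reported 95.31% accuracy and 4.69% WER summing to 100%, although the abstract does not confirm that this is how the two figures are related.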
Keywords:
deep learning, electro-larynx, Arabic speech recognition, long short-term memory, voice recognition
License
Copyright (c) 2024 Abdulrahman Alahmadi, Ahmed Alahmadi, Eman Alduweib, Waseem Alromema, Bakil Ahmed
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.