Deep Learning, Ensemble and Supervised Machine Learning for Arabic Speech Emotion Recognition
Received: 24 February 2024 | Revised: 10 March 2024 | Accepted: 12 March 2024 | Online: 2 April 2024
Corresponding author: Wahiba Ismaiel
Abstract
Automatic speech emotion recognition is an important research area in signal processing. Identifying emotional content in Arabic speech is particularly challenging because of the wide range of cultures and dialects, the influence of cultural factors on emotional expression, and the scarcity of available datasets. This study evaluated several machine learning models, namely XGBoost, AdaBoost, K-Nearest Neighbors (KNN), Decision Tree (DT), and Self-Organizing Map (SOM), together with a deep learning model named SERDNN. The models were trained on the Arabic Natural Audio Dataset (ANAD), which covers three emotions, "angry", "happy", and "surprised", with 844 features. The aim was to present a more efficient and accurate technique for recognizing emotions in Arabic speech. Precision, accuracy, recall, and F1-score were used to evaluate the effectiveness of the proposed techniques. The results showed that the XGBoost, SOM, and KNN classifiers achieved superior performance in recognizing emotions in Arabic speech. The SERDNN deep learning model outperformed the other techniques, achieving the highest accuracy of 97.40% with a loss of 0.1457. Therefore, it can be relied upon and deployed to recognize emotions in Arabic speech.
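The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not state how per-class scores were averaged, so macro-averaging is an assumption here; the function name `evaluate` and the six toy labels are hypothetical and are not drawn from ANAD or from the study's results.

```python
from collections import Counter

# The three emotion classes named in the abstract.
LABELS = ["angry", "happy", "surprised"]


def evaluate(y_true, y_pred, labels=LABELS):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro-averaging (an unweighted mean over classes) is an assumption;
    the source does not say which averaging scheme was used.
    """
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = pairs[(label, label)]
        fp = sum(pairs[(other, label)] for other in labels if other != label)
        fn = sum(pairs[(label, other)] for other in labels if other != label)
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    n = len(labels)
    return {
        "accuracy": sum(pairs[(l, l)] for l in labels) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical toy labels, for illustration only.
y_true = ["angry", "angry", "happy", "happy", "surprised", "surprised"]
y_pred = ["angry", "happy", "happy", "happy", "surprised", "angry"]
scores = evaluate(y_true, y_pred)
print(scores)
```

Reporting all four metrics matters when classes are imbalanced, since accuracy alone can hide poor recall on a minority emotion.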
Keywords:
ANAD, SERDNN, SOM, XGBoost, AdaBoost, DT, KNN, Arabic speech emotion recognition
License
Copyright (c) 2024 Wahiba Ismaiel, Abdalilah Alhalangy, Adil O. Y. Mohamed, Abdalla Ibrahim Abdalla Musa
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.