Deep Learning, Ensemble and Supervised Machine Learning for Arabic Speech Emotion Recognition

Authors

  • Wahiba Ismaiel, Department of Science and Technology, University College of Ranyah, Taif University, Saudi Arabia
  • Abdalilah Alhalangy, Department of Computer Engineering, College of Computer, Qassim University, Saudi Arabia
  • Adil O. Y. Mohamed, Department of Computer Science, College of Computer, Qassim University, Saudi Arabia (https://orcid.org/0000-0003-3918-0128)
  • Abdalla Ibrahim Abdalla Musa, Department of Computer Science, College of Computer, Qassim University, Saudi Arabia (https://orcid.org/0000-0003-2014-8363)
Volume: 14 | Issue: 2 | Pages: 13757-13764 | April 2024 | https://doi.org/10.48084/etasr.7134

Abstract

Automatic speech emotion recognition is a major research area in signal processing. Identifying emotional content in Arabic speech is particularly challenging because of the wide range of cultures and dialects, the influence of cultural factors on emotional expression, and the scarcity of available datasets. This study evaluated several machine learning models, namely XGBoost, AdaBoost, K-Nearest Neighbors (KNN), Decision Tree (DT), and Self-Organizing Map (SOM), together with a deep learning model named SERDNN. The models were trained on the Arabic Natural Audio Dataset (ANAD), which covers three emotions, "angry", "happy", and "surprised", with 844 features. The aim was to present a more efficient and accurate technique for recognizing emotions in Arabic speech. Precision, accuracy, recall, and F1-score were used to evaluate the proposed techniques. Among the machine learning classifiers, XGBoost, SOM, and KNN performed best. The SERDNN deep learning model outperformed all other techniques, achieving the highest accuracy of 97.40% with a loss of 0.1457. These results suggest that it can be reliably deployed to recognize emotions in Arabic speech.
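
The four evaluation metrics named in the abstract can be illustrated with a minimal sketch for a three-class emotion task. This is not the authors' evaluation code: the function names, the toy label lists, and the choice of macro averaging across classes are all assumptions made for illustration, and the abstract does not state which averaging scheme the study used.

```python
from collections import Counter

# The three emotion classes the abstract lists for ANAD.
LABELS = ("angry", "happy", "surprised")


def per_class_metrics(y_true, y_pred, labels=LABELS):
    """Return {label: (precision, recall, f1)} for multi-class predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class was wrong
            fn[truth] += 1  # true class was missed
    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_average(scores):
    """Unweighted mean of (precision, recall, f1) over classes (assumed scheme)."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


# Hypothetical labels for illustration only; these are not results from the study.
y_true = ["angry", "angry", "happy", "happy", "surprised", "surprised"]
y_pred = ["angry", "happy", "happy", "happy", "surprised", "angry"]

scores = per_class_metrics(y_true, y_pred)
print("accuracy:", round(accuracy(y_true, y_pred), 4))
print("macro precision/recall/F1:", [round(v, 4) for v in macro_average(scores)])
```

Accuracy summarizes overall correctness, while per-class precision, recall, and F1 show whether a particular emotion is systematically confused with another, which a single accuracy figure can hide.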

Keywords:

ANAD, SERDNN, SOM, XGBoost, AdaBoost, DT, KNN, Arabic speech emotion recognition

How to Cite

W. Ismaiel, A. Alhalangy, A. O. Y. Mohamed, and A. I. A. Musa, “Deep Learning, Ensemble and Supervised Machine Learning for Arabic Speech Emotion Recognition”, Eng. Technol. Appl. Sci. Res., vol. 14, no. 2, pp. 13757–13764, Apr. 2024.
