Efficient Feature Extraction Algorithms to Develop an Arabic Speech Recognition System

  • A. A. Alasadi, Department of Computer Science and IT, Dr. Babasaheb Ambedkar Marathwada University, India
  • T. H. Aldhayni, Community College in Abqaiq, King Faisal University, Saudi Arabia
  • R. R. Deshmukh, Department of Computer Science and IT, Dr. Babasaheb Ambedkar Marathwada University, India
  • A. H. Alahmadi, Department of Computer Science, Taibah University, Saudi Arabia
  • A. S. Alshebami, Community College in Abqaiq, King Faisal University, Saudi Arabia
Keywords: speech recognition, feature extraction, PNCC, ModGDF, MFCC, Arabic speech recognition

Abstract

This paper studies three feature extraction methods, Mel-Frequency Cepstral Coefficients (MFCC), Power-Normalized Cepstral Coefficients (PNCC), and the Modified Group Delay Function (ModGDF), for the development of an Automatic Speech Recognition (ASR) system for Arabic. The extracted features were classified with the Support Vector Machine (SVM) algorithm. These algorithms extract characteristics of the speech signal, including the group delay function computed directly from the voice signal. They were applied to extract features from the speech of Arabic speakers. In simulation, both PNCC and ModGDF were more accurate than MFCC for Arabic speech recognition, and PNCC gave the best recognition results of the three methods.
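
As an illustration of what computing a group delay function directly from the signal can look like, the following is a minimal sketch, not the authors' implementation. It evaluates the plain (unmodified) group delay of one frame from two discrete Fourier transforms, that of x[n] and that of n·x[n], without phase unwrapping. The function names `dft` and `group_delay` and the `eps` guard are assumptions made for this example. The modified variant (ModGDF) is generally described as further smoothing the denominator and compressing the result, which is not shown here.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def group_delay(x, eps=1e-12):
    """Plain group delay of frame x at each DFT bin.

    Uses tau[k] = (X_R*Y_R + X_I*Y_I) / |X|^2, where X = DFT(x[n]) and
    Y = DFT(n * x[n]). eps (an assumption of this sketch) avoids division
    by zero at spectral nulls.
    """
    X = dft(x)
    Y = dft([t * v for t, v in enumerate(x)])
    return [
        (a.real * b.real + a.imag * b.imag) / (abs(a) ** 2 + eps)
        for a, b in zip(X, Y)
    ]


if __name__ == "__main__":
    # A unit impulse delayed by 3 samples has a constant group delay of 3.
    frame = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
    print([round(v, 6) for v in group_delay(frame)])
```

In a real front end the naive `dft` would be replaced by an FFT, and each windowed speech frame would yield one group delay vector that is then reduced to cepstral-style features before classification.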
