A Robust Feature Extraction Method for Real-Time Speech Recognition System on a Raspberry Pi 3 Board

Authors

  • A. Mnassri, Department of Physics, Faculty of Sciences, Tunis - El Manar University, Tunisia
  • M. Bennasr, Department of Physics, Faculty of Sciences, Tunis - El Manar University, Tunisia
  • C. Adnane, Department of Physics, Faculty of Sciences, Tunis - El Manar University, Tunisia
Volume: 9 | Issue: 2 | Pages: 4066-4070 | April 2019 | https://doi.org/10.48084/etasr.2533

Abstract

The development of real-time automatic speech recognition (ASR) systems that are better adapted to environmental variabilities, such as noisy surroundings, speaker variations, and accents, has become a high priority. Robustness is required, and it can be achieved at the feature extraction stage, which avoids the need for additional pre-processing steps. In this paper, a new robust feature extraction method for a real-time ASR system is presented. A combination of Mel-frequency cepstral coefficients (MFCC) and the discrete wavelet transform (DWT) is proposed. This hybrid system preserves more of the extracted speech features, which tend to be invariant to noise. The main idea is to extract MFCC features while denoising the obtained coefficients in the wavelet domain using a median filter (MF). The proposed system has been implemented on a Raspberry Pi 3, a platform well suited to real-time requirements. The experiments showed a high recognition rate (100%) in a clean environment and satisfying results (ranging from 80% to 100%) in noisy environments at different signal-to-noise ratios (SNRs).
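The abstract leaves the exact ordering of the DWT, median filtering, and MFCC stages open. The sketch below is a minimal illustration of one plausible reading, assuming the speech signal is first denoised in the wavelet domain (median-filtering each DWT sub-band before reconstruction) and MFCCs are then computed from the cleaned signal; it is not the authors' implementation. It relies on PyWavelets, SciPy, and librosa, and the wavelet family (db4), decomposition level, filter kernel size, and number of coefficients are illustrative choices rather than values taken from the paper.

```python
# Minimal sketch (assumed pipeline, not the authors' code): wavelet-domain
# median filtering followed by MFCC extraction. Requires pywt, scipy, librosa.
import numpy as np
import pywt
from scipy.signal import medfilt
import librosa


def wavelet_median_denoise(signal, wavelet="db4", level=3, kernel_size=3):
    """Decompose with a DWT, median-filter each coefficient band, reconstruct."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    denoised = [medfilt(c, kernel_size=kernel_size) for c in coeffs]
    return pywt.waverec(denoised, wavelet)


def robust_mfcc(signal, sr=16000, n_mfcc=13):
    """Compute MFCC features from the wavelet/median-filter denoised signal."""
    clean = wavelet_median_denoise(np.asarray(signal, dtype=np.float64))
    clean = clean[: len(signal)]  # waverec may add one sample for odd lengths
    return librosa.feature.mfcc(y=clean.astype(np.float32), sr=sr, n_mfcc=n_mfcc)


# Usage (hypothetical file name): features for one utterance.
# y, sr = librosa.load("word.wav", sr=16000)
# feats = robust_mfcc(y, sr)   # shape: (n_mfcc, n_frames)
```

The resulting feature matrix would then be passed to a classifier such as the multi-class SVM listed in the keywords.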

Keywords:

automatic speech recognition, discrete wavelet transform, Mel frequency cepstrum coefficients, median filter, support vector machines, Raspberry Pi


References

M. Kos, M. Rojc, A. Zgank, Z. Kacic, D. Vlaj, “A speech-based distributed architecture platform for an intelligent ambience”, Computers & Electrical Engineering, Vol. 71, pp. 818-832, 2018 DOI: https://doi.org/10.1016/j.compeleceng.2017.07.010

K. Bader, B. Lussier, W. Schon, “A fault tolerant architecture for data fusion: A real application of Kalman filters for mobile robot localization”, Robotics and Autonomous Systems, Vol. 88, pp. 11-23, 2017 DOI: https://doi.org/10.1016/j.robot.2016.11.015

B. Jensen, N. Tomatis, L. Mayor, A. Drygajlo, R. Siegwart, “Robots meet humans-Interaction in public spaces”, IEEE Transactions on Industrial Electronics, Vol. 52, No. 6, pp. 1530–1546, 2005 DOI: https://doi.org/10.1109/TIE.2005.858730

H. K. Lam, F. H. Leung, “Design and training for combinational neural-logic systems”, IEEE Transactions on Industrial Electronics, Vol. 54, No. 1, pp. 612-619, 2007 DOI: https://doi.org/10.1109/TIE.2006.885446

N. Hataoka, Y. Obuchi, T. Mitamura, E. Nyberg, “Robust speech dialog interface for car telematics service”, First IEEE Consumer Communications and Networking Conference, Las Vegas, USA, January 5-8, 2004

K. Saeed, M. Nammous, “A speech-and-speaker identification system: Feature extraction, description, and classification of speech-signal image”, IEEE Transactions on Industrial Electronics, Vol. 54, No. 2, pp. 887–897, 2007 DOI: https://doi.org/10.1109/TIE.2007.891647

D. Yongda, L. Fang, X. Huang, “Research on multimodal human-robot interaction based on speech and gesture”, Computers & Electrical Engineering, Vol. 72, pp. 443-454, 2018 DOI: https://doi.org/10.1016/j.compeleceng.2018.09.014

J. Manikandan, B. Venkataramani, K. Girish, H. Karthic, V. Siddharth, “Hardware implementation of real-time speech recognition system using TMS320C6713 DSP”, 24th International Conference on VLSI Design, Chennai, India, January 2-7, 2011 DOI: https://doi.org/10.1109/VLSID.2011.12

C. C. Shen, W. Plishker, S. S. Bhattacharyya, “Design and optimization of a distributed, embedded speech recognition system”, IEEE International Symposium on Parallel and Distributed Processing, Miami, USA, April 14-18, 2008 DOI: https://doi.org/10.1109/IPDPS.2008.4536572

B. Kamdar, B. Mirchandani, D. Shah, Y. S. Rao, “Real time speech recognition using IIR digital filters implemented on an embedded system”, International Conference on Communication, Information & Computing Technology, Mumbai, India, October 19-20, 2012 DOI: https://doi.org/10.1109/ICCICT.2012.6398179

M. M. Da Silva, D. A. Evin, S. Verrastro, “Speaker-independent embedded speech recognition using Hidden Markov Models”, IEEE Conference on Computer Sciences, Buenos Aires, Argentina, November 30-December 2, 2016

J. Li, D. An, L. Lang, D. Yang, “Embedded speaker recognition system design and implementation based on FPGA”, Procedia Engineering, Vol. 29, pp. 2633-2637, 2012 DOI: https://doi.org/10.1016/j.proeng.2012.01.363

G. Tamulevicius, V. Arminas, E. Ivanovas, D. Navakauskas, “Hardware Accelerated FPGA Implementation of Lithuanian Isolated Word Recognition System”, Elektronika Ir Elektrotechnika, Vol. 99, No. 3, pp. 57-62, 2010

M. A. A. Zulkifly, N. Yahya, “Relative spectral-perceptual linear prediction (RASTA-PLP) speech signals analysis using singular value decomposition (SVD)”, IEEE 3rd International Symposium in Robotics and Manufacturing Automation, Kuala Lumpur, Malaysia, September 19-21, 2017 DOI: https://doi.org/10.1109/ROMA.2017.8231833

H. Gupta, D. Gupta, “LPC and LPCC method of feature extraction in Speech Recognition System”, 6th International Conference - Cloud System and Big Data Engineering (Confluence), Noida, India, January 14-15, 2016 DOI: https://doi.org/10.1109/CONFLUENCE.2016.7508171

D. O’Shaughnessy, “Automatic speech recognition: History, methods and challenges”, Pattern Recognition, Vol. 41, No. 10, pp. 2965-2979, 2008 DOI: https://doi.org/10.1016/j.patcog.2008.05.008

C. Kim, R. M. Stern, “Power-normalized cepstral coefficients (PNCC) for robust speech recognition”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 24, No. 7, pp. 1315-1329, 2016 DOI: https://doi.org/10.1109/TASLP.2016.2545928

S. Furui, “Cepstral analysis technique for automatic speaker verification”, IEEE Transactions On Acoustics, Speech, and Signal Processing, Vol. 29, No. 2, pp. 254-272, 1981 DOI: https://doi.org/10.1109/TASSP.1981.1163530

O. Viikki, K. Laurila, “Cepstral domain segmental feature vector normalization for noise robust speech recognition”, Speech Communication, Vol. 25, No. 1-3, pp. 133-147, 1998 DOI: https://doi.org/10.1016/S0167-6393(98)00033-8

S. Kim, M. Ji, H. Kim, “Noise-robust speaker recognition using subband likelihoods and reliable-feature selection”, ETRI Journal, Vol. 30, pp. 89-100, 2008 DOI: https://doi.org/10.4218/etrij.08.0107.0108

S. Okawa, E. Bocchieri, A. Potamianos, “Multi-band speech recognition in noisy environments”, IEEE International Conference on Acoustics, Speech and Signal Processing, Seattle, USA, May 15, 1998

W. C. Chen, C. T. Hsieh, E. Lai, “Multiband approach to robust text independent speaker identification”, Computational Linguistics and Chinese Language Processing, Vol. 9, No. 2, pp. 63-76, 2004

M. I. Abdalla, H. M. Abobakr, T. S. Gaafar, “DWT and MFCCs based Feature Extraction Methods for Isolated Word Recognition”, International Journal of Computer Applications, Vol. 69, No. 20, pp. 21-26, 2013 DOI: https://doi.org/10.5120/12087-8165

R. X. Gao, R. Yan, Wavelets: Theory and Applications for Manufacturing, Springer, 2010

L. R. Rabiner, B. H. Juang, Fundamentals of Speech Recognition, Prentice Hall, 1993

C. W. Hsu, C. J. Lin, “A comparison of methods for multiclass support vector machines”, IEEE Transactions on Neural Networks, Vol. 13, No. 2, pp. 415-425, 2002 DOI: https://doi.org/10.1109/72.991427

N. Cristianini, J. Shawe-Taylor, An Introduction to Support Vector Machines and other Kernel-based Learning Methods, Cambridge University Press, 2000 DOI: https://doi.org/10.1017/CBO9780511801389

A. Mnassri, M. Bennasr, A. Cherif, “GA Algorithm Optimizing SVM Multi-Class Kernel Parameters Applied in Arabic Speech Recognition”, Indian Journal of Science and Technology, Vol. 10, No. 27, pp. 1-9, 2017 DOI: https://doi.org/10.17485/ijst/2017/v10i27/114943

How to Cite

[1] A. Mnassri, M. Bennasr, and C. Adnane, “A Robust Feature Extraction Method for Real-Time Speech Recognition System on a Raspberry Pi 3 Board”, Eng. Technol. Appl. Sci. Res., vol. 9, no. 2, pp. 4066–4070, Apr. 2019.

Metrics

Abstract Views: 761
PDF Downloads: 457