A Robust Feature Extraction Method for Real-Time Speech Recognition System on a Raspberry Pi 3 Board

A. Mnassri, M. Bennasr, C. Adnane

Abstract


Developing real-time automatic speech recognition (ASR) systems that adapt better to environmental variability, such as noisy surroundings, speaker variation, and accents, has become a high priority. The required robustness can be achieved at the feature extraction stage, which avoids the need for other pre-processing steps. This paper presents a new robust feature extraction method for real-time ASR systems that combines Mel-frequency cepstral coefficients (MFCC) with the discrete wavelet transform (DWT). This hybrid approach preserves more of the extracted speech features, which tend to be invariant to noise. The main idea is to extract MFCC features by denoising the obtained coefficients in the wavelet domain with a median filter (MF). The proposed system was implemented on a Raspberry Pi 3, a platform suited to real-time requirements. Experiments showed a recognition rate of 100% in a clean environment and satisfactory results (ranging from 80% to 100%) in noisy environments at different signal-to-noise ratios (SNRs).
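The wavelet-domain median filtering step can be illustrated with a minimal sketch. The abstract does not specify the wavelet family, the decomposition depth, the filter width, or which coefficients are filtered, so the one-level Haar transform, the width of 3, and the choice to filter only the detail coefficients are all assumptions made here for illustration. The function names are hypothetical, and MFCC computation is left out.

```python
import math
from statistics import median

SQRT2 = math.sqrt(2.0)

def haar_dwt(x):
    """One-level Haar DWT (assumed wavelet); len(x) must be even."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out

def median_filter(values, width=3):
    """Sliding median with edge replication; width must be odd."""
    half = width // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [median(padded[i:i + width]) for i in range(len(values))]

def denoise_frame(frame, width=3):
    """Median-filter the detail coefficients (an assumption), then reconstruct."""
    approx, detail = haar_dwt(frame)
    return haar_idwt(approx, median_filter(detail, width))
```

In this sketch an isolated spike in a frame produces a single large detail coefficient, which the median filter suppresses before reconstruction. Whether the denoised coefficients are then passed to the MFCC stage directly or after reconstruction is not stated in the abstract.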


Keywords


automatic speech recognition; discrete wavelet transform; Mel-frequency cepstral coefficients; median filter; support vector machines; Raspberry Pi






eISSN: 1792-8036     pISSN: 2241-4487