Environmental Noise Reduction based on Deep Denoising Autoencoder


  • A. Azmat Institute of Human-Centered Computing, National Tsing Hua University, Taiwan | Institute of Information Science, Academia Sinica, Taiwan
  • I. Ali Department of Computer Science, University of Swat, Pakistan
  • W. Ariyanti National Taiwan University of Science and Technology, Taiwan | Institute of Information Science, Academia Sinica, Taiwan
  • M. G. L. Putra National Taiwan University of Science and Technology, Taiwan | Institut Teknologi Kalimantan, Indonesia
  • T. Nadeem COMSATS University, Pakistan
Volume: 12 | Issue: 6 | Pages: 9532-9535 | December 2022 | https://doi.org/10.48084/etasr.5239


Speech enhancement plays an important role in Automatic Speech Recognition (ASR), yet it remains challenging in real-world scenarios where human-level performance is expected. To address this challenge, this paper introduces an explicit denoising framework, the Deep Denoising Autoencoder (DDAE). The parameters of the DDAE encoder and decoder are optimized by backpropagation, with the denoising autoencoders stacked rather than recurrently connected. For better speech estimation in real, noisy environments, both matched and mismatched noisy and clean pairs of speech data are included in training the DDAE. The DDAE can achieve good results even with a limited amount of training data. The experimental results show that the proposed DDAE outperformed three baseline algorithms on three evaluation metrics over the noisy and clean pairs of speech data.
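The denoising-autoencoder idea the abstract describes can be illustrated with a minimal sketch: a single encoder/decoder layer trained by backpropagation to map noisy features back to their clean counterparts. This is not the paper's implementation (the actual DDAE stacks several such layers and trains on real speech spectra); the toy data, layer sizes, and learning rate below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DenoisingAutoencoder:
    """One encoder/decoder pair; a DDAE would stack several of these."""

    def __init__(self, n_in, n_hidden, lr=0.1):
        # Small random weights for the encoder (W1) and decoder (W2).
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.b2 = np.zeros(n_in)
        self.lr = lr

    def forward(self, x_noisy):
        self.h = sigmoid(x_noisy @ self.W1 + self.b1)  # encoder
        return self.h @ self.W2 + self.b2              # linear decoder

    def train_step(self, x_noisy, x_clean):
        """One full-batch gradient step on the MSE to the clean target."""
        y = self.forward(x_noisy)
        err = y - x_clean                  # gradient of 0.5 * MSE w.r.t. y
        n = x_noisy.shape[0]
        # Backpropagate through the decoder, then the encoder.
        gW2 = self.h.T @ err / n
        gb2 = err.mean(axis=0)
        dh = (err @ self.W2.T) * self.h * (1.0 - self.h)
        gW1 = x_noisy.T @ dh / n
        gb1 = dh.mean(axis=0)
        self.W1 -= self.lr * gW1; self.b1 -= self.lr * gb1
        self.W2 -= self.lr * gW2; self.b2 -= self.lr * gb2
        return float((err ** 2).mean())

# Toy noisy/clean training pairs: clean "spectral" frames plus additive noise.
clean = rng.random((256, 16))
noisy = clean + rng.normal(0.0, 0.3, clean.shape)

dae = DenoisingAutoencoder(n_in=16, n_hidden=32)
losses = [dae.train_step(noisy, clean) for _ in range(200)]
```

After training, `dae.forward` maps a noisy frame toward its clean estimate; the mean squared error over `losses` should fall as backpropagation fits the noisy-to-clean mapping.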


DDAE, limited data, noise reduction, autoencoders




W. Helali, Z. Hajaiej, and A. Cherif, "Real time speech recognition based on PWP thresholding and MFCC using SVM," Engineering, Technology & Applied Science Research, vol. 10, no. 5, pp. 6204–6208, Oct. 2020. DOI: https://doi.org/10.48084/etasr.3759

G. E. Dahl, D. Yu, L. Deng, and A. Acero, "Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 30–42, Jan. 2012. DOI: https://doi.org/10.1109/TASL.2011.2134090

X. Lu, M. Unoki, S. Matsuda, C. Hori, and H. Kashioka, "Controlling Tradeoff Between Approximation Accuracy and Complexity of a Smooth Function in a Reproducing Kernel Hilbert Space for Noise Reduction," IEEE Transactions on Signal Processing, vol. 61, no. 3, pp. 601–610, Oct. 2013. DOI: https://doi.org/10.1109/TSP.2012.2229991

Y. Bengio, "Learning Deep Architectures for AI," Foundations and Trends® in Machine Learning, vol. 2, no. 1, pp. 1–127, Nov. 2009. DOI: https://doi.org/10.1561/2200000006

A. A. Alasadi, T. H. Aldhayni, R. R. Deshmukh, A. H. Alahmadi, and A. S. Alshebami, "Efficient Feature Extraction Algorithms to Develop an Arabic Speech Recognition System," Engineering, Technology & Applied Science Research, vol. 10, no. 2, pp. 5547–5553, Apr. 2020. DOI: https://doi.org/10.48084/etasr.3465

A. Samad, A. U. Rehman, and S. A. Ali, "Performance Evaluation of Learning Classifiers of Children Emotions using Feature Combinations in the Presence of Noise," Engineering, Technology & Applied Science Research, vol. 9, no. 6, pp. 5088–5092, Dec. 2019. DOI: https://doi.org/10.48084/etasr.3193

M. Ranzato, F. J. Huang, Y.-L. Boureau, and Y. LeCun, "Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition," in 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, Jun. 2007. DOI: https://doi.org/10.1109/CVPR.2007.383157

A. L. Maas, Q. V. Le, T. M. O'Neil, O. Vinyals, P. Nguyen, and A. Y. Ng, "Recurrent Neural Networks for Noise Reduction in Robust ASR," in Interspeech 2012, Sep. 2012. DOI: https://doi.org/10.21437/Interspeech.2012-6

Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error log-spectral amplitude estimator," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 33, no. 2, pp. 443–445, Apr. 1985. DOI: https://doi.org/10.1109/TASSP.1985.1164550

U. Mittal and N. Phamdo, "Signal/noise KLT based approach for enhancing speech degraded by colored noise," IEEE Transactions on Speech and Audio Processing, vol. 8, no. 2, pp. 159–167, Mar. 2000. DOI: https://doi.org/10.1109/89.824700

E. J. Candès, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis?," Journal of the ACM, vol. 58, no. 3, pp. 11:1–11:37, Mar. 2011. DOI: https://doi.org/10.1145/1970392.1970395

X. Lu, Y. Tsao, S. Matsuda, and C. Hori, "Speech enhancement based on deep denoising autoencoder," in Interspeech 2013, Aug. 2013, pp. 436–440. DOI: https://doi.org/10.21437/Interspeech.2013-130

P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion," The Journal of Machine Learning Research, vol. 11, pp. 3371–3408, 2010.

L. L. N. Wong, S. D. Soli, S. Liu, N. Han, and M.-W. Huang, "Development of the Mandarin Hearing in Noise Test (MHINT)," Ear and Hearing, vol. 28, no. 2 Suppl, pp. 70S–74S, Apr. 2007. DOI: https://doi.org/10.1097/AUD.0b013e31803154d0


How to Cite

A. Azmat, I. Ali, W. Ariyanti, M. G. L. Putra, and T. Nadeem, “Environmental Noise Reduction based on Deep Denoising Autoencoder”, Eng. Technol. Appl. Sci. Res., vol. 12, no. 6, pp. 9532–9535, Dec. 2022.

