Environmental Noise Reduction based on Deep Denoising Autoencoder
Received: 4 August 2022 | Revised: 31 August 2022 | Accepted: 1 September 2022 | Online: 27 September 2022
Corresponding author: A. Azmat
Abstract
Speech enhancement plays an important role in Automatic Speech Recognition (ASR), yet reaching human-level performance in real-world scenarios remains challenging. To cope with this challenge, this paper introduces an explicit denoising framework called the Deep Denoising Autoencoder (DDAE). The parameters of the DDAE encoder and decoder are optimized by backpropagation, and the denoising autoencoders are stacked rather than linked by recurrent connections. To improve speech estimation in real, noisy environments, the DDAE is trained on both matched and mismatched pairs of noisy and clean speech data. The DDAE can perform well even with a limited amount of training data. Experimental results on noisy and clean pairs of speech data show that the proposed DDAE outperforms three baseline algorithms on three evaluation metrics.
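The training idea summarized in the abstract (a stacked feed-forward network that maps noisy frames to clean frames and is optimized by backpropagation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the layer sizes, tanh activation, learning rate, joint end-to-end training of all layers, and the synthetic sinusoidal "speech" features are all assumptions chosen only to keep the example small.

```python
import numpy as np

def forward(weights, biases, x):
    """Run the stacked layers; return the activations of every layer."""
    acts = [x]
    last = len(weights) - 1
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = acts[-1] @ W + b
        # Hidden layers use tanh (an assumption); the output layer is linear.
        acts.append(z if i == last else np.tanh(z))
    return acts

def train_ddae(noisy, clean, hidden=(32, 32), epochs=200, lr=0.1, seed=0):
    """Fit noisy -> clean with full-batch gradient descent on the MSE."""
    rng = np.random.default_rng(seed)
    sizes = [noisy.shape[1], *hidden, clean.shape[1]]
    weights = [rng.normal(0.0, np.sqrt(1.0 / m), size=(m, n))
               for m, n in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(n) for n in sizes[1:]]
    losses = []
    for _ in range(epochs):
        acts = forward(weights, biases, noisy)
        err = acts[-1] - clean
        losses.append(float(np.mean(err ** 2)))
        # Backpropagation: start from d(MSE)/d(output) and walk down the stack.
        grad = 2.0 * err / err.size
        for i in reversed(range(len(weights))):
            grad_W = acts[i].T @ grad
            grad_b = grad.sum(axis=0)
            if i > 0:
                # Propagate through the tanh of the layer below,
                # using the weights before they are updated.
                grad = (grad @ weights[i].T) * (1.0 - acts[i] ** 2)
            weights[i] -= lr * grad_W
            biases[i] -= lr * grad_b
    return weights, biases, losses

def denoise(weights, biases, noisy):
    """Estimate clean features from noisy ones with the trained network."""
    return forward(weights, biases, noisy)[-1]

# Synthetic stand-in for paired data: 256 frames of 8 features each.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 256)[:, None]
clean = np.sin(2.0 * np.pi * t * np.arange(1, 9))
noisy = clean + 0.3 * rng.normal(size=clean.shape)

weights, biases, losses = train_ddae(noisy, clean)
enhanced = denoise(weights, biases, noisy)
```

In practice the inputs would be spectral features extracted from real noisy and clean utterances, and each autoencoder in the stack could instead be pretrained layer by layer before fine-tuning; the sketch skips both steps for brevity.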
Keywords:
DDAE, limited data, noise reduction, autoencoders
License
Copyright (c) 2022 A. Azmat, I. Ali, W. Ariyanti, M. G. L. Putra, T. Nadeem
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.