Non-Stationary Speech Denoising via a Wavenet Architecture with Dilated Convolutions and Gated Units

Pradeep Kumar Sriperamboodhuru; Anitha Sheela Kancharla

doi:10.48084/etasr.18019

Authors

Pradeep Kumar Sriperamboodhuru, Jawaharlal Nehru Technological University Hyderabad, India
Anitha Sheela Kancharla, Jawaharlal Nehru Technological University Hyderabad, India

Volume: 16 | Issue: 3 | Pages: 34934-34941 | June 2026 | https://doi.org/10.48084/etasr.18019

Received: 6 February 2026 | Revised: 13 March 2026 | Accepted: 24 March 2026 | Online: 13 April 2026

Corresponding author: Pradeep Kumar Sriperamboodhuru

Abstract

Ambient noise can degrade the intelligibility of speech in speech communication systems, and researchers have developed a number of methods to counter it. WaveNet is a promising deep learning model for this task. In its original form, WaveNet is a generative model that uses autoregression to predict the probability distribution of each sample conditioned on the preceding samples. A supervised version of WaveNet, trained by minimizing a regression loss, can instead be used for speech denoising. The proposed model introduces non-causality, a discriminative rather than generative formulation, target-field prediction, and conditioning, which improves computational efficiency by making the model highly parallelizable. Evaluations show that the proposed method outperforms classical methods, including a widely used approach based on processing magnitude spectrograms. It yields higher SNR gains (up to 19.23 dB) and lower MCD values (as low as 9.41), achieving promising results for speech denoising in non-stationary noise environments.
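The abstract names the model's ingredients (non-causal dilated convolutions, gated units, and target-field prediction) without showing how they fit together. The pure-Python toy below is a minimal sketch of those mechanics, not the authors' implementation. It assumes a single channel, 3-tap kernels, untrained random weights, and a plain sum of skip outputs in place of a real output stage. The helper names (`dilated_conv`, `gated_residual_layer`, `predict_target_field`, `snr_db`) and the dilation schedules are hypothetical. `snr_db` assumes that the reported SNR gain is the SNR of the denoised output minus that of the noisy input, each measured against the clean reference.

```python
import math
import random
from typing import List, Sequence, Tuple

# (filter taps, gate taps, residual scale) for one single-channel toy layer.
Params = Tuple[List[float], List[float], float]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def dilated_conv(x: Sequence[float], w: Sequence[float], d: int) -> List[float]:
    """Non-causal 3-tap dilated convolution in 'valid' mode (no padding).

    Output i combines x[i], x[i+d] and x[i+2d], so it is centred on x[i+d]
    and sees d samples of past AND d samples of future context."""
    return [w[0] * x[i] + w[1] * x[i + d] + w[2] * x[i + 2 * d]
            for i in range(len(x) - 2 * d)]


def gated_residual_layer(x: List[float], p: Params, d: int) -> Tuple[List[float], List[float]]:
    """Gated unit z = tanh(W_f * x) * sigmoid(W_g * x), plus a residual path."""
    w_filter, w_gate, res_scale = p
    f = dilated_conv(x, w_filter, d)
    g = dilated_conv(x, w_gate, d)
    z = [math.tanh(a) * sigmoid(b) for a, b in zip(f, g)]
    x_crop = x[d:len(x) - d]  # centre-crop the input so it lines up with z
    residual = [xc + res_scale * zi for xc, zi in zip(x_crop, z)]
    return residual, z  # z also serves as this layer's skip output


def receptive_field(dilations: Sequence[int], kernel_size: int = 3) -> int:
    # Each non-causal layer widens the context by (kernel_size - 1) * dilation.
    return 1 + (kernel_size - 1) * sum(dilations)


def predict_target_field(noisy: Sequence[float], dilations: Sequence[int],
                         params: Sequence[Params]) -> List[float]:
    """Predict len(noisy) - receptive_field + 1 samples in one forward pass."""
    h = list(noisy)
    skips: List[List[float]] = []
    for d, p in zip(dilations, params):
        h, s = gated_residual_layer(h, p, d)
        skips.append(s)
    n = len(h)
    out = [0.0] * n
    for s in skips:  # centre-crop each skip output to the final length and sum
        off = (len(s) - n) // 2
        for i in range(n):
            out[i] += s[off + i]
    return out


def snr_db(clean: Sequence[float], estimate: Sequence[float]) -> float:
    signal = sum(c * c for c in clean)
    noise = sum((c - e) ** 2 for c, e in zip(clean, estimate))
    return 10.0 * math.log10(signal / noise)


rng = random.Random(0)
dilations = [1, 2, 4, 8] * 2  # toy schedule: receptive field of 61 samples
params = [([rng.uniform(-1, 1) for _ in range(3)],
           [rng.uniform(-1, 1) for _ in range(3)], 0.5) for _ in dilations]
target_len = 16
noisy = [rng.uniform(-1, 1)
         for _ in range(target_len + receptive_field(dilations) - 1)]
denoised = predict_target_field(noisy, dilations, params)
print(len(denoised))  # 16
# Example larger schedule (an assumption): three stacks of dilations 1..512.
print(receptive_field([2 ** i for i in range(10)] * 3))  # 6139
```

Each output sample in the target field depends only on its own window of receptive-field inputs, so all of them are computed in a single pass. This appears to be the parallelism the abstract refers to. An autoregressive WaveNet would instead have to generate the samples one at a time.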

Keywords:

deep learning, WaveNet, speech denoising, end-to-end processing
