Non-Stationary Speech Denoising via a Wavenet Architecture with Dilated Convolutions and Gated Units
Received: 6 February 2026 | Revised: 13 March 2026 | Accepted: 24 March 2026 | Online: 13 April 2026
Corresponding author: Pradeep Kumar Sriperamboodhuru
Abstract
Ambient noise can degrade the intelligibility of speech in communication systems, and researchers have developed a number of methods to restore it. WaveNet is a promising deep learning model for this task. In its original form, WaveNet is a generative model that uses autoregression to predict the probability distribution of the next sample from the preceding samples. A supervised version of WaveNet, in which the model learns by minimizing a regression loss, can be used for speech denoising. The proposed model introduces noncausality, discriminative training, target field prediction, and conditioning. These changes make the model highly parallelizable, which improves computational efficiency. Evaluations show that the proposed method outperforms classical methods, including a commonly used method that processes magnitude spectrograms. It yields SNR gains of up to 19.23 dB and MCD values as low as 9.41, a promising result for speech denoising in non-stationary environments.
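The building blocks named in the title and abstract (dilated convolutions, gated units, noncausality, and an SNR-based evaluation) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the 3-tap kernels, the zero padding, and the example dilation schedule are all assumptions chosen for illustration.

```python
import math


def dilated_conv1d(x, kernel, dilation):
    """Non-causal, zero-padded dilated 1-D convolution (illustrative only).

    With a 3-tap kernel, output sample t combines x[t - d], x[t] and
    x[t + d], so it uses both past and future context. Samples that fall
    outside the signal are treated as zero.
    """
    n = len(x)
    half = len(kernel) // 2
    out = []
    for t in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + (k - half) * dilation
            if 0 <= idx < n:
                acc += w * x[idx]
        out.append(acc)
    return out


def gated_unit(x, filter_kernel, gate_kernel, dilation):
    """WaveNet-style gated activation: tanh(filter) * sigmoid(gate)."""
    f = dilated_conv1d(x, filter_kernel, dilation)
    g = dilated_conv1d(x, gate_kernel, dilation)
    return [math.tanh(a) / (1.0 + math.exp(-b)) for a, b in zip(f, g)]


def receptive_field(kernel_size, dilations):
    """Number of input samples that influence one output sample
    after stacking one convolution per entry in `dilations`."""
    return 1 + (kernel_size - 1) * sum(dilations)


def snr_db(clean, estimate):
    """Signal-to-noise ratio in dB of an estimate against the clean signal.

    Assumes the estimate is not identical to the clean signal
    (otherwise the residual energy is zero).
    """
    signal = sum(c * c for c in clean)
    residual = sum((c - e) ** 2 for c, e in zip(clean, estimate))
    return 10.0 * math.log10(signal / residual)


if __name__ == "__main__":
    # Hypothetical dilation schedule doubling from 1 to 512.
    dilations = [2 ** i for i in range(10)]
    print("receptive field:", receptive_field(3, dilations))

    noisy = [0.2, -0.1, 0.4, 0.9, -0.3, 0.0, 0.5]
    hidden = gated_unit(noisy, [0.5, 1.0, 0.5], [0.1, 0.2, 0.1], dilation=2)
    print("gated output:", [round(h, 3) for h in hidden])
```

Because every output sample depends only on the input signal and not on previously generated outputs, all samples of a target field can be computed independently of one another, which is the property that makes a non-causal, non-autoregressive design easy to parallelize.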
Keywords:
deep learning, WaveNet, speech denoising, end-to-end processing
License
Copyright (c) 2026 Pradeep Kumar Sriperamboodhuru, Anitha Sheela Kancharla

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
