Acoustic Signal Enhancement Using Deep Neural Networks

Authors

  • Shibani Kar, Department of Electronics and Communication Engineering, Sambalpur University Institute of Information Technology, Sambalpur University, Jyoti Vihar, Burla, Odisha, India
  • Vishwajeet Mukherjee, Department of Basic Science and Humanities, Sambalpur University Institute of Information Technology, Sambalpur University, Jyoti Vihar, Burla, Odisha, India
Volume: 15 | Issue: 4 | Pages: 24259-24264 | August 2025 | https://doi.org/10.48084/etasr.10571

Abstract

The presence of background noise in acoustic signals, such as speech, audio, and sound signals, degrades listening quality and causes listener fatigue. Standard enhancement methods perform well only under high Signal-to-Noise Ratio (SNR) conditions, whereas deep neural networks have demonstrated significant performance gains in image processing and speech recognition. This motivates their use for denoising speech corrupted by multiple noise types under low SNR conditions (0 dB). This study applied two families of deep neural networks, convolutional neural networks and deep generative networks, to remove background noise from speech signals under low SNR conditions. The noise reduction networks were trained to estimate the noise component present, which was then subtracted to obtain the denoised speech signal. Two convolutional architectures, the UNet and the Convolutional Encoder-Decoder network (CED), and two deep generative networks, the Variational Autoencoder (VAE) and the Vector Quantized Variational Autoencoder (VQVAE), were trained on Short-Time Fourier Transform (STFT) magnitude features of noisy signal frames. Four objective quality measures were used to assess the enhanced speech: Perceptual Evaluation of Speech Quality (PESQ), Short-Time Objective Intelligibility (STOI), Segmental Signal-to-Noise Ratio (SSNR), and SNR improvement. Spectral subtraction and logMMSE served as baseline methods for evaluating the networks on two datasets. The results of the comparative analysis support the superiority of the CED for denoising and enhancing speech corrupted by multiple noise types under low SNR conditions, for both seen and unseen noises, with far fewer model parameters than the other methods.
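The enhancement scheme described above (estimate the noise spectrum from STFT magnitude features, subtract it, and resynthesize with the noisy phase) can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: the trained network's noise estimate is stood in for here by the oracle noise magnitude, and the frame size, hop, and window are assumed values.

```python
import numpy as np

def stft_mag_phase(x, n_fft=256, hop=128):
    """Frame the signal, apply a Hann window, and take the real FFT per frame."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    return np.abs(spec), np.angle(spec)

def istft(mag, phase, n_fft=256, hop=128):
    """Overlap-add resynthesis from magnitude and phase."""
    win = np.hanning(n_fft)
    frames = np.fft.irfft(mag * np.exp(1j * phase), n=n_fft, axis=1) * win
    out = np.zeros(hop * (len(frames) - 1) + n_fft)
    norm = np.zeros_like(out)
    for i, f in enumerate(frames):
        out[i * hop:i * hop + n_fft] += f
        norm[i * hop:i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

# Toy example: a sine "speech" signal plus white noise at roughly 0 dB SNR.
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
clean = np.sin(2 * np.pi * 440 * t)
noise = rng.standard_normal(len(t)) * np.sqrt(np.mean(clean ** 2))
noisy = clean + noise

mag_noisy, phase = stft_mag_phase(noisy)
# Stand-in for the trained noise-estimation network: the oracle noise magnitude.
mag_noise_est, _ = stft_mag_phase(noise)
# Subtract the estimated noise magnitude (floored at zero); keep the noisy phase.
mag_clean_est = np.maximum(mag_noisy - mag_noise_est, 0.0)
denoised = istft(mag_clean_est, phase)
```

In the paper's setting, `mag_noise_est` would instead come from one of the four trained networks applied to `mag_noisy`, and real speech frames would replace the toy sine.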

Keywords:

background noise estimation, speech signal, deep neural networks, deep generative networks

References

S. R. Park and J. W. Lee, "A Fully Convolutional Neural Network for Speech Enhancement," in Interspeech 2017, Aug. 2017, pp. 1993–1997. DOI: https://doi.org/10.21437/Interspeech.2017-1465

N. Krishnamurthy and J. H. L. Hansen, "Babble Noise: Modeling, Analysis, and Applications," IEEE Transactions on Audio, Speech, and Language Processing, vol. 17, no. 7, pp. 1394–1407, Sep. 2009. DOI: https://doi.org/10.1109/TASL.2009.2015084

S. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 27, no. 2, pp. 113–120, Apr. 1979. DOI: https://doi.org/10.1109/TASSP.1979.1163209

Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error log-spectral amplitude estimator," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 33, no. 2, pp. 443–445, Apr. 1985. DOI: https://doi.org/10.1109/TASSP.1985.1164550

K. Paliwal, K. Wójcicki, and B. Schwerin, "Single-channel speech enhancement using spectral subtraction in the short-time modulation domain," Speech Communication, vol. 52, no. 5, pp. 450–475, May 2010. DOI: https://doi.org/10.1016/j.specom.2010.02.004

Y. Ephraim and D. Malah, "Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32, no. 6, pp. 1109–1121, Dec. 1984. DOI: https://doi.org/10.1109/TASSP.1984.1164453

D. E. Tsoukalas, J. N. Mourjopoulos, and G. Kokkinakis, "Speech enhancement based on audible noise suppression," IEEE Transactions on Speech and Audio Processing, vol. 5, no. 6, pp. 497–514, Nov. 1997. DOI: https://doi.org/10.1109/89.641296

N. Virag, "Single channel speech enhancement based on masking properties of the human auditory system," IEEE Transactions on Speech and Audio Processing, vol. 7, no. 2, pp. 126–137, Mar. 1999. DOI: https://doi.org/10.1109/89.748118

Y. Hu and P. C. Loizou, "A comparative intelligibility study of single-microphone noise reduction algorithms," The Journal of the Acoustical Society of America, vol. 122, no. 3, pp. 1777–1786, Sep. 2007. DOI: https://doi.org/10.1121/1.2766778

Y. Hu and P. C. Loizou, "Subjective comparison and evaluation of speech enhancement algorithms," Speech Communication, vol. 49, no. 7–8, pp. 588–601, Jul. 2007. DOI: https://doi.org/10.1016/j.specom.2006.12.006

A. Azmat, I. Ali, W. Ariyanti, M. G. L. Putra, and T. Nadeem, "Environmental Noise Reduction based on Deep Denoising Autoencoder," Engineering, Technology & Applied Science Research, vol. 12, no. 6, pp. 9532–9535, Dec. 2022. DOI: https://doi.org/10.48084/etasr.5239

N. Alamdari, A. Azarang, and N. Kehtarnavaz, "Improving deep speech denoising by Noisy2Noisy signal mapping," Applied Acoustics, vol. 172, Jan. 2021, Art. no. 107631. DOI: https://doi.org/10.1016/j.apacoust.2020.107631

V. Srinivasarao and U. Ghanekar, "Speech enhancement - an enhanced principal component analysis (EPCA) filter approach," Computers & Electrical Engineering, vol. 85, Jul. 2020, Art. no. 106657. DOI: https://doi.org/10.1016/j.compeleceng.2020.106657

D. S. Williamson and D. Wang, "Time-Frequency Masking in the Complex Domain for Speech Dereverberation and Denoising," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 7, pp. 1492–1501, Jul. 2017. DOI: https://doi.org/10.1109/TASLP.2017.2696307

Y. Xu, J. Du, L. R. Dai, and C. H. Lee, "An Experimental Study on Speech Enhancement Based on Deep Neural Networks," IEEE Signal Processing Letters, vol. 21, no. 1, pp. 65–68, Jan. 2014. DOI: https://doi.org/10.1109/LSP.2013.2291240

D. Wang and J. Chen, "Supervised Speech Separation Based on Deep Learning: An Overview," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 10, pp. 1702–1726, Oct. 2018. DOI: https://doi.org/10.1109/TASLP.2018.2842159

S. Pascual, J. Serrà, and A. Bonafonte, "Time-domain speech enhancement using generative adversarial networks," Speech Communication, vol. 114, pp. 10–21, Nov. 2019. DOI: https://doi.org/10.1016/j.specom.2019.09.001

A. Azarang and N. Kehtarnavaz, "A review of multi-objective deep learning speech denoising methods," Speech Communication, vol. 122, pp. 1–10, Sep. 2020. DOI: https://doi.org/10.1016/j.specom.2020.04.002

C. Valentini-Botinhao, "Noisy speech database for training speech enhancement algorithms and TTS models," University of Edinburgh, School of Informatics, Centre for Speech Technology Research (CSTR), 2017.

Y. Hu and P. C. Loizou, "Evaluation of Objective Quality Measures for Speech Enhancement," IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 1, pp. 229–238, Jan. 2008. DOI: https://doi.org/10.1109/TASL.2007.911054

X. Dong and D. S. Williamson, "Towards real-world objective speech quality and intelligibility assessment using speech-enhancement residuals and convolutional long short-term memory networks," The Journal of the Acoustical Society of America, vol. 148, no. 5, pp. 3348–3359, Nov. 2020. DOI: https://doi.org/10.1121/10.0002702

C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, "A short-time objective intelligibility measure for time-frequency weighted noisy speech," in 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, TX, USA, Mar. 2010, pp. 4214–4217. DOI: https://doi.org/10.1109/ICASSP.2010.5495701

O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation," in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, vol. 9351, N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi, Eds. Springer International Publishing, 2015, pp. 234–241. DOI: https://doi.org/10.1007/978-3-319-24574-4_28

D. P. Kingma and M. Welling, "An Introduction to Variational Autoencoders," Foundations and Trends® in Machine Learning, vol. 12, no. 4, pp. 307–392, 2019. DOI: https://doi.org/10.1561/2200000056

A. van den Oord, O. Vinyals, and K. Kavukcuoglu, "Neural Discrete Representation Learning," in Advances in Neural Information Processing Systems, 2017, vol. 30.

How to Cite

[1]
S. Kar and V. Mukherjee, “Acoustic Signal Enhancement Using Deep Neural Networks”, Eng. Technol. Appl. Sci. Res., vol. 15, no. 4, pp. 24259–24264, Aug. 2025.
