[1] S. Boll. Suppression of acoustic noise in speech using spectral subtraction. IEEE Transactions on Acoustics, Speech, and Signal Processing, 27(2):113–120, 1979.
[2] Y. N. Dauphin, A. Fan, M. Auli, and D. Grangier. Language modeling with gated convolutional networks. CoRR, abs/1612.08083, 2016.
[3] A. Défossez, G. Synnaeve, and Y. Adi. Real time speech enhancement in the waveform domain, 2020.
[4] A. Défossez, N. Usunier, L. Bottou, and F. R. Bach. Demucs: Deep extractor for music sources with extra unlabeled data remixed. CoRR, abs/1909.01174, 2019.
[5] S.-W. Fu, Y. Tsao, X. Lu, and H. Kawai. Raw waveform-based speech enhancement by fully convolutional networks, 2017.
[6] X. Hao, C. Shan, Y. Xu, S. Sun, and L. Xie. An attention-based neural network approach for single channel speech enhancement. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6895–6899, 2019.
[7] X. Hao, X. Su, R. Horaud, and X. Li. FullSubNet: A full-band and sub-band fusion model for real-time single-channel speech enhancement, 2021.
[8] Y. Hu, Y. Liu, S. Lv, M. Xing, S. Zhang, Y. Fu, J. Wu, B. Zhang, and L. Xie. DCCRN: Deep complex convolution recurrent network for phase-aware speech enhancement, 2020.
[9] Y. Hu and P. C. Loizou. Evaluation of objective quality measures for speech enhancement. IEEE Transactions on Audio, Speech, and Language Processing, 16(1):229–238, 2008.
[10] H. Levitt. Noise reduction in hearing aids: a review. Journal of Rehabilitation Research and Development, 38(1):111–121, 2001.
[11] X. Lu, Y. Tsao, S. Matsuda, and C. Hori. Speech enhancement based on deep denoising auto-encoder. In Proc. Interspeech, pages 436–440, 2013.
[12] A. Narayanan and D. Wang. Computational auditory scene analysis and automatic speech recognition. In Techniques for Noise Robustness in Automatic Speech Recognition, pages 433–462, 2012.
[13] A. Pandey and D. Wang. A new framework for CNN-based speech enhancement in the time domain. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 27(7):1179–1188, 2019.
[14] C. K. A. Reddy, H. Dubey, K. Koishida, A. Nair, V. Gopal, R. Cutler, S. Braun, H. Gamper, R. Aichner, and S. Srinivasan. Interspeech 2021 deep noise suppression challenge, 2021.
[15] A. Rix, J. Beerends, M. Hollier, and A. Hekstra. Perceptual evaluation of speech quality (PESQ) - a new method for speech quality assessment of telephone networks and codecs. In 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 2, pages 749–752, 2001.
[16] O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. CoRR, abs/1505.04597, 2015.
[17] C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen. An algorithm for intelligibility prediction of time–frequency weighted noisy speech. IEEE Transactions on Audio, Speech, and Language Processing, 19(7):2125–2136, 2011.
[18] C. Tang, C. Luo, Z. Zhao, W. Xie, and W. Zeng. Joint time-frequency and time domain learning for speech enhancement. In C. Bessiere, editor, Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, pages 3816–3822. International Joint Conferences on Artificial Intelligence Organization, July 2020. Main track.
[19] C. Valentini-Botinhao. Noisy speech database for training speech enhancement algorithms and TTS models, 2017.
[20] N. L. Westhausen and B. T. Meyer. Dual-signal transformation LSTM network for real-time noise suppression, 2020.
[21] C. Zorila, C. Boeddeker, R. Doddipatla, and R. Haeb-Umbach. An investigation into the effectiveness of enhancement in ASR training and test for CHiME-5 dinner party transcription, 2019.