
Detail Display

Author (Chinese): 劉志容
Author (English): Liou, Jyh-Jung
Title (Chinese): 基於注意力機制之噪聲感知噪音消除模型於時域語音增強
Title (English): An Attention-Based Noise-Aware Elimination Network for Speech Enhancement in the Time Domain
Advisor (Chinese): 張世杰
Advisor (English): Chang, Shih-Chieh
Committee Members (Chinese): 何宗易、潘家煜
Committee Members (English): Ho, Tsung-Yi; Pan, Jia-Yu
Degree: Master's
Institution: National Tsing Hua University
Department: Computer Science
Student ID: 108062510
Publication Year (ROC): 111 (2022)
Graduation Academic Year: 110
Language: English
Pages: 22
Keywords (Chinese): 語音增強、注意力機制、神經網路
Keywords (English): speech enhancement, attention mechanism, neural network
In recent years, deep neural networks have achieved remarkable success in speech enhancement. Because enhancement performed in the time domain denoises both the phase and the magnitude spectrum of the signal simultaneously, time-domain approaches have been attracting growing attention.
Deep learning approaches to speech enhancement fall broadly into time-domain and time-frequency-domain methods. In this thesis, our goal is to build an effective deep learning model under the time-domain approach to better improve signal quality. Because time-domain methods require no additional hardware to perform the short-time Fourier transform, they can reduce hardware cost. Building on an encoder-decoder framework, this thesis proposes a novel auxiliary network that provides an attention-based mechanism, which can both strengthen important parts of the signal and suppress the noise. This attention model brings the latent representations propagated through the network closer to the clean speech signal during denoising and enhancement.
Experiments show that the proposed method significantly improves speech-quality scores: under the PESQ, CSIG, CBAK, and COVL measures, it outperforms current time-domain speech enhancement models.
Speech enhancement in the time domain has been gaining attention in recent years because it enhances both the phase and the magnitude spectrum of speech signals. Deep neural network (DNN) approaches have shown great success in speech enhancement and can be classified into time-frequency-domain and time-domain approaches. In this paper, our objective is to build an efficient DNN model in the time domain, since time-domain approaches do not need additional hardware to perform the short-time Fourier transform. Based on the encoder-decoder framework, this paper proposes an innovative auxiliary network that provides an attention-based approach: the auxiliary network can either strengthen important time steps or suppress unimportant time steps in each channel. The attention mechanism allows us to perform noise-aware suppression that makes the latent representations closer to the clean speech. Experimental results show that our method outperforms the state-of-the-art time-domain model in terms of PESQ, CSIG, CBAK, and COVL.
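The abstract's central idea is an auxiliary network that assigns a weight to every (channel, time step) position of the latent representation and rescales the features, strengthening informative time steps and attenuating noisy ones. The thesis body is not reproduced on this page, so the following is only a minimal PyTorch sketch of that style of gating, not the author's actual architecture: the module name NoiseAwareGate, the hidden width, and the kernel size are all assumptions.

import torch
import torch.nn as nn

class NoiseAwareGate(nn.Module):
    # Illustrative attention gate: produces a weight in (0, 1) for every
    # (channel, time step) pair and scales the latent features elementwise,
    # so informative time steps are kept and noisy ones are suppressed.
    def __init__(self, channels: int, hidden: int = 64, kernel_size: int = 3):
        super().__init__()
        padding = kernel_size // 2
        self.net = nn.Sequential(
            nn.Conv1d(channels, hidden, kernel_size, padding=padding),
            nn.ReLU(),
            nn.Conv1d(hidden, channels, kernel_size, padding=padding),
            nn.Sigmoid(),  # per-channel, per-time-step weights in (0, 1)
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, channels, time) latent features from a 1-D conv encoder
        return z * self.net(z)  # elementwise strengthen / suppress

z = torch.randn(4, 128, 250)    # dummy bottleneck activations
gated = NoiseAwareGate(128)(z)  # same shape, reweighted time steps

Because the gate's output is sigmoid-bounded, it can only pass or attenuate features, which matches the abstract's description of noise-aware suppression; in a DEMUCS-style encoder-decoder it would most naturally sit at the bottleneck. Of the reported metrics, wide-band PESQ can be computed with the open-source pesq package, as in the sketch below (the file names are placeholders; CSIG, CBAK, and COVL are Hu and Loizou's composite measures and are not shown):

from pesq import pesq   # pip install pesq
import soundfile as sf  # pip install soundfile

ref, fs = sf.read("clean.wav")     # placeholder: reference clean utterance
deg, _ = sf.read("enhanced.wav")   # placeholder: enhanced model output
print(pesq(fs, ref, deg, "wb"))    # wide-band mode requires fs = 16 kHz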
Chinese Abstract
Abstract
Contents
List of Figures
List of Tables
1. Introduction
2. Related Work
    2.1 DEMUCS Architecture
3. The Proposed Framework
    3.1 Noise-Aware Elimination Network
    3.2 System Overview
    3.3 Noise-Aware Elimination Network Architecture
    3.4 Objective
4. Dataset
5. Experiments
    5.1 Evaluation Methods
    5.2 Experimental Setups
    5.3 Experimental Results
6. Conclusions
References