Author (Chinese): 陳玟銓
Author (English): Chen, Wen-Chuan
Title (Chinese): 應用深度學習於惡劣環境下的語音及音頻訊號處理
Title (English): Deep Learning Applied to Speech and Audio Signal Processing under Adverse Environments
Advisor (Chinese): 白明憲
Advisor (English): Bai, Ming-Sian
Committee members (Chinese): 簡仁宗、丁川康
Committee members (English): Chien, Jen-Tzung; Ting, Chuan-Kang
Degree: Master's
University: National Tsing Hua University
Department: Department of Power Mechanical Engineering
Student ID: 106033548
Year of publication (ROC calendar): 108 (2019)
Graduation academic year: 108
Language: English
Number of pages: 77
Keywords (Chinese): 去迴響、多通道反向濾波器、多通道維納濾波器、多通道線性預估、事件音偵測、聲源分離、深度卷積遞歸神經網路、複數遮罩、區分性訓練
Keywords (English): Dereverberation; Multiple input/output inverse filtering; Multichannel Wiener filter; Variance-normalized delayed linear prediction; Sound event detection; Source separation; Convolutional recurrent neural network; Ideal complex mask; Discriminative training
This thesis investigates how to process speech and audio signals in adverse acoustic environments, particularly reverberant ones. First, Chapter 2 compares a number of dereverberation algorithms, both conventional and neural-network based; the results show that the neural-network approaches outperform the conventional ones. Chapter 3 then examines how to perform sound event detection (SED) in adverse environments and compares two approaches, a two-stage and a one-stage algorithm; the results show that the one-stage algorithm outperforms the two-stage one. Chapter 4 turns to source separation and proposes a deep convolutional recurrent neural network (CRNN), aided by an ideal complex mask (ICM) and discriminative training (DT), to handle sound event separation and detection (SESD) jointly. Finally, to demonstrate that the proposed methods improve the robustness of the algorithms in adverse environments, three different datasets covering speech, music, and sound events are used as test sets, and an acoustic-array model together with the image source method is used to simulate sound propagation. For the separation task, SIR, SDR, and SAR are adopted as evaluation metrics, while for the detection task, the error rate (ER), F-score, and AUC are used. The results show that the proposed model can separate and detect sound events effectively in adverse environments.
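As a point of reference for the simulation setup described above, the following is a minimal Python/NumPy sketch of the image source method for generating a room impulse response in a shoebox room. It assumes a single uniform wall reflection coefficient and rounds propagation delays to the nearest sample; the thesis's actual array geometry and simulator settings are not reproduced here.

```python
import numpy as np

def image_source_rir(room_dim, src, mic, fs, beta=0.9, max_order=6, c=343.0):
    """Minimal shoebox image-source RIR (after Allen & Berkley).

    room_dim, src, mic : 3-element sequences in metres.
    beta               : single uniform wall reflection coefficient (assumption).
    max_order          : maximum total number of wall reflections kept.
    Propagation delays are rounded to the nearest sample for simplicity.
    """
    room_dim, src, mic = (np.asarray(v, float) for v in (room_dim, src, mic))
    # RIR long enough to hold the longest allowed propagation path
    t_max = (max_order + 2) * np.linalg.norm(room_dim) / c
    h = np.zeros(int(np.ceil(t_max * fs)) + 1)

    r = range(-max_order, max_order + 1)
    for n in np.array(np.meshgrid(r, r, r)).T.reshape(-1, 3):
        for p in np.ndindex(2, 2, 2):                    # 8 mirror combinations
            p = np.asarray(p)
            # image-source position: mirrored source translated by 2*n*L
            img = (1 - 2 * p) * src + 2 * n * room_dim
            n_refl = int(np.sum(np.abs(2 * n - p)))      # wall hits of this image
            if n_refl > max_order:
                continue
            d = np.linalg.norm(img - mic)
            k = int(round(d / c * fs))                   # delay in samples
            if k < len(h):
                h[k] += beta ** n_refl / (4.0 * np.pi * max(d, 1e-3))
    return h

# Example: 6 x 5 x 3 m room sampled at 16 kHz (illustrative geometry)
rir = image_source_rir([6.0, 5.0, 3.0], [2.0, 3.0, 1.5], [4.0, 1.5, 1.2], fs=16000)
```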
This paper describes problems and solutions regarding audio signal processing in adverse environments, especially acoustically reverberant ones. First, in Chapter 2, different dereverberation algorithms are introduced and compared. The results show that the deep neural network (DNN) based method outperforms the others in terms of the perceptual evaluation of speech quality (PESQ) metric.
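To make the linear-prediction branch of this comparison concrete, the sketch below applies single-channel multi-step (delayed) linear prediction in the time domain, a simplified relative of the variance-normalized delayed linear prediction evaluated in Chapter 2. The filter order and prediction delay are illustrative assumptions, and the variance normalization and multichannel processing of the stronger methods are omitted.

```python
import numpy as np

def delayed_lp_dereverb(x, order=200, delay=20):
    """Single-channel multi-step (delayed) linear-prediction dereverberation.

    The late reverberation at sample n is predicted from the `order` samples
    ending `delay` samples in the past and then subtracted; the delay keeps the
    direct sound and early reflections from being cancelled.  Sketch only.
    """
    x = np.asarray(x, float)
    N = len(x)
    n0 = order - 1 + delay                 # first sample with a full prediction context
    rows = N - n0
    A = np.empty((rows, order))
    for k in range(order):
        # column k holds x[n - delay - k] for every target sample x[n]
        A[:, k] = x[n0 - delay - k : n0 - delay - k + rows]
    g, *_ = np.linalg.lstsq(A, x[n0:], rcond=None)   # prediction coefficients

    late = np.zeros(N)
    late[n0:] = A @ g                      # predicted late reverberation
    return x - late                        # dereverberated signal

# Example on a synthetic reverberant signal (illustrative only)
rng = np.random.default_rng(0)
dry = rng.standard_normal(16000)
rir = np.exp(-np.arange(2000) / 400.0) * rng.standard_normal(2000)
wet = np.convolve(dry, rir)[:16000]
enhanced = delayed_lp_dereverb(wet)
```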
Second, in Chapter 3, methods for sound event detection (SED) under reverberant environments are described. Two-stage methods, which preprocess the signals with dereverberation algorithms, are compared with a one-stage, or more generally end-to-end, method that uses no preprocessing. The results reveal that the end-to-end method outperforms the two-stage methods in terms of the F-score classification metric.
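For readers who want to reproduce the kind of classification scoring used in this comparison, the snippet below computes the segment-based F-score and error rate commonly used for polyphonic SED (cf. Mesaros et al.) from binary reference and predicted activity matrices. The segment length and the toy data at the end are assumptions for illustration only.

```python
import numpy as np

def segment_based_metrics(ref, est, frames_per_segment=50):
    """Segment-based F-score and error rate for polyphonic SED.

    ref, est : binary arrays of shape (n_frames, n_classes), 1 = event active.
    A class counts as active in a segment if it is active in any of its frames.
    """
    n_frames, _ = ref.shape
    n_seg = int(np.ceil(n_frames / frames_per_segment))
    pad = n_seg * frames_per_segment - n_frames
    ref = np.pad(ref, ((0, pad), (0, 0)))
    est = np.pad(est, ((0, pad), (0, 0)))
    # Collapse frames to segments: active if active anywhere in the segment
    R = ref.reshape(n_seg, frames_per_segment, -1).max(axis=1)
    E = est.reshape(n_seg, frames_per_segment, -1).max(axis=1)

    tp = np.sum(R * E)
    fp = np.sum((1 - R) * E)
    fn = np.sum(R * (1 - E))
    f_score = 2 * tp / max(2 * tp + fp + fn, 1)

    # Error rate: substitutions, deletions, insertions counted per segment
    fn_seg = np.sum(R * (1 - E), axis=1)
    fp_seg = np.sum((1 - R) * E, axis=1)
    S = np.minimum(fn_seg, fp_seg).sum()
    D = np.maximum(0, fn_seg - fp_seg).sum()
    I = np.maximum(0, fp_seg - fn_seg).sum()
    error_rate = (S + D + I) / max(np.sum(R), 1)
    return f_score, error_rate

# Example: 1000 frames, 6 classes, predictions with ~10% frame errors
rng = np.random.default_rng(0)
ref = rng.integers(0, 2, (1000, 6))
est = np.where(rng.random((1000, 6)) < 0.1, 1 - ref, ref)
print(segment_based_metrics(ref, est))
```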
Third, based on the experiments in Chapter 2, an end-to-end polyphonic sound event separation and detection (SESD) system that separates and detects polyphonic sound events jointly is proposed in Chapter 4. The system consists of a convolutional recurrent neural network (CRNN) for SESD in conjunction with an ideal complex mask (ICM). In addition, discriminative training (DT) is employed to increase the robustness of the system in acoustically adverse environments. Multichannel magnitude and phase spectra are used as the input features. Sound event separation (SES) is then performed with a complex mask, whereas SED is performed by averaging the embedded features along the time axis and feeding them into a DNN. DT is conducted to maximize the separability of the event signals.
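The following PyTorch sketch is one plausible reading of this architecture: a CRNN whose convolutional front end pools only along frequency, a bidirectional GRU, one head emitting a frame-wise complex (real/imaginary) time-frequency mask for separation, one head averaging the embedding over time for detection, and a simple discriminative loss. All layer sizes, the number of input channels (stacked magnitude and phase spectra), the number of classes, and the weight gamma are illustrative assumptions rather than the thesis's exact configuration.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Sketch of a CRNN for joint sound event separation and detection (SESD)."""
    def __init__(self, n_ch=8, n_freq=256, n_classes=10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(n_ch, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d((1, 4)),            # pool along frequency, keep time resolution
            nn.Conv2d(64, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d((1, 4)),
        )
        self.gru = nn.GRU(64 * (n_freq // 16), 128, batch_first=True, bidirectional=True)
        self.mask_head = nn.Linear(256, 2 * n_freq)   # real + imaginary complex mask
        self.event_head = nn.Linear(256, n_classes)   # clip-level event activities

    def forward(self, x):                    # x: (batch, channels, time, freq)
        z = self.conv(x)                     # -> (batch, 64, time, freq // 16)
        b, c, t, f = z.shape
        z = z.permute(0, 2, 1, 3).reshape(b, t, c * f)
        z, _ = self.gru(z)                   # -> (batch, time, 256)
        mask = self.mask_head(z)             # frame-wise complex T-F mask (re/im stacked)
        logits = self.event_head(z.mean(dim=1))  # average embedding over time -> SED
        return mask, logits

def discriminative_loss(s_hat, s_target, s_interf, gamma=0.1):
    """Discriminative training: pull the estimate toward its own source and
    push it away from the competing source (gamma is an assumed weight)."""
    return (torch.mean((s_hat - s_target) ** 2)
            - gamma * torch.mean((s_hat - s_interf) ** 2))

# Shape check: 8 stacked magnitude/phase channels, 100 frames, 256 frequency bins
model = CRNN()
mask, logits = model(torch.randn(2, 8, 100, 256))
```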
Last, the SES and SED performance is evaluated on three remixed datasets: isolated sound events, speech, and singing voice. Background noise and reverberation are added to the datasets to examine the resilience of the proposed network to these adverse factors. The results of the SES task show that the proposed method outperforms four baseline methods in terms of source-to-interference ratio (SIR), source-to-artifacts ratio (SAR), and source-to-distortion ratio (SDR). The results of the SED task demonstrate the efficacy of the proposed method over two baseline methods in terms of error rate (ER), F-score, and area under the curve (AUC).
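For completeness, the separation and detection scores named above can be computed as follows. The use of the mir_eval and scikit-learn packages is an assumption about tooling (the thesis does not state which implementation it used), and the signals below are synthetic stand-ins for real references and estimates.

```python
import numpy as np
import mir_eval
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 16000 * 3                                    # 3 s of audio at 16 kHz

# Dummy reference/estimated sources just to exercise the metrics,
# shape (n_sources, n_samples); estimates = references plus small error
ref = rng.standard_normal((2, n))
est = ref + 0.1 * rng.standard_normal((2, n))

sdr, sir, sar, perm = mir_eval.separation.bss_eval_sources(ref, est)
print("SDR", sdr, "SIR", sir, "SAR", sar)

# SED side: frame-wise AUC from binary reference activity and predicted scores
ref_activity = rng.integers(0, 2, size=(500, 10))             # 500 frames, 10 classes
pred_scores = ref_activity * 0.8 + rng.random((500, 10)) * 0.2
print("AUC", roc_auc_score(ref_activity.ravel(), pred_scores.ravel()))
```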
Abstract (Chinese) ii
ABSTRACT iii
LIST OF TABLES vii
LIST OF FIGURES viii
CHAPTER 1 INTRODUCTION 1
1. Dereverberation 1
2. Sound Event Detection (SED) 3
3. Joint Sound Event Separation and Detection 4
4. Contribution of This Paper 8
CHAPTER 2 DEREVERBERATION 10
1. Methods for Dereverberation 10
1.1 Multiple-input/output Inverse Theorem 10
1.2 Multichannel Wiener Filter 11
1.3 Normalized Delayed Multichannel Linear Prediction 13
1.4 DNN-based Dereverberation 15
2. Evaluation and Discussion 16
2.1 Acoustic Environments 16
2.2 Results and Discussion 17
CHAPTER 3 SOUND EVENT DETECTION 19
1. Methods for Sound Event Detection 19
1.1 Feature Extraction 19
1.2 Network Architecture 19
2. Evaluation and Discussion 20
2.1 Sound Event Datasets 20
2.2 Results and Discussion 20
CHAPTER 4 JOINT SOUND EVENT SEPARATION AND DETECTION 22
1. Methods for Sound Event Separation and Detection 22
1.1 Feature Extraction 22
1.2 Convolutional Recurrent Neural Network 22
1.3 Ideal Complex Time-Frequency Mask 25
1.4 Training Objectives 29
2. Evaluation Datasets 30
2.1 Acoustic Environments 30
2.2 Speech Separation Dataset 31
2.3 Singing Voice Separation Dataset 31
2.4 Isolated Sound Event Dataset 32
3. Evaluation Metrics 33
3.1 Metrics for SES 33
3.2 Metrics for SED 34
4. Baseline Methods and Model Settings 35
4.1 SES Baseline Methods 35
4.2 SED Baseline Methods 36
4.3 Model Settings 37
5. Results and Discussion 38
5.1 Results of SES 38
5.2 Results of SED 39
5.3 Model Settings 40
CHAPTER 5 CONCLUSIONS 43
REFERENCES 45
TABLES 59
FIGURES 69