Detailed Record

Author (Chinese): 李皇儀
Author (English): Li, Huang-Yi
Title (Chinese): 基於頻率間包絡之相關性解決頻域上聲源分離之排列問題
Title (English): Solving the Permutation Problem in Frequency Domain Source Separation Based on the Correlation of Envelopes between Frequencies
Advisor (Chinese): 劉奕汶
Advisor (English): Liu, Yi-Wen
Committee members (Chinese): 白明憲, 洪樂文, 李沛群
Committee members (English): Bai, Ming-Sian; Hong, Yao-Win; Li, Pei-Chun
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Electrical Engineering
Student ID: 102061610
Year of publication (ROC calendar): 104
Academic year of graduation: 103
Language: Chinese
Number of pages: 68
Keywords (Chinese): 聲源分離, 獨立成份分析, 排列問題
Keywords (English): Sound source separation, Independent component analysis, Permutation problem
Usage statistics:
  • Recommendations: 0
  • Views: 489
  • Rating: *****
  • Downloads: 6
  • Bookmarks: 0
Abstract (Chinese): In a real-world environment, each sound source reaches the microphones as a convolutive mixture, so handling source separation directly in the time domain is difficult and time-consuming. Previous work transforms the mixed signals to the time-frequency domain with the short-time Fourier transform (STFT) and then applies Independent Component Analysis (ICA) in each frequency bin to separate the sources, at the cost of the scaling problem and the permutation problem. The scaling problem is fairly easy to solve, whereas the permutation problem is harder and is the focus of this thesis. For a given speech signal, the power envelopes of neighboring frequencies are strongly correlated; starting from this observation, this thesis builds an algorithm to solve the permutation problem. Once both problems are solved, the separated signals are transformed back to the time domain with the inverse short-time Fourier transform (ISTFT), completing the source separation. In Experiments 1 to 4, mixtures were recorded on site and separated with the above procedure to obtain the separated source signals, and the separation quality was evaluated both subjectively and objectively. For groups 1-4, in which all sources are speech signals, the mean subjective scores given by the listeners all exceed 4.18 (see Table 5-3). In Experiment 5, compared with the methods proposed in [23] and [25], the proposed method improves the source-to-interferences ratio (SIR) by 3.1 dB. These results indicate that the proposed method effectively solves the permutation problem and improves the separation performance.
Abstract (English): In a real environment, sound sources reach the microphones as convolutive mixtures with the room responses, and tackling source separation directly in the time domain is difficult and time-consuming. Existing approaches therefore convert the mixed signals to the time-frequency domain with the short-time Fourier transform (STFT) and apply Independent Component Analysis (ICA) in each frequency bin to separate the sources; the drawbacks of this approach are the scaling problem and the permutation problem. Of the two, the permutation problem is much harder to resolve and is the focus of this thesis. Based on the assumption that the temporal envelopes of neighboring frequencies of the same sound source are highly correlated, we developed an algorithm to solve the permutation problem. After the scaling and permutation problems are solved, the separated signals are converted back to the time domain by the inverse short-time Fourier transform (ISTFT) to complete the separation. In Experiments 1 to 4, the mixtures were recorded in a real room and separated with the above procedure, and the effectiveness of the algorithm was assessed by subjective and objective measures. For groups 1 to 4, in which all sources are speech signals, the participants' mean subjective ratings exceed 4.18 out of 5. In Experiment 5, we compared the proposed method with the methods of [23] and [25]; the proposed method improves the source-to-interferences ratio (SIR) by 3.1 dB. These results show that the proposed method effectively solves the permutation problem and improves separation performance.
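To make the frequency-domain pipeline concrete, the sketch below illustrates the general idea of aligning per-bin permutations by the correlation of amplitude envelopes between neighboring frequencies. It is only a minimal illustration, not the algorithm developed in Chapter 4: it assumes the per-bin ICA outputs are already available in a hypothetical complex array `Y` of shape (F, N, T) (frequency bins × sources × time frames), and the greedy bin-by-bin matching against a running-average reference envelope is an illustrative choice.

```python
import numpy as np
from itertools import permutations

def align_permutations(Y):
    """Greedy inter-frequency permutation alignment by envelope correlation.

    Y: complex ndarray of shape (F, N, T) holding the per-bin ICA outputs
    (F frequency bins, N separated components, T time frames).  Returns a
    copy of Y with the components of each bin reordered so that neighboring
    bins are mutually consistent.
    """
    F, N, T = Y.shape
    env = np.abs(Y)                                 # amplitude envelopes
    env -= env.mean(axis=2, keepdims=True)          # zero-mean per component
    env /= np.linalg.norm(env, axis=2, keepdims=True) + 1e-12  # unit norm

    aligned = Y.copy()
    ref = env[0].copy()                             # bin 0 serves as the reference
    for f in range(1, F):
        # choose the ordering whose envelopes correlate best with the reference
        best_perm, best_score = None, -np.inf
        for perm in permutations(range(N)):
            score = sum(np.dot(ref[i], env[f, p]) for i, p in enumerate(perm))
            if score > best_score:
                best_perm, best_score = perm, score
        aligned[f] = Y[f, list(best_perm)]
        # slowly updated running average keeps the reference smooth across bins
        ref = 0.8 * ref + 0.2 * env[f, list(best_perm)]
    return aligned
```

Exhaustive enumeration of the N! candidate orderings is inexpensive for a handful of sources; for robustness across widely separated bins, methods such as [23] and [25] additionally exploit direction-of-arrival estimates or clustering.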
Abstract (Chinese) i
Abstract ii
Acknowledgements iii
Table of Contents I
List of Figures III
List of Tables V
Chapter 1 Introduction 1
1.1 Motivation and Objectives 1
1.2 Literature Review 4
1.3 Research Methods and Goals 6
1.4 Thesis Organization 6
Chapter 2 Independent Component Analysis 7
2.1 Motivation and Assumptions of ICA 7
2.2 Ambiguities of ICA 10
2.3 Preprocessing for ICA 11
2.4 Optimization Algorithms for ICA 13
Chapter 3 Overview of the Source Separation Problem 19
3.1 Types of Source Separation Systems 19
3.2 Convolutive Mixing Systems 19
3.3 Source Separation in the Time Domain 20
3.4 Source Separation in the Frequency Domain 21
Chapter 4 Solutions to the Scaling, Permutation, and Circular Convolution Problems 24
4.1 Solving the Scaling Problem 24
4.2 Solving the Permutation Problem 25
4.3 Solving the Circular Convolution Problem 36
4.4 System Flowchart 37
Chapter 5 Experimental Methods and Results 38
5.1 Experimental Environment and Equipment 38
5.2 Evaluation Methods 39
5.3 Experimental Results 40
Chapter 6 Conclusions and Future Work 49
References 51
Appendix 58
A.1 Signal Waveforms 58
A.2 Derivation of the Update Rule for Complex ICA 68
[1] M. G. Lopez P., H. Molina Lozano, L. P. Sanchez F., and L. N. Oliva Moreno, “Blind Source Separation of audio signals using independent component analysis and wavelets,” in CONIELECOMP 2011, 21st International Conference on Electrical Communications and Computers, 2011, pp. 152–157.
[2] L. Sun and Q. Cheng, “Real-time microphone array processing for sound source separation and localization,” in 2013 47th Annual Conference on Information Sciences and Systems (CISS), 2013, pp. 1–6.
[3] J. Nikunen and T. Virtanen, “Direction of Arrival Based Spatial Covariance Model for Blind Sound Source Separation,” IEEE/ACM Trans. Audio, Speech, and Language Processing, vol. 22, no. 3, pp. 727–739, Mar. 2014.
[4] Y. Yang, Z. Li, X. Wang, and D. Zhang, “Noise source separation based on the blind source separation,” in 2011 Chinese Control and Decision Conference (CCDC), 2011, pp. 2236–2240.
[5] L. Wang, T. Gerkmann, and S. Doclo, “Noise PSD Estimation using Blind Source Separation in a Diffuse Noise Field,” in Proc. 13th International Workshop on Acoustic Signal Enhancement, 2012, pp. 1–4.
[6] A. Hajisami, H. Viswanathan, and D. Pompili, “‘Cocktail Party in the Cloud’: Blind Source Separation for Co-Operative Cellular Communication in Cloud RAN,” in 2014 IEEE 11th International Conference on Mobile Ad Hoc and Sensor Systems, 2014, pp. 37–45.
[7] Y. Guo, G. R. Naik, and H. Nguyen, “Single channel blind source separation based local mean decomposition for biomedical applications,” in 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2013, pp. 6812–6815.
[8] C. Lin and E. Hasting, “Blind source separation of heart and lung sounds based on nonnegative matrix factorization,” in 2013 International Symposium on Intelligent Signal Processing and Communication Systems, 2013, pp. 731–736.
[9] M. Y. Abbass, S. A. Shehata, S. S. Haggag, S. M. Diab, B. M. Salam, S. El-Rabaie, and F. E. Abd El-Samie, “Blind separation of noisy images using finite Ridgelet Transform and wavelet de-noising,” in 2013 Second International Japan-Egypt Conference on Electronics, Communications and Computers (JEC-ECC), 2013, pp. 176–181.
[10] J.-F. Cardoso, “Blind signal separation: statistical principles,” Proc. IEEE, vol. 86, no. 10, pp. 2009–2025, 1998.
[11] M. Zibulevsky and B. A. Pearlmutter, “Blind Source Separation by Sparse Decomposition in a Signal Dictionary,” Neural Computation, vol. 13, no. 4, pp. 863–882, Apr. 2001.
[12] A. S. Bregman, Auditory Scene Analysis: The Perceptual Organization of Sound, Cambridge, MA: MIT Press, 1990.
[13] J.-F. Cardoso, “Source separation using higher order moments,” in International Conference on Acoustics, Speech, and Signal Processing, 1989, pp. 2109–2112.
[14] A. Mansour, M. Kawamoto, and N. Ohnishi, “Blind separation for instantaneous mixture of speech signals: algorithms and performances,” in 2000 TENCON Proceedings. Intelligent Systems and Technologies for the New Millennium (Cat. No.00CH37119), 2000, vol. 1, pp. 26–32.
[15] D.-T. Pham, “Blind separation of instantaneous mixture of sources based on order statistics,” IEEE Trans. Signal Processing, vol. 48, no. 2, pp. 363–375, 2000.
[16] M. Z. Ikram and D. R. Morgan, “A beamforming approach to permutation alignment for multichannel frequency-domain blind speech separation,” in IEEE International Conference on Acoustics Speech and Signal Processing, 2002, vol. 1, pp. I–881–I–884.
[17] S. Kurita, H. Saruwatari, S. Kajita, K. Takeda, and F. Itakura, “Evaluation of blind signal separation method using directivity pattern under reverberant conditions,” in 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100), 2000, vol. 5, pp. 3140–3143.
[18] K. Toyama and M. D. Plumbley, “Using phase linearity in frequency-domain ICA to tackle the permutation problem,” in 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, 2009, pp. 3165–3168.
[19] K. Matsuoka, “Minimal distortion principle for blind source separation,” in Proceedings of the 41st SICE Annual Conference. SICE 2002., 2002, vol. 4, pp. 2138–2143.
[20] F. Nesta, T. S. Wada, and B.-H. Juang, “Coherent spectral estimation for a robust solution of the permutation problem,” in 2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2009, pp. 105–108.
[21] D. Nion, K. N. Mokios, N. D. Sidiropoulos, and A. Potamianos, “Batch and Adaptive PARAFAC-Based Blind Separation of Convolutive Speech Mixtures,” IEEE Trans. Audio, Speech, and Language Processing, vol. 18, no. 6, pp. 1193–1207, Aug. 2010.
[22] H. Sawada, S. Araki, and S. Makino, “Measuring Dependence of Bin-wise Separated Signals for Permutation Alignment in Frequency-domain BSS,” in 2007 IEEE International Symposium on Circuits and Systems, 2007, pp. 3247–3250.
[23] H. Sawada, R. Mukai, S. Araki, and S. Makino, “A Robust and Precise Method for Solving the Permutation Problem of Frequency-Domain Blind Source Separation,” IEEE Trans. Speech and Audio Processing, vol. 12, no. 5, pp. 530–538, Sep. 2004.
[24] W. Li, J. Liu, J. Du, and S. Bai, “Solving permutation problem in frequency-domain blind source separation using microphone sub-arrays,” in 2008 International Conference on Neural Networks and Signal Processing, 2008, pp. 67–72.
[25] R. Mazur, J. O. Jungmann, and A. Mertins, “A new clustering approach for solving the permutation problem in convolutive blind source separation,” in 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2013, pp. 1–4.
[26] L. Parra and C. Spence, “Convolutive blind separation of non-stationary sources,” IEEE Trans. Speech and Audio Processing, vol. 8, no. 3, pp. 320–327, May 2000.
[27] D. Pham, C. Serviere, and H. Boumaraf, “Blind separation of convolutive audio mixtures using nonstationarity,” Proc. ICA, 2003.
[28] C. Jutten and J. Herault, “Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture,” Signal Processing, vol. 24, no. 1, pp. 1–10, 1991.
[29] P. Comon, “Independent component analysis, a new concept?,” Signal Processing, vol. 36, no. 3, pp. 287–314, 1994.
[30] A. J. Bell and T. J. Sejnowski, “An Information-Maximization Approach to Blind Separation and Blind Deconvolution,” Neural Computation, vol. 7, no. 6, pp. 1129–1159, Nov. 1995.
[31] A. Hyvärinen, “Fast and robust fixed-point algorithms for independent component analysis,” IEEE Trans. Neural Networks, vol. 10, no. 3, pp. 626–634, 1999.
[32] A. Hyvärinen and E. Oja, “Independent component analysis: algorithms and applications,” Neural Networks, vol. 13, no. 4–5, pp. 411–430, 2000.
[33] T. Cover and J. Thomas, Elements of Information Theory, John Wiley & Sons, Inc., 2012.
[34] M. Jones and R. Sibson, “What is projection pursuit?,” Journal of the Royal Statistical Society, Series A (General), vol. 150, no. 1, pp. 1–37, 1987.
[35] A. Hyvärinen, “New approximations of differential entropy for independent component analysis and projection pursuit,” Advances in Neural Information Processing Systems 10, pp. 273–279, 1998.
[36] L. Tong, V. C. Soon, Y. F. Huang, and R. Liu, “AMUSE: a new blind identification algorithm,” in IEEE International Symposium on Circuits and Systems, 1990, pp. 1784–1787.
[37] H. Shen and K. Huper, “Newton-Like Methods for Parallel Independent Component Analysis,” in 2006 16th IEEE Signal Processing Society Workshop on Machine Learning for Signal Processing, 2006, pp. 283–288.
[38] S. Choi, S. Amari, A. Cichocki, and R. Liu, “Natural gradient learning with a nonholonomic constraint for blind deconvolution of multiple channels,” in Proc. First International Workshop on Independent Component Analysis and Signal Separation, 1999.
[39] A. J. Bell and T. J. Sejnowski, “An Information-Maximization Approach to Blind Separation and Blind Deconvolution,” Neural Computation, vol. 7, no. 6, pp. 1129–1159, Nov. 1995.
[40] D. Luenberger, Optimization by vector space methods, John Wiley & Sons, Inc., 1969.
[41] W. Zhang, J. Liu, J. Sun, and S. Bai, “A New Two-Stage Approach to Underdetermined Blind Source Separation using Sparse Representation,” in 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP ’07, 2007, vol. 3, pp. III–953–III–956.
[42] V. G. Reju, “Underdetermined Convolutive Blind Source Separation via Time–Frequency Masking,” IEEE Trans. Audio, Speech, and Language Processing, vol. 18, no. 1, pp. 101–116, Jan. 2010.
[43] M. Joho, H. Mathis, and R. Lambert, “Overdetermined blind source separation: Using more sensors than source signals in a noisy mixture,” in Proc. ICA, 2000.
[44] Y. Xue and Y. Wang, “A novel method for overdetermined blind source separation,” in The 2nd International Conference on Information Science and Engineering, 2010, pp. 1751–1754.
[45] R. Aichner, S. Araki, S. Makino, T. Nishikawa, and H. Saruwatari, “Time domain blind source separation of non-stationary convolved signals by utilizing geometric beamforming,” in Proceedings of the 12th IEEE Workshop on Neural Networks for Signal Processing, 2002, pp. 445–454.
[46] A. Oppenheim, R. Schafer, and J. Buck, Discrete-Time Signal Processing, Englewood Cliffs, NJ: Prentice Hall, 1989.
[47] E. Bingham and A. Hyvärinen, “A fast fixed-point algorithm for independent component analysis of complex valued signals,” International Journal of Neural Systems, vol. 10, no. 1, pp. 1–8, 2000.
[48] H. Sawada, R. Mukai, and S. Kethulle, “Spectral smoothing for frequency-domain blind source separation,” in Proc. IWAENC 2003.
[49] S. Araki, R. Mukai, S. Makino, T. Nishikawa, and H. Saruwatari, “The fundamental limitation of frequency domain blind source separation for convolutive mixtures of speech,” IEEE Trans. Speech and Audio Processing, vol. 11, no. 2, pp. 109–116, Mar. 2003.
[50] E. Vincent, R. Gribonval, and C. Fevotte, “Performance measurement in blind audio source separation,” IEEE Trans. Audio, Speech, and Language Processing, vol. 14, no. 4, pp. 1462–1469, Jul. 2006.
[51] http://www.kecl.ntt.co.jp/icl/signal/sawada/demo/bss2to4/index.html.