Author: Li, Huang-Yi
Thesis title: Solving the Permutation Problem in Frequency Domain Source Separation Based on the Correlation of Envelopes between Frequencies
Advisor: Liu, Yi-Wen
Oral examination committee: Bai, Ming-Sian
Hong, Yao-Win
Li, Pei-Chun
Keywords: Sound source separation; Independent component analysis; Permutation problem
In a real environment, each sound source reaches the microphones as a convolutive mixture, filtered by the room response, so performing source separation directly in the time domain is difficult and time-consuming. Existing approaches instead convert the mixed signals to the time-frequency domain with the short-time Fourier transform (STFT) and then apply independent component analysis (ICA) in each frequency bin to separate the sources. The price of this approach is the scaling problem and the permutation problem. The scaling problem is straightforward to resolve; the permutation problem is much harder and is the focus of this thesis. Starting from the observation that, within a single speech signal, the energy envelopes of neighboring frequencies are highly correlated, we developed an algorithm to solve the permutation problem. Once both problems are resolved, the separated signals are converted back to the time domain by the inverse short-time Fourier transform (ISTFT), which completes the separation. In Experiments 1 to 4, the mixtures were obtained by recording in a real room and were separated with the procedure above. The separation performance of the algorithm was assessed with both subjective and objective measures. For groups 1 to 4, in which all sources are speech signals, the mean subjective score given by the participants exceeded 4.18 out of 5 in every group (Table 5-3). In Experiment 5, the proposed method was compared with the methods of [23] and [25] and improved the source-to-interference ratio (SIR) by 3.1 dB. These results indicate that the proposed method effectively solves the permutation problem and improves separation performance.
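The central idea, aligning the per-bin ICA outputs according to how strongly their energy envelopes correlate with those of neighboring bins, can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the thesis's actual algorithm. It assumes that ICA has already been run in every bin, that the scaling problem has been fixed, and that each output's envelope is available as a list of magnitudes over STFT frames. The first bin defines the source order, and each later bin is matched by exhaustive search over permutations against the summed envelopes of up to `window` previously aligned bins. The names `pearson`, `align_permutations`, and `window` are hypothetical.

```python
import itertools
import math
from typing import List, Sequence

# Hypothetical sketch (not the thesis's exact method): envelope-correlation
# permutation alignment. envelopes[f][k] is the magnitude envelope (one value
# per STFT frame) of ICA output k in frequency bin f, after scaling is fixed.
Envelope = List[float]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / math.sqrt(va * vb)


def align_permutations(envelopes: List[List[Envelope]], window: int = 3) -> List[List[int]]:
    """Return perms with perms[f][k] = ICA output index assigned to source k in bin f.

    Assumption: bin 0 defines the source order; every later bin is matched
    against up to `window` previously aligned neighboring bins.
    """
    n_src = len(envelopes[0])
    perms: List[List[int]] = [list(range(n_src))]
    for f in range(1, len(envelopes)):
        n_frames = len(envelopes[f][0])
        # Reference envelope of source k: sum of its aligned envelopes in the
        # neighboring bins (a sum, not a mean, because Pearson correlation is
        # unaffected by a positive scale factor).
        refs = []
        for k in range(n_src):
            ref = [0.0] * n_frames
            for g in range(max(0, f - window), f):
                env = envelopes[g][perms[g][k]]
                for t in range(n_frames):
                    ref[t] += env[t]
            refs.append(ref)
        # Exhaustive search: keep the permutation with the largest total correlation.
        best = list(range(n_src))
        best_score = -math.inf
        for perm in itertools.permutations(range(n_src)):
            score = sum(pearson(refs[k], envelopes[f][perm[k]]) for k in range(n_src))
            if score > best_score:
                best, best_score = list(perm), score
        perms.append(best)
    return perms


if __name__ == "__main__":
    rising = [1.0, 2.0, 3.0, 4.0]
    falling = [4.0, 3.0, 2.0, 1.0]
    # ICA happened to swap the two outputs in bin 1.
    bins = [[rising, falling], [falling, rising], [rising, falling]]
    print(align_permutations(bins))  # [[0, 1], [1, 0], [0, 1]]
```

In the toy example, the outputs in the middle bin are swapped, and the returned permutation `[1, 0]` undoes that swap. Exhaustive search costs N! correlation sums per bin. That is acceptable for two or three sources but not for many. A purely sequential scheme like this one can also propagate an early misalignment into later bins, which is one reason to match against several neighboring bins rather than only the previous one.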
