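Author (Chinese): 陳宇煊
Author (English): Chen, Yu-Hsuan
Thesis title (Chinese): 麥克風陣列信號對雙耳訊號之轉換
Thesis title (English): Conversion of array signals into binaural signals
Advisor (Chinese): 白明憲
Advisor (English): Bai, Ming-Sian
Committee members (Chinese): 張禎元; 楊智凱
Committee members (English): Chang, Chen-Yuan; Yang, Chih-Kai
Degree: Master's
University: National Tsing Hua University
Department: 資通訊熱流與電聲科技產業研究所
Student ID: 107132503
Year of publication (ROC calendar): 109 (2020)
Graduation academic year: 109
Language: English
Number of pages: 75
Keywords (Chinese): binaural stereo; multichannel inverse filter design; acoustic array
Keywords (English): binaural stereo; multiple inverse filter design; acoustic array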
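This thesis uses microphone array signal processing in place of dummy head recording to achieve a binaural stereo effect. First, we apply several beamforming methods, including Delay And Sum (DAS), Minimum Power Distortionless Response (MPDR), and Minimum Variance Distortionless Response (MVDR), to localize and separate the sound sources. We also use Multiple Signal Classification (MUSIC) for source localization. In the comparison of these localization methods, MUSIC can detect multiple sources simultaneously and performs more robustly. For separation, we additionally adopt Tikhonov Regularization (TIKR), which yields better sound quality and less interference than the other methods. In the final stage, HRTF rendering is applied to produce the binaural stereo effect.
Another way to achieve the binaural stereo effect is the Multiple-Input/Output Inverse Theorem (MINT). With this model, the designed inverse filters directly produce the HRTF rendering effect. We use several methods to design the filters, including Tikhonov regularization and the least absolute shrinkage and selection operator (LASSO). Finally, listening tests show that all of the algorithms produce a clear HRTF effect, and that TIKR-HRTF performs best on all indices.
Keywords: binaural stereo, acoustic array, multichannel inverse filter design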
A binaural rendering system that emulates spatial audio effects is implemented for entertainment devices. Traditional binaural audio recording, also known as dummy head recording, is expensive. This thesis proposes an array signal processing-based system that produces binaural outputs in place of costly dummy head recording. A microphone array can be applied to source localization and separation. The beamforming approaches used in this thesis are Delay And Sum (DAS), Minimum Power Distortionless Response (MPDR), and Minimum Variance Distortionless Response (MVDR). Another localization approach, the Multiple Signal Classification (MUSIC) algorithm, can detect multiple source signals robustly. Source separation is achieved by the Tikhonov Regularization (TIKR) algorithm, which yields good sound quality and separation performance with low computational complexity. The last stage convolves each separated source signal with the respective Head-Related Transfer Function (HRTF) to produce the outputs. Another way of binaural rendering is the Multiple-Input/Output Inverse Theorem (MINT) approach, in which inverse filters need to be designed. The time-domain underdetermined multiple inverse filter (TUMIF) and the frequency-domain underdetermined multiple inverse filter (FUMIF) can be employed to design the inverse filters. Least-squares Tikhonov Regularization (TIKR) and the least absolute shrinkage and selection operator (LASSO) from sparse coding can be used to obtain the inverse filters. Listening tests demonstrated that the presented binaural rendering techniques delivered binaural effects to different degrees, among which TIKR-HRTF performed the best.

Index Terms: Binaural, Acoustic array, Multiple inverse filter design
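
The separation-then-rendering pipeline described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the 8-microphone uniform circular array with a 5 cm radius, the steering-vector sign convention, the regularization parameter `beta`, and all function names are assumptions chosen for illustration. The sketch separates sources in a single frequency bin with Tikhonov-regularized least squares, then renders time-domain source signals by convolving them with head-related impulse responses (the time-domain counterpart of the HRTF).

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed)

def uca_steering(freq, angles_deg, n_mics=8, radius=0.05):
    """Farfield steering matrix (n_mics x n_sources) of a uniform circular array.

    Geometry and sign convention are illustrative assumptions.
    """
    mic_az = 2 * np.pi * np.arange(n_mics) / n_mics
    src_az = np.deg2rad(np.asarray(angles_deg, dtype=float))
    k = 2 * np.pi * freq / C  # wavenumber
    # Phase of each microphone relative to the array center.
    return np.exp(1j * k * radius * np.cos(mic_az[:, None] - src_az[None, :]))

def tikr_separate(A, x, beta=1e-2):
    """Tikhonov-regularized estimate s = (A^H A + beta I)^(-1) A^H x."""
    gram = A.conj().T @ A
    return np.linalg.solve(gram + beta * np.eye(A.shape[1]), A.conj().T @ x)

def render_binaural(sources, hrirs):
    """Sum each source convolved with its (left, right) impulse responses."""
    n_out = max(len(s) + max(len(hl), len(hr)) - 1
                for s, (hl, hr) in zip(sources, hrirs))
    out = np.zeros((2, n_out))
    for s, (h_left, h_right) in zip(sources, hrirs):
        for ch, h in enumerate((h_left, h_right)):
            y = np.convolve(s, h)  # full linear convolution
            out[ch, :len(y)] += y
    return out

# Separation demo: two sources at 30 and 120 degrees, one 1 kHz bin, no noise.
A = uca_steering(1000.0, [30.0, 120.0])
s_true = np.array([1.0 + 0.5j, -0.3 + 0.2j])
x = A @ s_true                       # simulated microphone spectra
s_hat = tikr_separate(A, x, beta=1e-8)

# Rendering demo with toy (made-up) impulse responses, not measured HRIRs.
toy_sources = [np.array([1.0, 0.0, 0.0])]
toy_hrirs = [(np.array([0.5, 0.25]), np.array([0.1]))]
binaural = render_binaural(toy_sources, toy_hrirs)
```

In a practical system the separation would run per frequency bin of a short-time Fourier transform, and a larger `beta` would trade separation sharpness for robustness to noise and steering errors.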
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGMENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
CHAPTER 2 THEORY OF MICROPHONE ARRAY
2.1 FARFIELD ARRAY MODEL
2.1.1 The steering matrix design of the uniform circular array
2.2 SOURCE LOCATION AND SOUND SEPARATION ALGORITHM
2.2.1 Minimum power distortionless response (MPDR)
2.2.2 Minimum variance distortionless response (MVDR)
2.2.3 Multiple signal classification (MUSIC)
2.2.4 Tikhonov regularization (TIKR)
2.3 COMPRESSED SENSING (CS)
2.3.1 Least absolute shrinkage and selection operator gradient descent (LASSO-GD)
2.4 TIME-FREQUENCY MASK (TFM)
2.5 HEAD RELATED TRANSFER FUNCTIONS (HRTF)
CHAPTER 3 MODEL MATCHING METHOD
3.1 FREQUENCY-DOMAIN UNDERDETERMINED MULTICHANNEL INVERSE FILTERING (FUMIF)
3.2 TIME-DOMAIN UNDERDETERMINED MULTICHANNEL INVERSE FILTERING (TUMIF)
3.3 TUMIF-LASSO
CHAPTER 4 RESULTS AND DISCUSSION
4.1 BINAURAL RENDERING BY SEPARATION AND LOCALIZATION
4.1.1 Localization
4.1.2 Separation
4.1.3 Discussion
4.2 MODEL MATCHING
4.3 LISTENING TEST
CHAPTER 5 CONCLUSION
REFERENCES

[1] Z. Wang, H. Zhang, and G. Bi, “Speech signal recovery based on source separation and noise suppression,” Journal of Computer and Communications, vol. 2, pp.112-120, 2014.
[2] M. R. Bai, J.-G. Ih, and J. Benesty, Acoustic Array Systems: Theory, Implementation, and Application, Wiley-IEEE Press, 1st ed., Chaps. 3-4, 2013.
[3] R. Schmidt, “Multiple emitter location and signal parameter estimation,” IEEE Trans. Antennas and propagation, vol. 34, pp.276-280, 1986.
[4] X. Yin and N. Germay, “A fast genetic algorithm with sharing scheme using cluster analysis methods in multi-modal function optimization”, Proc. Int. Conf. Artif. Neural Netwo. Genet. Algorith., pp. 450–457, 1993.
[5] C. W. Groetsch, The Theory of Tikhonov Regularization for Fredholm Equations of the First Kind, Pitman, Boston, 1984.
[6] Mingsian R. Bai, Chang-Sheng Lai, and Po-Chen Wu, “Localization and separation of acoustic sources by using a 2.5-dimensional circular microphone array”, The Journal of the Acoustical Society of America 142, 286, 2017.
[7] E. Vincent, R. Gribonval, C. Fevotte, “Performance measurement in blind audio source separation,” IEEE Trans. Audio Speech Language Process., vol. 14, pp. 1462-1469, Jul. 2006.
[8] T. Sporer, “Wave field synthesis - generation and reproduction of natural sound environments,” in Proc. of the 7th int. conference of digital audio effects, Naples, Italy, 2004.
[9] M. Miyoshi and Y. Kaneda, "Inverse control of room acoustics using multiple loudspeakers and/Or microphones," ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing, Tokyo, Japan, 1986, pp. 917-920.
[10] M. Kolundzija, C. Faller, and M. Vetterli, “Designing practical filters for sound field reconstruction,” in Proc. 127th Conv. Audio Eng. Soc., New York, 2009.
[11] B. B. Bauer, “Stereophonic earphones and binaural loudspeakers,” J. Audio Eng. Soc., vol. 9, pp. 148-151, 1961.
[12] W. F. Druyvesteyn and J. Garas, “Personal sound,” J. Audio Eng. Soc., vol. 45, pp. 685–701, 1997.
[13] C. Kyriakakis, “Fundamental and technological limitations of immersive audio systems,” IEEE Proceedings, vol. 86, pp. 941-951, 1998.
[14] M. R. Bai, and C. C. Lee, “Objective and subjective analysis of effects of loudspeaker span on crosstalk cancellation in spatial sound reproduction,” J. Acoust. Soc. Am., Sept. 2006.
[15] M. R. Bai, C. W. Tung, and C. C. Lee, “Optimal design of loudspeaker arrays for robust cross-talk cancellation using the Taguchi method and the genetic algorithm,” J. Acoust. Soc. Am., vol. 117, pp. 2802-2813, 2005.
[16] D. H. Cooper, J. L. Bauck, “Prospects for transaural recording,” J. Audio Eng. Soc., vol. 37, pp.3-19, 1989.
[17] Ole Kirkeby, P. A. Nelson, H. Hamada, “Fast deconvolution of multichannel systems using regularization,” IEEE Trans. Speech Audio Processing, vol. 6, pp. 189-195, Mar. 1998.
[18] O. Kirkeby and P. A. Nelson, “Digital Filter Design for Inversion Problems in Sound Reproduction,” J. Audio Eng. Soc., vol. 47, no. 7/8, pp. 583-595, Aug. 1999.
[19] J. F. Claerbout, Earth Soundings Analysis: Processing versus Inversion (PVI), 1992.
[20] M. Miyoshi and Y. Kaneda, "Inverse filtering of room acoustics," in IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 36, no. 2, pp. 145-152, Feb. 1988.
[21] S. G. Norcross, G. A. Soulodre, M. C. Lavoie, “Subjective investigations of inverse filtering”, J. Audio Eng. Soc., vol. 52, no. 10, pp. 1003-1028, Oct. 2004.
[22] C. Bourget, T. Aboulnasr, “Inverse filtering of room impulse response for binaural recording playback through loudspeakers”, ICASSP, vol. 3, April 1994.
[23] K. Reindl, W. Kellermann and M. Zhang, "On the limitations of binaural reproduction of monaural blind source separation output signals," 2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO), Bucharest, 2012, pp. 305-309.
[24] R. Nicol, “Binaural Technology”, AES Monograph, New York, 2010.
[25] B. Gardner, K. Martin, HRTF measurements of KEMAR dummy-head microphone, 1994.
[26] M. R. Schroeder and B. S. Atal, ‘‘Computer simulation of sound transmission in rooms’’, IEEE Conv. Rec., pp. 150-155, 1963.
[27] D. H. Cooper, “Calculator program for head-related transfer functions”, J. Audio Eng. Soc., vol. 30, pp. 34-38, 1982.
[28] W. G. Gardner, “Transaural 3D audio,” MIT Media Laboratory Tech. Report, 342, 1995.
[29] J. L. Bauck and D. H. Cooper, “Generalized transaural stereo and applications”, J. Audio Eng. Soc., vol. 44, pp. 683-705, 1996.
[30] Sebastian Nagel, Peter Jax, “Dynamic binaural cue adaptation”, IWAENC 2018 Proceedings, pp. 96-100, Sept. 2018.
[31] G. J. Brown, S. Harding and J. P. Barker, "Speech Separation Based on The Statistics of Binaural Auditory Features," 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, Toulouse, 2006, pp. V-V.
[32] N. Roman, D. Wang, G. J. Brown, Speech segregation based on sound localization. J Acoust Soc Am. 114(4), 2236–2252, 2003.
[33] Roman N, Srinivasan S, Wang D (2006) Binaural segregation in multisource reverberant environments. J Acoust Soc Am 120(6):4040–4051.
[34] S. Rangachari, P. C. Loizou and Yi Hu, "A noise estimation algorithm with rapid adaptation for highly nonstationary environments," 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, Montreal, Que., 2004.
[35] Jingdong Chen, J. Benesty, Yiteng Huang and S. Doclo, "New insights into the noise reduction Wiener filter," in IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 4, pp. 1218-1234, July 2006.
[36] S. Boyd and L. Vandenberghe, Convex optimization, Cambridge University Press, New York, Chap. 1-7, 2004.
[37] M. R. Bai and C. C. Chen, “Application of Convex Optimization to Acoustical Array Signal Processing,” J. Sound Vibration, 332(5), 6596-6616, 2013.
[38] E. Candes and M. Wakin, “An introduction to compressive sampling,” IEEE Signal Processing Mag., vol. 25, no. 2, pp. 21–30, 2008.
[39] G. F. Edelmann and C. F. Gaumond, “Beamforming using compressive sensing,” J. Acoust. Soc. Am. 130(4), 232–237, 2011.
[40] F. Chen, L. Shen, B. W. Suter, and Y. Xu, “A Fast and Accurate Algorithm for ℓ1 Minimization Problems in Compressive Sampling,” EURASIP Journal on Advances in Signal Processing, vol. 1, pp. 1-12, 2015.
[41] P. Gerstoft, C. F. Mecklenbräuker, W. Seong, and M. Bianco, “Introduction to compressive sensing in acoustics,” J. Acoust. Soc. Am. 143(6), 3731–3736, 2018.
[42] Mingsian R. Bai, Hung-Yu Chen, Lihao Yang, and Shin-Cheng Huang, “Active control of noise in a duct using the sparsely coded time-domain underdetermined multichannel inverse filters,” The Journal of the Acoustical Society of America 146, 1371, 2019.
[43] Stephen Boyd, Neal Parikh, Proximal Algorithms, Foundations and Trends in Optimization, Vol. 1, No. 3, Chapter 4, 2013.
[44] A. W. Rix, J. G. Beerends, M. P. Hollier and A. P. Hekstra, "Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs," 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings, Salt Lake City, UT, USA, 2001, pp. 749-752 vol.2.
(Full text open to external access after 2025-09-10)