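Author (Chinese): 賴昶勝
Author (English): Lai, Chang-Sheng
Thesis title (Chinese): 2.5維麥克風陣列之聲場定位和分離
Thesis title (English): Localization and Separation of Acoustic Sources by Using a 2.5-Dimensional Circular Microphone Array
Advisor (Chinese): 白明憲
Advisor (English): Bai, Ming-Sian
Committee members (Chinese): 劉奕汶; 陳榮順
Committee members (English): Liu, Yi-Wen; Chen, Rong-Shun
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Power Mechanical Engineering
Student ID: 103033616
Year of publication (ROC calendar): 105 (2016)
Academic year of graduation: 105
Language: English, Chinese
Number of pages: 67
Keywords (translated from Chinese): 2.5-D circular microphone array; delay-and-sum algorithm; Tikhonov regularization; compressive sensing
Keywords (English): 2.5-D; Circular Microphone Array; Internal Iteration; Tikhonov Regularization; Logarithmic-Spacing Linear Array; Delay and Sum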
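Compared with a spherical microphone array (SMA), a circular microphone array (CMA) requires less computation. Because the azimuth angle matters far more than the elevation angle in sound field localization, the CMA is the more widely used of the two in typical applications and scenarios. A true circular microphone array, however, cannot localize in elevation: it is insensitive to the elevation angle, which limits its use for three-dimensional sound fields. Beamforming results show that the CMA localizes poorly for elevation angles above 60 degrees and depression angles below 60 degrees. This thesis proposes a 2.5-dimensional microphone array (2.5-D CMA), in which a logarithmic-spacing linear array is placed vertically at the center of an unbaffled circular microphone array. For localization, the delay-and-sum (DAS) algorithm is applied separately to the logarithmic-spacing linear array and to the unbaffled circular microphone array to find the direction of the sound source. For source separation, Tikhonov regularization (TIKR) and compressive sensing (CS) are used. The speech signals separated by TIKR and CS are then processed with the normalized least-mean-square (NLMS) algorithm and the internal iteration (IIT) method to improve speech quality. On the hardware side, the unbaffled circular microphone array is manufactured by three-dimensional printing and carries 24 micro-electro-mechanical-system (MEMS) microphones distributed uniformly around the ring, while the logarithmic-spacing linear array is made of acrylic and carries 8 MEMS microphones. The separated audio is judged by both objective and subjective criteria: Perceptual Evaluation of Speech Quality (PESQ) serves as the objective measure, and a subjective listening test is used as well.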
Circular microphone arrays (CMA) are preferred over more complex spherical microphone arrays (SMA) in some audio applications because the azimuth angles of spatial sound are considered more important than the elevation angles in those scenarios. However, a CMA does not resolve the elevation angle well, which can be a limitation for applications that involve three-dimensional sound fields and for spatial audio rendering. With a CMA alone, only sources with elevation less than 60 degrees can be localized precisely. This paper proposes a 2.5-dimensional (2.5-D) CMA that consists of an unbaffled CMA and a vertical logarithmic-spacing linear array on top. In the localization stage, two delay-and-sum (DAS) beamformers are applied to the circular array and the linear array, respectively. The product of the identified angular patterns yields the direction of arrival (DOA). In the separation stage, Tikhonov Regularization (TIKR) and Compressive Sensing (CS) are employed to extract the source signal amplitudes from the output signals of the two arrays. The extracted signals are further processed by the Normalized Least-Mean-Square (NLMS) algorithm with the Internal Iteration (IIT) algorithm to produce source signals of improved quality. To validate the 2.5-D CMA experimentally, a three-dimensionally printed circular array of 24 micro-electro-mechanical-system (MEMS) microphones and a logarithmic-spacing linear array of 8 MEMS microphones are constructed for localization and separation of sound sources. An objective Perceptual Evaluation of Speech Quality (PESQ) test and a subjective listening test are undertaken for performance evaluation. The experimental results demonstrate that CS combined with NLMS achieves better separation quality than TIKR combined with NLMS.
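The localization stage described in the abstract (one DAS beamformer per array, with the product of the two angular patterns giving the DOA) can be illustrated with a minimal narrowband sketch. This is not the thesis implementation: the ring radius, the linear-array positions, the analysis frequency, the 5-degree scan grid, and the free-field plane-wave model are all assumptions chosen for illustration, and the function names are hypothetical. Only the microphone counts (24 on the ring, 8 on the line) come from the abstract.

```python
import numpy as np

C = 343.0                    # speed of sound in m/s
FREQ = 2000.0                # analysis frequency in Hz (assumed)
K = 2 * np.pi * FREQ / C     # wavenumber

# Assumed geometry: 24 microphones on a horizontal ring, 8 on a vertical
# line through the ring center with geometrically (log) spaced heights.
RADIUS = 0.10
mic_az = 2 * np.pi * np.arange(24) / 24
circ_xyz = np.stack(
    [RADIUS * np.cos(mic_az), RADIUS * np.sin(mic_az), np.zeros(24)], axis=1
)
line_xyz = np.stack(
    [np.zeros(8), np.zeros(8), np.geomspace(0.01, 0.20, 8)], axis=1
)


def unit_vector(elev, az):
    """Unit vector toward (elevation, azimuth), both in radians."""
    return np.stack(
        [np.cos(elev) * np.cos(az), np.cos(elev) * np.sin(az), np.sin(elev)],
        axis=-1,
    )


def steering(xyz, elev, az):
    """Far-field plane-wave phases at each microphone; shape (..., n_mics)."""
    return np.exp(1j * K * (unit_vector(elev, az) @ xyz.T))


def das_power(xyz, p, elev, az):
    """Normalized delay-and-sum output power for look direction(s)."""
    a = steering(xyz, elev, az)
    return np.abs(a.conj() @ p) ** 2 / xyz.shape[0] ** 2


def localize(p_circ, p_line, step_deg=5):
    """Scan a grid and return the peak of the product of both DAS patterns."""
    elev = np.deg2rad(np.arange(-90, 91, step_deg))
    az = np.deg2rad(np.arange(0, 360, step_deg))
    E, A = np.meshgrid(elev, az, indexing="ij")
    pattern = das_power(circ_xyz, p_circ, E, A) * das_power(line_xyz, p_line, E, A)
    i, j = np.unravel_index(np.argmax(pattern), pattern.shape)
    return np.rad2deg(elev[i]), np.rad2deg(az[j])


# Simulate one noise-free source at 30 degrees elevation, 120 degrees azimuth.
true_elev, true_az = np.deg2rad(30.0), np.deg2rad(120.0)
p_circ = steering(circ_xyz, true_elev, true_az)
p_line = steering(line_xyz, true_elev, true_az)
est_elev, est_az = localize(p_circ, p_line)
print(est_elev, est_az)
```

In this simplified model the ring alone responds identically to a source at +30 and -30 degrees elevation, because all of its microphones lie in one plane. The vertical line array is what separates the two, which is one way to see why the product of the two patterns is used.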
Abstract (Chinese)
ABSTRACT
Acknowledgments
Chapter 1 INTRODUCTION
Chapter 2 FORMULATIONS OF CIRCULAR AND LINEAR ARRAY
2.1 Circular Microphone Array
2.2 Logarithmic-Spacing Linear Array
Chapter 3 LOCALIZATION OF ACOUSTIC SOURCES
3.1 Delay-and-Sum Beamformer
3.2 Localization Using the 2.5-D Array
3.3 Localization Using the Full 3-D Array
Chapter 4 SEPARATION OF SOURCE SIGNALS
4.1 Array Signal Extraction
4.2 Output Correlator for the 2.5-D Array
Chapter 5 NUMERICAL SIMULATION AND EXPERIMENTAL VALIDATION
5.1 Two-Sources Example
5.2 Three-Sources Example
Chapter 6 CONCLUSION
APPENDIX


[1]Y. H. Kim and J. W. Choi, Sound Visualization and Manipulation, Wiley, Singapore, 2013.
[2]E. T. Roig and F. Jacobsen, “Deconvolution for the localization of sound sources using a circular microphone array,” J. Acoust. Soc. Am., vol. 134, no. 3, pp. 2078-2089, 2013.
[3]K. Nakadai, H. Nakajima, G. Ince, and Y. Hasegawa, “Sound source separation and automatic speech recognition for moving sources,” IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 976-981, Taiwan, 2010.
[4]K. Kokkinakis and P. C. Loizou, “Using blind source separation techniques to improve speech recognition in bilateral cochlear implant patients,” J. Acoust. Soc. Am., vol. 123, no. 4, pp. 2379-2390, 2008.
[5]S. F. Wu and N. Zhu, “Blind extraction and localization of sound sources using point sources based approaches,” J. Acoust. Soc. Am., vol. 132, no. 2, pp. 904-917, 2012.
[6]W. J. Zeng, X. Jiang, and H. C. So, “Sparse-representation algorithms for blind estimation of acoustic-multipath channels,” J. Acoust. Soc. Am., vol. 133, no. 4, pp. 2191-2197, 2013.
[7]H. Li and O. Chutatape, “Automatic detection and boundary estimation of the optic disk in retinal images using a model-based approach,” J. Electron. Imag., vol. 12, no. 1, pp. 97–105, 2003.
[8]J. F. Cardoso, “Source separation using higher order moments,” in Proc. IEEE ICASSP, Albuquerque, NM, 1990, pp. 2109–2112.
[9]B. Ma, S. Lakshmanan, and A. O. Hero, “Simultaneous detection of lane and pavement boundaries using model-based multisensor fusion,” IEEE Trans. Intell. Transp. Syst., vol. 1, no. 5, pp. 135–147, Sep. 2000.
[10]B. Rafaely, Fundamentals of Spherical Array Processing. Springer, 2015, vol. 8.
[11]E. Tuncer and B. Friedlander, Classical and Modern Direction-of-Arrival Estimation. New York: Academic, 2009.
[12]M. R. Bai and C.H. Kuo, “Deconvolution-based acoustic source localization and separation algorithms”, J. Comp. Acoust., vol. 135, no. 02, pp. 2358-2381, 2014.
[13]M. R. Bai, J.-G. Ih, J. Benesty, Acoustic Array Systems: Theory, Implementation, and Application, Wiley-IEEE Press, 2013, 1st edition
[14]M. R. Bai, Y. H. Yao, C. S. Lai and Y. Y. Lo, “Modal domain and space domain formulations of spherical microphone arrays with application to source localization and separation”, J. Acoust. Soc. Am., vol. 139, no. 3, pp. 1058-1070, 2016.
[15]B. Rafaely, B. Weiss, and E. Bachmat, “Spatial aliasing in spherical microphone arrays,” IEEE Trans. Signal Process, vol. 55, no. 3, pp. 1003–1010, Mar. 2007.
[16]B. Rafaely, Fundamentals of Spherical Array Processing. Springer, 2015, vol. 8.
[17]M. Poletti, “Three-dimensional surround sound systems based on spherical harmonics,” J. Audio Eng. Soc., vol. 53, no. 11, pp. 1004–1025, Nov. 2005.
[18]Z. Li, R. Duraiswami, and L. Davis, “Recording and reproducing high order surround auditory scenes for mixed and augmented reality,” in Mixed and Augmented Reality, 2004. ISMAR 2004. Third IEEE and ACM International Symposium on, Nov 2004, pp. 240–249.
[19]A. Parthy, N. Epain, A. van Schaik, and C. Jin, “Comparison of the measured and theoretical performance of a broadband circular microphone array,” J. Acoust. Soc. Amer., vol. 130, no. 6, pp. 3827–3837, Dec. 2011.
[20]A. N. Tikhonov, “Solution of nonlinear integral equations of the first kind,” Soviet Math. Dokl., vol. 5, pp. 835-838, 1964.
[21]E. Candes and M. Wakin, “An introduction to compressive sampling,” IEEE Signal Processing Mag., vol. 25, no. 2, pp. 21–30, Mar. 2008.
[22]E. Candès, J. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate information,” Commun. Pure Appl. Math., vol.59, no. 8, pp. 1207–1233, 2005.
[23]E. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.
[24]B. Widrow and S. D. Stearns, Adaptive Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1985.
[25]ITU-T Recommendation, Perceptual evaluation of speech quality (PESQ), an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech coders, ITU-T Recommendation P.862, February 2001.
[26]ITU-T Rec. P.862.2, “Wideband Extension to Recommendation P.862 for the Assessment of Wideband Telephone Networks and Speech Codecs”, Int. Telecom. Union, Geneva, 2005.
[27]A.J. Klockars, G.R. Hancock, and M.J. McAweeney. Power of unweighted and weighted versions of simultaneous and sequential multiple-comparison procedures. Psychological Bulletin, 1995, 118, 300-307.
[28]J. Neter, M.H. Kutner, C.J. Nachtsheim, and W. Wasserman, Applied Linear Regression Models, third ed. Chicago: Irwin, 1996.
[29]A. V. Oppenheim, R. W. Schafer, and J. R. Buck, Discrete-Time Signal Processing, 2nd ed. Upper Saddle River, NJ: Prentice-Hall, 1999.
[30]H. Teutsch and W. Kellermann, “Acoustic source detection and localization based on wavefield decomposition using circular microphone arrays,” J. Acoust. Soc. Amer., vol. 120, no. 5, pp. 2724–2736, 2006.
[31]M. Grant, S. Boyd, and Y. Ye, cvx: Matlab Software for Disciplined Convex Programming. [Online]. Available: http://www.stanford.edu/~boyd/cvx/
[32]D. Van Compernolle, “Switching adaptive filters for enhancing noisy and reverberant speech from microphone array recordings,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. 2, Albuquerque, NM, Apr. 1990, pp. 833–836.
[33]“Perceptual evaluation of speech quality (PESQ), an objective method for end-to-end speech quality assessment of narrowband telephone networks and speech codecs,” ITU, ITU-T Rec. P.862, 2000.
[34]ITU Recommendation, Method for the subjective assessment of intermediate quality level of coding systems, ITU-R BS.1534-1, 2003.
(The full text of this thesis has not been authorized for public access.)