
Detailed Record

Author (Chinese): 廖學煒
Author (English): Liao, Hsueh-Wei
Title (Chinese): 結合物理模型與卷積神經網路之小號聲音合成
Title (English): Trumpet Sound Synthesis by Collaboration between a Physical Model and a Convolutional Neural Network
Advisor (Chinese): 劉奕汶
Advisor (English): Liu, Yi-Wen
Committee Members (Chinese): 蘇文鈺, 蘇黎
Committee Members (English): Su, Wen-Yu; Su, Li
Degree: Master's
University: National Tsing Hua University
Department: Department of Electrical Engineering
Student ID: 106061536
Publication Year (ROC): 108 (2019)
Graduation Academic Year: 108
Language: English
Number of Pages: 41
Keywords (Chinese): 聲音合成, 聲學物理, 物理模型模擬, 類神經網路, 補償系統, 深度學習
Keywords (English): Sound Synthesis, Acoustical Physics, Physical Modeling Simulation, Neural Network, Compensating System, Deep Learning
This thesis proposes a hybrid-model approach to trumpet sound synthesis. The method is built on a digital waveguide model; to push its output toward realism, an additional compensating system is proposed to improve the synthesis quality. Notably, this system is not obtained by solving a known physical system, but is constructed by optimizing a learning-based model. With the help of the physical model, only a small amount of data is expected to be needed to train the system successfully. In the experiments, the physical model is first validated by checking whether its simulated acoustic impedance matches the acoustic characteristics of a real trumpet. For the compensating system, two audio segments from the Good-Sounds dataset were used as training data. The results show that the resulting hybrid model can synthesize trumpet sounds over a frequency range wider than that of the training data; in other words, the system generalizes. The generated audio files can also be compared by direct listening.
In this thesis, a hybrid model is proposed for synthesizing the sound of the trumpet. A simple physical model developed by digital waveguide (DWG) modeling is adopted as the baseline system. To achieve realistic trumpet sound synthesis, a compensating system is introduced to improve the simulation quality. A unique characteristic of this system is that, in contrast to solving mathematical expressions for a known physical system, the compensating system is constructed by optimizing a learning-based model. With the help of the physical model, only a small amount of data is expected to be needed for training the unknown system. In the experiments, to validate the physical model, the simulated acoustic impedance is compared with measurements of a real trumpet. For the compensating system, two audio segments from the Good-Sounds dataset were used as training data. The results show that a trumpet sound can be successfully synthesized over a frequency range wider than that of the training data; in this sense, the system has demonstrated an ability to generalize from limited training data. The synthesized audio can also be compared by direct listening.
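To make the hybrid arrangement concrete, the following is a minimal sketch, not the thesis implementation: a toy digital-waveguide tube built from a bidirectional delay line produces a crude brass-like signal, and a single untrained 1-D convolution marks where the trained compensating CNN would act. The delay length, loss coefficient, impulse-train excitation, and kernel size are all illustrative assumptions; the thesis models the lip excitation, the bore profile, and the network architecture in far more detail.

```python
# Minimal sketch of the hybrid idea (illustrative, not the thesis code).
# A bidirectional delay line models an ideal tube; a 1-D convolution
# marks where the trained compensating CNN would act on the DWG output.
import numpy as np

fs = 44100                 # sample rate in Hz (assumed)
delay = 64                 # one-way tube delay in samples (assumed)
n = fs // 2                # synthesize half a second

right = np.zeros(delay)    # right-going traveling wave (toward the bell)
left = np.zeros(delay)     # left-going traveling wave (back to the mouthpiece)
out = np.zeros(n)
loss = 0.95                # lumped viscothermal loss per reflection (assumed)

for t in range(n):
    at_bell = right[-1]    # wave sample arriving at the open bell end
    at_mouth = left[-1]    # wave sample arriving back at the mouthpiece
    # crude impulse-train excitation standing in for the lip model
    excite = 0.5 if t % 256 == 0 else 0.0
    right = np.roll(right, 1)
    left = np.roll(left, 1)
    right[0] = excite + loss * at_mouth   # mouthpiece end reflects in phase
    left[0] = -loss * at_bell             # open end reflects with sign inversion
    out[t] = at_bell                      # crude radiated output (no radiation filter)

# Compensating stage: in the thesis this is a CNN trained on recordings;
# here an untrained random kernel only shows the signal path.
kernel = np.random.randn(33) * 0.05
compensated = np.convolve(out, kernel, mode="same")
```

In the thesis, the compensating network is trained to reduce a loss between the compensated simulation and real recordings (see Sections 2.5.4 and 3.5); the random kernel above stands in purely to show where that stage sits in the signal path.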
Abstract (Chinese) i
Abstract ii
1 Introduction
1.1 Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 Methods
2.1 Wave Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1.1 Traveling Wave Solution . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Digital waveguide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.1 Bidirectional delay line . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.2 Ideal acoustic tube . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2.3 Viscothermal loss . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.4 DWG Junction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.5 The Limitation of the DWG Model . . . . . . . . . . . . . . . . . . . 10
2.3 Lip Buzzing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3.1 Lip’s motion mechanism . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3.2 Airflow Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3.3 Procedure of simulation . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.4 Full Bore Simulation with Lip Excitation . . . . . . . . . . . . . . . . . . . . 14
2.5 A Compensating System based on Neural Network . . . . . . . . . . . . . . . 15
2.5.1 An Introduction to Neural Network . . . . . . . . . . . . . . . . . . . 15
2.5.2 Fully Connected Layer . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.5.3 Convolution Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.5.4 Loss and Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.5.5 Proposed Compensating System . . . . . . . . . . . . . . . . . . . . . 18
3 Experiments and Discussions
3.1 Bore Profile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.2 Estimation of Acoustic Impedance . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2.1 Lead Pipe and Main Bore . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.2 Mouthpiece . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2.3 Bell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2.4 Full bore simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.3 Lip Excitation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.3.1 Full Bore Simulation with Lip Excitation . . . . . . . . . . . . . . . . 28
3.4 Analysis and Synthesis from Real Recordings . . . . . . . . . . . . . . . . . . 29
3.5 Training the Compensating System . . . . . . . . . . . . . . . . . . . . . . . . 30
3.5.1 Training Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.5.2 Training Setups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.5.3 A Mismatch between Simulation and Real Data . . . . . . . . . . . . . 31
3.5.4 Training by Different Loss Functions . . . . . . . . . . . . . . . . . 32
3.5.5 Generated Samples with Other Pitches . . . . . . . . . . . . . . . . . . 34
4 Conclusions and Future Works
4.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2 Future Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
References 36
Appendix 41
[1] G. Bandiera, O. Romani Picas, H. Tokuda, W. Hariya, K. Oishi, and X. Serra, “Good-sounds.org: A framework to explore goodness in instrumental sounds,” in International Society for Music Information Retrieval Conference, pp. 414–423, 2016.
[2] M. Van Walstijn, “Wave-based simulation of wind instrument resonators,” IEEE Signal Processing Magazine, vol. 24, no. 2, pp. 21–31, 2007.
[3] A. Hirschberg, J. Gilbert, R. Msallam, and A. Wijnands, “Shock waves in trombones,” The Journal of the Acoustical Society of America, vol. 99, no. 3, pp. 1754–1758, 1996.
[4] N. H. Fletcher and T. D. Rossing, The physics of musical instruments. Springer Science & Business Media, 2012.
[5] S. Bilbao and J. Chick, “Finite difference time domain simulation for the brass instrument bore,” The Journal of the Acoustical Society of America, vol. 134, no. 5, pp. 3860–3871, 2013.
[6] R. Harrison and J. Chick, “A single valve brass instrument model using finite-difference time-domain methods,” in International Symposium on Musical Acoustics, 2014.
[7] J. O. Smith, “Physical modeling using digital waveguides,” Computer Music Journal, vol. 16, no. 4, pp. 74–91, 1992.
[8] J. O. Smith, “Efficient simulation of the reed-bore and bow-string mechanisms,” in Proceedings of the 1986 International Computer Music Conference (ICMC), Den Haag, The Netherlands, October 20–24, 1986.
[9] M. Karjalainen and C. Erkut, “Digital waveguides versus finite difference structures: Equivalence and mixed modeling,” EURASIP Journal on Applied Signal Processing, vol. 2004, pp. 978–989, 2004.
[10] A. Allen and N. Raghuvanshi, “Aerophones in flatland: Interactive wave simulation of wind instruments,” ACM Transactions on Graphics (TOG), vol. 34, no. 4, p. 134, 2015.
[11] C. M. Cooper and J. S. Abel, “Digital simulation of ‘brassiness’ and amplitude-dependent propagation speed in wind instruments,” in Proc. 13th Int. Conf. on Digital Audio Effects (DAFx-10), pp. 1–6, 2010.
[12] A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, “WaveNet: A generative model for raw audio,” arXiv preprint arXiv:1609.03499, 2016.
[13] J. Engel, C. Resnick, A. Roberts, S. Dieleman, M. Norouzi, D. Eck, and K. Simonyan, “Neural audio synthesis of musical notes with WaveNet autoencoders,” in Proceedings of the 34th International Conference on Machine Learning (Volume 70), pp. 1068–1077, JMLR.org, 2017.
[14] S. Adachi and M.-a. Sato, “Trumpet sound simulation using a two-dimensional lip vibration model,” The Journal of the Acoustical Society of America, vol. 99, no. 2, pp. 1200–1209, 1996.
[15] H. Boutin, N. Fletcher, J. Smith, and J. Wolfe, “Relationships between pressure, flow, lip motion, and upstream and downstream impedances for the trombone,” The Journal of the Acoustical Society of America, vol. 137, no. 3, pp. 1195–1209, 2015.
[16] P. Cook, “TBone: An interactive waveguide brass instrument synthesis workbench for the NeXT machine,” in Proc. Int. Computer Music Conf., pp. 297–299, 1991.
[17] J. O. Smith, “Virtual acoustic musical instruments: Review and update,” Journal of New Music Research, vol. 33, no. 3, pp. 283–304, 2004.
[18] P. R. Cook, “Synthesis ToolKit in C++, version 1.0,” SIGGRAPH Proceedings, Assoc. Comp. Mach., 1996.
[19] A. H. Benade, “On the propagation of sound waves in a cylindrical conduit,” The Journal of the Acoustical Society of America, vol. 44, no. 2, pp. 616–623, 1968.
[20] D. H. Keefe, “Acoustical wave propagation in cylindrical ducts: Transmission line parameter approximations for isothermal and nonisothermal boundary conditions,” The Journal of the Acoustical Society of America, vol. 75, no. 1, pp. 58–62, 1984.
[21] A. H. Benade and E. Jansson, “On plane and spherical waves in horns with nonuniform flare: I. Theory of radiation, resonance frequencies, and mode conversion,” Acta Acustica united with Acustica, vol. 31, no. 2, pp. 79–98, 1974.
[22] V. Pagneux, N. Amir, and J. Kergomard, “A study of wave propagation in varying cross-section waveguides by modal decomposition. Part I. Theory and validation,” The Journal of the Acoustical Society of America, vol. 100, no. 4, pp. 2034–2048, 1996.
[23] J. Kemp, N. Amir, D. Campbell, and M. van Walstijn, “Multimodal propagation in acoustic horns,” in Proc. International Symposium on Musical Acoustics, Perugia, Italy, pp. 521–524, 2001.
[24] J. O. Smith, Physical audio signal processing: For virtual musical instruments and audio effects. W3K Publishing, 2010.
[25] V. Fréour and G. P. Scavone, “Acoustical interaction between vibrating lips, downstream air column, and upstream airways in trombone performance,” The Journal of the Acoustical Society of America, vol. 134, no. 5, pp. 3887–3898, 2013.
[26] V. Fréour, N. Lopes, T. Hélie, R. Caussé, and G. Scavone, “In-vitro and numerical investigations of the influence of a vocal-tract resonance on lip auto-oscillations in trombone performance,” Acta Acustica united with Acustica, vol. 101, no. 2, pp. 256–269, 2015.
[27] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, p. 436, 2015.
[28] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.
[29] G. Favier, A. Kibangou, and A. Khouaja, “Nonlinear system modelling by means of Volterra models. Approaches for the parametric complexity reduction,” in Symposium Techniques Avancées et Stratégies Innovantes en Modélisation et Commande Robuste des processus industriels, 2004.
[30] T. G. Burton and R. A. Goubran, “A generalized proportionate subband adaptive second-order Volterra filter for acoustic echo cancellation in changing environments,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 8, pp. 2364–2373, 2011.
[31] M. A. M. Ramírez and J. D. Reiss, “Modeling nonlinear audio effects with end-to-end deep neural networks,” in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 171–175, IEEE, 2019.
[32] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “MobileNets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017.
[33] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, “Aggregated residual transformations for deep neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1492–1500, 2017.
[34] J. Pinto, G. S. Sivaram, H. Hermansky, and M. Magimai-Doss, “Volterra series for analyzing MLP based phoneme posterior estimator,” in 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1813–1816, IEEE, 2009.
[35] P. S. Heuberger, P. M. van den Hof, and B. Wahlberg, Modelling and identification with rational orthogonal basis functions. Springer Science & Business Media, 2005.
[36] J. Pennington, S. Schoenholz, and S. Ganguli, “Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice,” in Advances in Neural Information Processing Systems, pp. 4785–4795, 2017.
[37] D.-A. Clevert, T. Unterthiner, and S. Hochreiter, “Fast and accurate deep network learning by exponential linear units (ELUs),” arXiv preprint arXiv:1511.07289, 2015.
[38] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 807–814, 2010.
[39] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al., “TensorFlow: A system for large-scale machine learning,” in Symposium on Operating Systems Design and Implementation, pp. 265–283, 2016.
[40] R. Msallam, S. Dequidt, R. Causse, and S. Tassart, “Physical model of the trombone including nonlinear effects. Application to the sound synthesis of loud tones,” Acta Acustica united with Acustica, vol. 86, no. 4, pp. 725–736, 2000.
[41] T. R. Moore, “The acoustics of brass musical instruments,” Acoustics Today, 2016.
[42] C. A. Macaluso and J.­P. Dalmont, “Trumpet with near­perfect harmonicity: Design and acoustic results,” The Journal of the Acoustical Society of America, vol. 129, no. 1, pp. 404–414, 2011.
[43] M. Morise, F. Yokomori, and K. Ozawa, “WORLD: A vocoder-based high-quality speech synthesis system for real-time applications,” IEICE Transactions on Information and Systems, vol. 99, no. 7, pp. 1877–1884, 2016.
[44] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[45] R. Hacıoğlu and G. A. Williamson, “Reduced complexity Volterra models for nonlinear system identification,” EURASIP Journal on Advances in Signal Processing, vol. 2001, no. 4, p. 734913, 2001.
[46] G. Zoumpourlis, A. Doumanoglou, N. Vretos, and P. Daras, “Non-linear convolution filters for CNN-based learning,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 4761–4769, 2017.