
Detailed Record

Author (Chinese): 林子軒
Author (English): Lin, Tzu-Hsuan
Title (Chinese): 基於人聲旋律的流行音樂即時伴奏生成系統
Title (English): Real-time pop music accompaniment generation according to vocal melody by deep learning models
Advisor (Chinese): 蘇豐文
Advisor (English): Soo, Von-Wun
Committee Members (Chinese): 沈之涯; 蘇黎
Committee Members (English): Shen, Chih-Ya; Su, Li
Degree: Master's
University: National Tsing Hua University
Department: Institute of Information Systems and Applications
Student ID: 107065525
Year of Publication (ROC): 110 (2021)
Academic Year of Graduation: 109
Language: English
Number of Pages: 43
Keywords (Chinese): 深度學習, 即時伴奏生成, 流行音樂 (deep learning, real-time accompaniment generation, pop music)
Keywords (English): Deep Learning, Real-Time Accompaniment, Pop-music
Abstract (Chinese)
In recent years, audio signal processing techniques have achieved solid results on tasks such as accompaniment generation and vocal transcription. However, no work has yet combined the two. We therefore integrate these strong existing results, improve their efficiency, and propose a new real-time accompaniment system. In this system, we optimize the computational efficiency of a previous vocal transcription model and propose a streamlined HMM-based accompaniment generation model, so that accompaniment can be generated in real time within a limited time budget. We believe this result will help many solo singers create more complete performances on their own.

Abstract (English)
The goal of this work is to propose a real-time accompaniment system that assists singers in completing a simple demo by themselves.
With current audio signal processing technology, we have seen achievements in accompaniment generation and in vocal transcription. Building on these works, we propose a novel "real-time accompaniment generation system" that combines the current state of the art and further improves efficiency to reach a real-time, human-interactive mode. To reach acceptable computational efficiency, we prune the original model heavily and apply the DenseNet concept to enhance its gradient propagation.
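To illustrate the DenseNet idea mentioned above, the following is a minimal sketch of a densely connected convolutional block (the layer count, growth rate, and module names are illustrative assumptions, not the thesis's actual transcription architecture): each layer receives the concatenation of all earlier feature maps, which shortens gradient paths.

    import torch
    import torch.nn as nn

    class DenseBlock(nn.Module):
        """Minimal DenseNet-style block: every layer sees the concatenation
        of all earlier feature maps, shortening gradient paths."""
        def __init__(self, in_channels: int, growth_rate: int = 12, num_layers: int = 4):
            super().__init__()
            self.layers = nn.ModuleList()
            channels = in_channels
            for _ in range(num_layers):
                self.layers.append(nn.Sequential(
                    nn.BatchNorm2d(channels),
                    nn.ReLU(inplace=True),
                    nn.Conv2d(channels, growth_rate, kernel_size=3, padding=1),
                ))
                channels += growth_rate  # concatenation grows the channel count

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            features = [x]
            for layer in self.layers:
                out = layer(torch.cat(features, dim=1))  # reuse all earlier maps
                features.append(out)
            return torch.cat(features, dim=1)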
In this system, we integrate an efficiency-improved vocal transcription model with a simplified HMM-based accompaniment generation model, which better fits small training sets, to output musical accompaniment within a limited time. We believe this work will help many solo singers deliver their live shows or demos by themselves whenever they need.
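As a sketch of what one step of HMM-based accompaniment can look like (a toy chord vocabulary with made-up probabilities; the thesis's actual states, transitions, and training are not reproduced here), hidden states are chords, observations are transcribed melody pitch classes, and Viterbi decoding selects the chord sequence:

    import numpy as np

    # Hypothetical toy setup: 4 chord states, 12 pitch classes as observations.
    CHORDS = ["C", "F", "G", "Am"]
    # Chord-to-chord transition probabilities (rows sum to 1; illustrative values).
    A = np.array([[0.5, 0.2, 0.2, 0.1],
                  [0.3, 0.4, 0.2, 0.1],
                  [0.3, 0.2, 0.4, 0.1],
                  [0.2, 0.2, 0.2, 0.4]])
    # P(pitch class | chord): weight on chord tones, small floor elsewhere.
    B = np.full((4, 12), 0.01)
    for i, tones in enumerate([(0, 4, 7), (5, 9, 0), (7, 11, 2), (9, 0, 4)]):
        B[i, list(tones)] = 0.3
    B /= B.sum(axis=1, keepdims=True)

    def viterbi(melody_pcs):
        """Most likely chord per beat given melody pitch classes (log-space)."""
        logA, logB = np.log(A), np.log(B)
        T, N = len(melody_pcs), len(CHORDS)
        delta = np.full((T, N), -np.inf)   # best log-score ending in each state
        psi = np.zeros((T, N), dtype=int)  # backpointers
        delta[0] = np.log(1.0 / N) + logB[:, melody_pcs[0]]
        for t in range(1, T):
            scores = delta[t - 1][:, None] + logA  # prev-state x next-state
            psi[t] = scores.argmax(axis=0)
            delta[t] = scores.max(axis=0) + logB[:, melody_pcs[t]]
        path = [int(delta[-1].argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(psi[t][path[-1]]))
        return [CHORDS[s] for s in reversed(path)]

    print(viterbi([0, 4, 7, 5, 9, 7, 11, 0]))  # e.g. melody C E G F A G B C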
As a result, we achieve real-time operation at tempos up to 180 BPM, which covers most pop music, and we propose a greatly improved vocal transcription model with 1/1000 of the parameters and 1/50 of the FLOPs of the original.
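A quick back-of-the-envelope check of the 180 BPM bound (our own arithmetic; only the tempo figure comes from the thesis): at 180 BPM one beat lasts 60/180 s, roughly 333 ms, so the whole transcription-plus-generation pipeline must respond within about a third of a second per beat.

    def beat_budget_ms(bpm: float) -> float:
        """Time available per beat at a given tempo, in milliseconds."""
        return 60_000.0 / bpm

    for bpm in (90, 120, 180):
        print(f"{bpm:>3} BPM -> {beat_budget_ms(bpm):6.1f} ms per beat")
    # 180 BPM -> ~333.3 ms: the tightest budget the system must meet.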
Abstract (Chinese)
Abstract (English)
Acknowledgement
List of Tables
List of Figures
Introduction (p. 1)
Related Work (p. 6)
Methodology (p. 9)
Experiments and Results (p. 27)
Conclusion and Future Work (p. 35)
References (p. 38)