Author (Chinese): 黃梓寧
Author (English): Huang, Tzu-Ning
Title (Chinese): 基於多任務卷積遞迴神經網路之即時吉他和弦辨識
Title (English): Real-time Guitar Chord Recognition Based on Multi-task Convolutional Recurrent Neural Networks
Advisor (Chinese): 劉奕汶
Advisor (English): Liu, Yi-Wen
Committee members (Chinese): 蘇黎、徐嘉連
Committee members (English): Su, Li; Hsu, Jia-Lien
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Electrical Engineering
Student ID: 109061648
Year of publication (ROC calendar): 112 (2023)
Graduation academic year: 111 (2022-2023)
Language: English
Number of pages: 48
Keywords (Chinese): 和弦辨識、多任務學習、卷積遞迴神經網路
Keywords (English): Chord Recognition, Multi-task Learning, Convolutional Recurrent Neural Network
Chord recognition is a long-standing research topic in music information retrieval (MIR); its objective is to identify the chords that occur in a piece of music, along with their start and end times. Chord recognition can be categorized into small-vocabulary chord recognition (SVCR) and large-vocabulary chord recognition (LVCR). SVCR deals only with major and minor chords, while LVCR also considers more complex chords such as augmented, diminished, and suspended chords. In this research, we focus on LVCR, which is much more difficult due to copyright issues, the complexity of chord annotation, and the lack of high-quality chord datasets; the diversity and imbalanced distribution of chord types further increase the difficulty of the task. We aim to apply chord recognition to an accompaniment system that detects chords in real time while the user plays the guitar; as the system detects the chords, it should automatically provide a harmonious bass accompaniment. To achieve this goal, we propose an encoding method that takes the similarity between chords into account, so that during training the model incurs a loss that reflects how far the predicted chord is from the ground truth. A synthesized dataset generated with the Musical Instrument Digital Interface (MIDI) is also utilized to increase the number of rare chords and thereby alleviate the imbalanced-data problem. Experiments were conducted to evaluate the proposed methods, and the results show that all of them improve the performance of chord recognition.
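To make the similarity-encoding idea concrete, below is a minimal sketch, not the thesis's exact formulation, of how soft targets built from chord similarity yield loss values that grow with the severity of a misclassification. The toy vocabulary, the chord-tone templates, the Jaccard similarity, and the temperature parameter are all illustrative assumptions.

```python
import numpy as np

# Toy vocabulary: chord name -> set of pitch classes (0 = C, ..., 11 = B).
# These templates are illustrative, not the thesis's actual chord set.
CHORDS = {
    "C:maj":  {0, 4, 7},
    "C:min":  {0, 3, 7},
    "C:aug":  {0, 4, 8},
    "C:dim":  {0, 3, 6},
    "C:sus4": {0, 5, 7},
}
NAMES = list(CHORDS)

def similarity(a: str, b: str) -> float:
    """Jaccard overlap of chord tones: 1.0 for identical chords."""
    ta, tb = CHORDS[a], CHORDS[b]
    return len(ta & tb) / len(ta | tb)

def encode(label: str, temperature: float = 0.15) -> np.ndarray:
    """Soft target over the vocabulary, peaked at `label` but giving
    partial credit to chords that share tones with it."""
    sims = np.array([similarity(label, name) for name in NAMES])
    logits = sims / temperature
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

def cross_entropy(target: np.ndarray, predicted: np.ndarray) -> float:
    return float(-np.sum(target * np.log(predicted + 1e-12)))

target = encode("C:maj")
near = encode("C:sus4")   # shares two tones with C:maj
far = encode("C:dim")     # shares only one tone with C:maj
# A near miss is penalized less than a distant one, unlike one-hot
# encoding, where both errors would incur the same cross-entropy loss.
print(cross_entropy(target, near) < cross_entropy(target, far))  # True
```

Here `near` and `far` stand in for peaked model outputs; in training, the soft vector returned by `encode` would replace the one-hot target in the cross-entropy loss.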
1. Introduction - 1
1.1 Motivation - 1
1.2 Problem Statement - 1
1.3 Goals - 3
1.4 Thesis Organization - 3
2. Background Knowledge and Related Work - 5
2.1 Chord Structure Decomposition - 5
2.2 Chord Recognition and Deep Learning Neural Networks - 7
2.3 MIDI-synthesized Datasets - 10
2.4 A Real-time Jamming System Demonstration Platform - 11
3. Methods - 13
3.1 Preprocessing - 13
3.2 Strategies for Handling Imbalanced Data - 17
3.3 Similarity Encoding - 18
3.4 Multi-task Convolutional Recurrent Network - 21
3.5 Training Techniques - 22
3.6 Pattern Finding Methods - 23
4. Experiments and Results - 26
4.1 Dataset - 26
4.2 Experiment Setup - 29
4.3 Metrics - 30
4.4 Evaluation - 31
5. Conclusion - 41
6. Future Work - 43