
Detailed Record

Author (Chinese): 荷西
Author (Foreign): López Diéguez, Pablo
Thesis Title (Chinese): 變分自動編碼器用於複音音樂插值
Thesis Title (Foreign): Variational Autoencoders for Polyphonic Music Interpolation
Advisor (Chinese): 蘇豐文
Advisor (Foreign): Soo, Von-Wun
Committee Members (Chinese): 陳朝欽, 邱瀞德
Committee Members (Foreign): Chen, Chaur-Chin; Chiu, Ching-Te
Degree: Master's
University: National Tsing Hua University
Department: Institute of Information Systems and Applications
Student ID: 107065431
Year of Publication (ROC calendar): 109 (2020)
Academic Year of Graduation: 108
Language: English
Number of Pages: 63
Keywords (Chinese): 變分自動編碼器, 複音音樂, 插值, 自動編碼器
Keywords (Foreign): Variational Autoencoder, Polyphonic music, Interpolation, Autoencoder, VAE
Usage statistics:
  • Recommendations: 0
  • Views: 1309
  • Rating: *****
  • Downloads: 20
  • Bookmarks: 0
Abstract:
This thesis aims to use machine learning techniques to solve the novel problem of music interpolation composition. Two models based on Variational Autoencoders (VAEs) are proposed to generate a suitable polyphonic harmonic bridge between two given songs, smoothly changing the pitches and dynamics of the interpolation. The interpolations generated by the first model surpass both a random-data baseline and a bidirectional LSTM approach, and its performance is comparable to the current state of the art. The novel architecture of the second model outperforms state-of-the-art interpolation approaches in terms of reconstruction loss by using an additional neural network to directly estimate the encoded vector of the interpolation. Furthermore, the Hsinchu Interpolation MIDI Dataset was created, making both models proposed in this thesis more efficient than previous approaches in the literature in terms of computational and time requirements during training. Finally, a quantitative user study was conducted to validate the results.
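The abstract contrasts two ways of producing the bridging segment: linearly sampling the latent space between the two endpoint encodings, and directly estimating the interpolation's encoded vector with an additional neural network. The sketch below illustrates both ideas in PyTorch; the TinyVAE and InterpNet modules, their dimensions, and all names are illustrative assumptions, not the thesis's actual architecture.

import torch
import torch.nn as nn

INPUT_DIM, LATENT_DIM = 128, 32  # hypothetical piano-roll slice / latent sizes

class TinyVAE(nn.Module):
    # Stand-in VAE: one linear encoder producing (mu, log_var) and one
    # linear decoder; the models in the thesis are deeper than this.
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(INPUT_DIM, 2 * LATENT_DIM)
        self.dec = nn.Linear(LATENT_DIM, INPUT_DIM)

    def encode(self, x):
        mu, log_var = self.enc(x).chunk(2, dim=-1)
        return mu, log_var

    def decode(self, z):
        return torch.sigmoid(self.dec(z))  # note-on probabilities in [0, 1]

class InterpNet(nn.Module):
    # "VAE+NN" idea: map the two endpoint codes to the bridge's latent code.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * LATENT_DIM, 64), nn.ReLU(),
            nn.Linear(64, LATENT_DIM))

    def forward(self, z_a, z_b):
        return self.net(torch.cat([z_a, z_b], dim=-1))

vae, interp_net = TinyVAE(), InterpNet()
song_a, song_b = torch.rand(1, INPUT_DIM), torch.rand(1, INPUT_DIM)
z_a, _ = vae.encode(song_a)
z_b, _ = vae.encode(song_b)

# Strategy 1: linear sampling of the latent space (cf. Section 6.4).
bridge = [vae.decode((1 - a) * z_a + a * z_b) for a in torch.linspace(0, 1, 8)]

# Strategy 2: direct estimation of the interpolation's encoded vector (cf. Section 6.5).
bridge_direct = vae.decode(interp_net(z_a, z_b))

The intuition behind the second strategy, as the abstract describes it, is that a learned mapping can select latent codes that decode with lower reconstruction loss than points chosen blindly along the straight line between the endpoint encodings.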
Contents

Acknowledgements ............................................... i
Abstract ....................................................... ii

1 Introduction ................................................. 1
1.1 Novelty .................................................... 2

2 Related Work ................................................. 4

3 Main Objectives .............................................. 8

4 Theoretical Background ....................................... 10
4.1 Symbolic Music Representation .............................. 10
4.2 Autoencoder ................................................ 13
4.2.1 Encoder .................................................. 14
4.2.2 Decoder .................................................. 15
4.2.3 Loss Function ............................................ 16
4.3 Variational Autoencoder .................................... 17
4.3.1 Mathematical foundations and motivation of VAEs .......... 17
4.3.2 Loss Function: ELBO ...................................... 21
4.3.3 Re-parametrization Trick ................................. 22

5 Methodology .................................................. 24
5.1 Build Methodology .......................................... 24
5.2 Experimental Methodology ................................... 26
5.3 Quantitative Survey Methodology ............................ 28

6 Experimental Procedure ....................................... 29
6.1 Hsinchu Interpolation MIDI Dataset ......................... 30
6.1.1 Data Preprocessing ....................................... 30
6.1.2 Word2Vec and Cosine-based Similarity ..................... 31
6.1.3 Neural Network-based Similarity .......................... 32
6.1.4 Decision Tree-based Similarity ........................... 34
6.1.5 Logistic Regression-based Similarity ..................... 36
6.2 Random Baseline Data Interpolation ......................... 36
6.3 Bidirectional LSTM ......................................... 37
6.3.1 Data Preprocessing ....................................... 38
6.3.2 Architecture ............................................. 38
6.3.3 Training ................................................. 39
6.3.4 Interpolation ............................................ 39
6.4 VAE: linear sampling of latent space ....................... 40
6.4.1 Encoder .................................................. 41
6.4.2 Decoder .................................................. 42
6.4.3 Loss Function ............................................ 43
6.4.4 Training ................................................. 43
6.4.5 Interpolation ............................................ 44
6.5 VAE+NN: direct estimation of interpolation encoded vector .. 46
6.5.1 Data Preprocessing ....................................... 47
6.5.2 Neural network architecture .............................. 47
6.5.3 Neural network training .................................. 49
6.5.4 Interpolation ............................................ 50
6.6 Quantitative User Study .................................... 51

7 Results and Discussion ....................................... 53
7.1 Metric-based Results ....................................... 53
7.2 Listening Test Results ..................................... 56

8 Conclusion ................................................... 59
8.1 Future Work ................................................ 60

References ......................................................62