Detailed Record

Author (Chinese): 澤維爾
Author (Foreign): Munguia Velez, Kelvin Xavier
Thesis Title (Chinese): 複音音樂作曲:一個對抗逆增強式學習法
Thesis Title (Foreign): Polyphonic Music Composition: An Adversarial Inverse Reinforcement Learning Approach
Advisor (Chinese): 蘇豐文
Advisor (Foreign): Soo, Von-Wun
Committee Members (Chinese): 邱瀞德; 陳朝欽
Committee Members (Foreign): Chiu, Ching-Te; Chen, Chaur-Chin
Degree: Master's
Institution: National Tsing Hua University
Department: Institute of Information Systems and Applications
Student ID: 107065424
Publication Year (ROC calendar): 109 (2020)
Graduation Academic Year: 108
Language: English
Number of Pages: 93
Keywords (Chinese): 複音音樂作曲; 對抗逆增強式學習
Keywords (Foreign): music composition; adversarial inverse reinforcement learning
Abstract (Chinese): Automatic melody generation has traditionally been trained with supervised deep learning models. However, this approach has shortcomings that can lead to unpleasing melodies, including excessive repetition of the same patterns. Motivated by the recent success of reinforcement learning in many fields, this thesis explores an alternative approach that combines novel supervised deep learning, deep reinforcement learning, and inverse reinforcement learning to compose melodies. Music generation can be viewed as a sequence of actions over time, where each action selects a chord note in the composition, allowing us to treat melody composition as a reinforcement learning problem of finding the sequence of actions that maximizes the accumulated reward. We first train a Bi-axial LSTM model with supervised learning and then fine-tune it with Deep Q-learning. However, designing a good reward function is a tricky and difficult task, so we use adversarial inverse reinforcement learning to learn a reward function from trajectories of compositions by human experts. Combining this learned reward function with music-theory rules, we improve the music generated by the supervised learning model. The results show that the music produced by our approach outperforms that of the model not refined by inverse reinforcement learning, both on objective metrics and in users' subjective preference evaluations.
Abstract (Foreign): Deep Supervised Learning models are traditionally used for automatic music harmony generation. However, this approach suffers from limitations that can lead to unpleasing harmonies, including excessive repetition of patterns. Motivated by the recent success of reinforcement learning in multiple fields, this thesis explores an alternative approach to harmony composition using a combination of novel Deep Supervised Learning, Deep Reinforcement Learning, and Inverse Reinforcement Learning techniques. Music generation can be seen as a sequence of actions through time, where taking an action is equivalent to selecting the next chord in the composition, which allows us to model harmony composition as a reinforcement learning problem in which we seek to maximize an accumulated reward. We start by training a Bi-axial LSTM model using supervised learning and improve upon it by tuning it with Deep Q-learning. However, designing a good reward function is known to be a tricky and difficult process, so to overcome this we propose learning a reward function from a set of human-composed tracks using Adversarial Inverse Reinforcement Learning. We combine this learned reward function with a reward based on music-theory rules to improve the generation of the model trained by supervised learning. The results show improvement over the pre-trained model not tuned with reinforcement learning, with respect to both a set of objective metrics and user preference measured in user studies.
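To make the fine-tuning loop described in the abstract concrete, below is a minimal sketch (not the thesis code) of how a reward learned by AIRL can be blended with a rule-based music-theory reward inside a Deep Q-learning update. The network sizes, the state/action encoding, the mixing weight ALPHA, and the toy music_theory_reward rule are all illustrative assumptions.

# Minimal sketch, assuming a pre-trained Q-network over note-selection actions
# and a separately trained AIRL reward network; all names and sizes are hypothetical.
import random
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 64, 48   # assumed: encoded musical context, candidate pitches
GAMMA, ALPHA = 0.95, 0.5        # discount factor, reward mixing weight

q_net = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(), nn.Linear(128, N_ACTIONS))
airl_reward = nn.Sequential(nn.Linear(STATE_DIM + N_ACTIONS, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)

def music_theory_reward(action):
    # Toy stand-in for a rule-based reward, e.g. favoring in-key (C-major triad) pitch classes.
    return 0.0 if action % 12 in (0, 4, 7) else -1.0

def combined_reward(state, action):
    # Blend the reward learned by AIRL with the hand-written music-theory term.
    one_hot = torch.zeros(N_ACTIONS)
    one_hot[action] = 1.0
    with torch.no_grad():
        r_learned = airl_reward(torch.cat([state, one_hot])).item()
    return ALPHA * r_learned + (1 - ALPHA) * music_theory_reward(action)

def q_update(state, action, next_state):
    # One temporal-difference step toward r + gamma * max_a' Q(s', a').
    target = combined_reward(state, action) + GAMMA * q_net(next_state).detach().max()
    loss = (q_net(state)[action] - target) ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Example: apply one update to a synthetic transition.
s, s_next = torch.randn(STATE_DIM), torch.randn(STATE_DIM)
q_update(s, random.randrange(N_ACTIONS), s_next)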
摘要 i
Abstract ii
Acknowledgement iii
List of Tables vii
List of Figures viii
1 Introduction 1
1.1 Motivation 1
1.2 Goal 2
1.3 New Contributions 4
2 Related Works 6
3 Background 9
3.1 Musical Instrument Digital Interface 9
3.2 Music Representation 10
3.3 Supervised Learning 12
3.3.1 Recurrent Neural Networks 13
3.3.2 Bi-Axial LSTM 14
3.4 Reinforcement Learning 19
3.4.1 Q-Learning 20
3.4.2 Deep Q-Learning 22
3.4.3 Inverse Reinforcement Learning 23
3.4.4 Maximum Entropy Inverse Reinforcement Learning 25
3.4.5 Guided Cost Learning 28
3.4.6 Generative Adversarial Guided Cost Learning (GAN-GCL) 30
3.4.7 Adversarial Inverse Reinforcement Learning 34
4 Methodology 37
4.1 System Overview 37
4.2 Bi-axial training 39
4.3 Reward function extraction using AIRL 42
4.4 Tuning using a Deep Q-Network 44
5 Experiments 53
5.1 Dataset 53
5.2 Bi-axial LSTM 55
5.3 AIRL Reward Function 57
5.4 Deep Q-Network Tuning 59
6 Results 61
6.1 Objective Evaluation 63
6.1.1 Performance Metrics 63
6.1.2 Metrics Comparison 66
6.2 Subjective Evaluation 72
6.2.1 Examples of Generated Compositions 72
6.2.2 User Study 76
7 Conclusion and Future Work 83
7.1 Conclusion 83
7.2 Future Work 85
References 87
A More User Study Results 92