Detailed Record

Author (Chinese): 馮少迪
Author (English): Feng, Shao-Di
Title (Chinese): 用更精緻的情感生成音樂
Title (English): Generate Music with More Refined Emotions
Advisor (Chinese): 蘇豐文
Advisor (English): Soo, Von-Wun
Committee Members (Chinese): 朱宏國, 林豪鏘
Committee Members (English): Chu, Hung-Kuo; Lin, Hao-Qiang
Degree: Master
University: National Tsing Hua University
Department: Computer Science
Student ID: 109062470
Year of Publication (ROC year): 111 (2022)
Graduation Academic Year: 111
Language: English
Number of Pages: 33
Keywords (Chinese): 音樂生成, 音樂情感控制, 半監督學習, 變分自編碼器
Keywords (English): music generation, music emotion controlling, semi-supervised learning, Variational Autoencoder
Abstract (Chinese): 由於具有情感標籤的符號音樂數據集稀缺且不完整,因此生成具有特定情感的符號音樂是一項具有挑戰性的任務。通常,數據集只標註悲傷或快樂等一般情感標籤,因此模型生成能力有限,只能生成帶有標籤情感的音樂。本研究旨在基於在 Russell 的 2D 情感模型中僅用四個象限標記的訓練數據集生成更精緻的情感。我們專注於 Music FaderNets 理論,將 arousal 和 valence 映射到低階屬性,結合 Transformer 和 GM-VAE 構建符號音樂生成模型。我們為模型採用了 in-attention 機制,並通過控制條件信息來改進它。我們展示了音樂生成模型可以根據用戶以高階語言表達指定的情感,通過操縱其相應的低階音樂屬性來控制音樂的生成。最後,我們使用預先訓練的情感分類器,針對名為 EMOPIA 的流行鋼琴 MIDI 數據集來評估模型性能,並通過主觀聆聽評估,我們證明該模型可以正確地生成具有更精緻情感的音樂。
Abstract (English): Generating symbolic music with a specific emotion is a challenging task because symbolic music datasets with emotion labels are scarce and incomplete. Datasets are usually labeled only with general emotion categories such as sadness or happiness, so a model's generation ability is limited to music carrying those labeled emotions. This research aims to generate more refined emotions from training data labeled only with the four quadrants of Russell's 2D emotion model. Following the theory of Music FaderNets, we map arousal and valence to low-level attributes and build a symbolic music generation model that combines a Transformer with a GM-VAE. We adopt an in-attention mechanism for the model and improve it by allowing modulation with conditional information. We show that the model can control music generation according to emotions specified by users as high-level linguistic expressions, by manipulating the corresponding low-level musical attributes. Finally, we evaluate the model with a pre-trained emotion classifier on EMOPIA, a pop piano MIDI dataset, and through a subjective listening evaluation we demonstrate that the model can correctly generate music with more refined emotions.
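To make the abstract's conditioning scheme concrete, the sketch below illustrates the in-attention idea the thesis adopts (following MuseMorphose [22]): a condition vector, standing in here for the emotion quadrant and low-level attribute codes, is projected to the model width and summed with the hidden states entering every Transformer layer, so the condition modulates the whole stack rather than only the input embedding. This is an illustrative assumption, not the thesis's actual implementation; the class name InAttentionDecoder, the dimensions, the random placeholder condition, and the omission of positional encodings are all hypothetical simplifications.

import torch
import torch.nn as nn

class InAttentionDecoder(nn.Module):
    # Hypothetical sketch of in-attention conditioning: the condition vector
    # (e.g. GM-VAE latent plus low-level attribute codes) is projected to the
    # model width and added to the hidden states entering every layer.
    # Positional encodings are omitted for brevity.
    def __init__(self, vocab_size, d_model=512, n_heads=8, n_layers=6, d_cond=64):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.cond_proj = nn.Linear(d_cond, d_model)
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        )
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, cond):
        # tokens: (batch, seq_len) event ids; cond: (batch, d_cond) condition vector
        seq_len = tokens.size(1)
        causal_mask = torch.triu(                      # standard causal (autoregressive) mask
            torch.full((seq_len, seq_len), float("-inf")), diagonal=1
        )
        h = self.token_emb(tokens)
        c = self.cond_proj(cond).unsqueeze(1)          # (batch, 1, d_model), broadcast over time
        for layer in self.layers:
            h = layer(h + c, src_mask=causal_mask)     # inject the condition at every layer
        return self.out(h)                             # next-token logits

# Usage with placeholder data: a random condition stands in for an encoded
# emotion quadrant and its low-level attribute bins.
model = InAttentionDecoder(vocab_size=1000)
tokens = torch.randint(0, 1000, (2, 16))
cond = torch.randn(2, 64)
logits = model(tokens, cond)                           # shape (2, 16, 1000)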
Table of Contents:
Abstract (Chinese) . . . I
Abstract (English) . . . II
Contents . . . III
1 Introduction . . . 1
1.1 Related Work . . . 5
1.1.1 Conditional Symbolic Music Generation . . . 5
1.1.2 Emotion-Conditioned Symbolic Music Generation . . . 6
1.1.3 Relationship between Low-Level Features and Arousal and Valence . . . 8
2 Methodology . . . 9
2.1 Music Representation . . . 9
2.2 Variational Autoencoders (VAE) . . . 10
2.3 Gaussian Mixture Variational Autoencoder (GM-VAE) . . . 12
2.4 Transformer . . . 13
2.5 Model Formulation . . . 15
3 Experiment . . . 19
3.1 Datasets and Model Hyperparameters . . . 19
3.2 Low-Level Music Attributes . . . 20
3.3 Music Generation with Specific Emotion . . . 21
3.4 Result and Discussion . . . 24
4 Conclusion and Future Work . . . 28
Bibliography . . . 29
A. Questionnaire Design . . . 32
B. Music Samples . . . 33
References:
[1] G. Brunner, A. Konrad, Y. Wang, and R. Wattenhofer. MIDI-VAE: Modeling dynamics and instrumentation of music with applications to style transfer. arXiv preprint arXiv:1809.07600, 2018.
[2] P. Gomez and B. Danuser. Relationships between musical structure and psychophysiological measures of emotion. Emotion, 7(2):377, 2007.
[3] J. Grekow. From Content-Based Music Emotion Recognition to Emotion Maps of Musical Pieces, volume 747. Springer, 2018.
[4] J. Grekow and T. Dimitrova-Grekow. Monophonic music generation with a given emotion using conditional variational autoencoder. IEEE Access, 9:129088–129101, 2021.
[5] I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, and A. Lerchner. beta-VAE: Learning basic visual concepts with a constrained variational framework. 2016.
[6] W.-Y. Hsiao, J.-Y. Liu, Y.-C. Yeh, and Y.-H. Yang. Compound word transformer: Learning to compose full-song music over dynamic directed hypergraphs. arXiv preprint arXiv:2101.02402, 2021.
[7] W.-N. Hsu, Y. Zhang, R. J. Weiss, H. Zen, Y. Wu, Y. Wang, Y. Cao, Y. Jia, Z. Chen, J. Shen, et al. Hierarchical generative modeling for controllable speech synthesis. arXiv preprint arXiv:1810.07217, 2018.
[8] Y.-S. Huang and Y.-H. Yang. Pop music transformer: Beat-based modeling and generation of expressive pop piano compositions, 2020.
[9] H.-T. Hung, J. Ching, S. Doh, N. Kim, J. Nam, and Y.-H. Yang. EMOPIA: A multi-modal pop piano dataset for emotion recognition and emotion-based music generation. arXiv preprint arXiv:2108.01374, 2021.
[10] Z. Jiang, Y. Zheng, H. Tan, B. Tang, and H. Zhou. Variational deep embedding: An unsupervised and generative approach to clustering, 2016.
[11] L. Kawai, P. Esling, and T. Harada. Attributes-aware deep music transformation. In Proceedings of the 21st International Society for Music Information Retrieval Conference (ISMIR), 2020.
[12] N. S. Keskar, B. McCann, L. R. Varshney, C. Xiong, and R. Socher. CTRL: A conditional transformer language model for controllable generation. arXiv preprint arXiv:1909.05858, 2019.
[13] D. P. Kingma and M. Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
[14] D. Makris, K. R. Agres, and D. Herremans. Generating lead sheets with affect: A novel conditional seq2seq framework. In 2021 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2021.
[15] N. Fradet, J.-P. Briot, F. Chhel, A. El Fallah Seghrouchni, and N. Gutowski. MidiTok: A Python package for MIDI file tokenization. In Extended Abstracts for the Late-Breaking Demo Session of the 22nd International Society for Music Information Retrieval Conference, 2021.
[16] A. Pati and A. Lerch. Latent space regularization for explicit control of musical attributes. In ICML Machine Learning for Music Discovery Workshop (ML4MD), Extended Abstract, Long Beach, CA, USA, 2019.
[17] J. A. Russell. A circumplex model of affect. Journal of Personality and Social Psychology, 39(6):1161, 1980.
[18] K. R. Scherer. What are emotions? And how can they be measured? Social Science Information, 44(4):695–729, 2005.
[19] K. Sohn, H. Lee, and X. Yan. Learning structured output representation using deep conditional generative models. Advances in Neural Information Processing Systems, 28, 2015.
[20] H. H. Tan and D. Herremans. Music FaderNets: Controllable music generation based on high-level features via low-level feature modelling. In Proceedings of the International Society for Music Information Retrieval Conference, 2020.
[21] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.
[22] S.-L. Wu and Y.-H. Yang. MuseMorphose: Full-song and fine-grained music style transfer with one Transformer VAE. arXiv preprint arXiv:2105.04090, 2021.
[23] L.-C. Yang, S.-Y. Chou, and Y.-H. Yang. MidiNet: A convolutional generative adversarial network for symbolic-domain music generation. arXiv preprint arXiv:1703.10847, 2017.
 
 
 
 