
Detailed Record

Author (Chinese): 程立維
Author (English): Cheng, Li-Wei
Title (Chinese): 基於條件自注意力編碼器-解碼器模型的歌詞生成
Title (English): Lyrics Generation Based on a Conditional Self-Attention Encoder-Decoder Model
Advisor (Chinese): 蘇豐文
Advisor (English): Soo, Von-Wun
Committee Members (Chinese): 朱宏國, 林豪鏘
Committee Members (English): Zhu, Hong-Guo; Lin, Hao-Qiang
Degree: Master's
University: National Tsing Hua University
Department: Department of Computer Science
Student ID: 109062543
Publication Year (ROC calendar): 111 (2022)
Graduation Academic Year: 111
Language: Chinese
Number of Pages: 56
Keywords (Chinese): 機器學習, 歌詞創作, 人工智慧, 自注意力
Keywords (English): machine learning, AI, lyrics generation, self-attention
Abstract (Chinese): Automatic Chinese lyrics generation is a challenging natural language generation (NLG) task. In this thesis, we propose a deep learning framework that applies an attention-based Transformer model to the task. In addition, we introduce prefix control codes carrying lyric information to specify constraints on the generated lyrics, and we train our model over the space of these control codes so that it learns to control its output. Results show that our model generates sequences satisfying the given constraints, with 96% accuracy on "length match" and 97% on "rhyme match". To improve lyric quality, we generate additional candidate lines with beam search, which yields clear improvements over the baseline model on the BLEU, ROUGE-1, ROUGE-2, and ROUGE-L metrics. Rather than selecting only the most probable line, we keep all lines returned by beam search and use each line's probability as a score for further re-ranking. Based on a line-level semantic structure coherence method, we introduce the coherence of the full lyrics, which helps us choose better candidates: we re-rank the beam-search candidates by the coherence between consecutive lines so that the generated song maintains overall coherence and syntax across all lyrics. We conduct a subjective evaluation of the baseline-generated, human-written, and model-generated lyrics. The baseline model has an average score of 2.78 and human-written lyrics 3.65, while our model scores 3.37, indicating that the lyrics generated by our model are closer to those written by humans.
Abstract (English): Automatic Chinese lyrics generation is a challenging natural language generation (NLG) task. In this thesis, we propose a deep learning framework that applies an attention-based Transformer model to deal with the task. Additionally, we introduce prefix control codes with lyric information that specify the constraints on the generated lyrics. We then train our model against the space of control codes, which helps it learn to control the output lyrics. Results show that our model generates sequences satisfying the given constraints, with an accuracy of 96% for "length match" and 97% for "rhyme match". To improve performance, we generate more candidate lyrics at each step by employing beam search, which yields improvements over baseline models on the BLEU, ROUGE-1, ROUGE-2, and ROUGE-L metrics. Instead of selecting the line with the highest probability, we keep all lines from the beam search and take the probability of each sequence as a score for further re-ranking. We introduce the coherence of the full lyrics in terms of a line-level semantic structure coherence method that helps us choose better candidates, and we re-rank the beam-search candidates by the coherence between consecutive sequences to generate a full song that maintains overall coherence and syntax across all lyrics. We conduct subjective evaluations of the model-generated lyrics against the baseline model and the ground truth (human-written lyrics). Our model has an average score of 3.37, compared to the baseline's 2.78 and the ground truth's 3.65, indicating that the lyrics generated by our model are closer to those written by humans.
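
The abstract above outlines the pipeline: prefix control codes impose hard constraints (line length, rhyme) on the generator, beam search produces multiple candidate lines, and a line-level coherence score is combined with the model probability to re-rank those candidates. The sketch below is not taken from the thesis; it is a minimal illustration under assumed interfaces, and the tag format, the helper names (build_prefix, rerank), and the embed sentence-encoder callable are hypothetical stand-ins for the components the abstract describes.

from dataclasses import dataclass
from typing import Callable, List, Tuple

import numpy as np


@dataclass
class LineSpec:
    length: int  # desired number of characters in the next lyric line
    rhyme: str   # desired rhyme class, e.g. a pinyin final such as "ang"


def build_prefix(spec: LineSpec, context: str) -> str:
    # Prepend prefix control codes (the hard constraints) to the model input;
    # the <len=...> / <rhyme=...> tag format is a hypothetical example.
    return f"<len={spec.length}> <rhyme={spec.rhyme}> {context}"


def rerank(candidates: List[Tuple[str, float]],
           prev_line: str,
           embed: Callable[[str], np.ndarray],
           alpha: float = 0.5) -> str:
    # Re-rank beam-search candidates by a weighted sum of the model's
    # log-probability (fluency) and cosine similarity to the previous line
    # (line-level coherence), then return the best-scoring candidate.
    prev_vec = embed(prev_line)

    def cosine(u: np.ndarray, v: np.ndarray) -> float:
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

    best_score, best_text = float("-inf"), candidates[0][0]
    for text, log_prob in candidates:
        score = alpha * log_prob + (1.0 - alpha) * cosine(embed(text), prev_vec)
        if score > best_score:
            best_score, best_text = score, text
    return best_text

In the thesis's setting, the log-probabilities would come from the conditional self-attention encoder-decoder and the embeddings from a sentence encoder (for example, a Sentence-BERT-style model); the sketch keeps both as injected parameters so it stays self-contained, and the alpha weight is purely illustrative.
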
Abstract (Chinese)
Abstract
Contents
1 Introduction
2 Related Work
2.1 Natural Language Generation
2.2 Chinese Poetry Generation
2.3 Rap Generation
2.4 Transformer
2.4.1 Positional Encoding
2.4.2 Self-Attention
2.4.3 Multi-Head Attention
2.4.4 Feed-Forward Layer
2.4.5 Encoder and Decoder
2.5 GPT
3 Methodology
3.1 Overview of a Lyrics Generation System
3.2 Beam Search
3.3 Music Structure of Popular Chinese Songs
3.4 Data Representation
3.5 The Hard and Soft Constraints
3.6 Topic Coherence
3.7 Semantic Structure Coherence
3.8 Re-ranking
4 Experiment
4.1 Experimental Setup
4.1.1 Pre-training
4.1.2 Fine-Tuning on Lyrics
4.2 Evaluation Experiments
4.2.1 Objective Evaluation
4.2.2 Re-ranking Experiment
4.2.3 Semantic Structure Coherence Experiment
4.2.4 Control Code Experiments
4.2.5 Subjective Human Evaluation
5 Conclusion
Bibliography
A Questionnaire
A.1 Part 1
A.2 Part 2
B Baseline Lyrics
C Lyrics by Our Model
C.1 Lyrics Generated by Our Model
C.2 Topic Coherence Lyrics Generated by Our Model
D Human-Written Lyrics