Application of Music Streaming Platform with Lyrics Clustering Based on Text Mining - Take QQ Music as an Example


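Author (Chinese):	曾威誠
Author (English):	Tseng, Wei-Cheng
Title (Chinese):	基於文字探勘技術分析音樂串流平台排行榜歌詞群集之應用 - 以QQ音樂為例
Title (English):	Application of Music Streaming Platform with Lyrics Clustering Based on Text Mining - Take QQ Music as an Example
Advisors (Chinese):	李金龍、張延彰
Advisors (English):	Li, Chin-Lung; Chang, Yen-Chang
Oral defense committee (Chinese):	洪文良、沈冠甫
Oral defense committee (English):	Hung, Wen-Liang; Shen, Kuan-Fu
Degree:	Master's
Institution:	National Tsing Hua University (國立清華大學)
Department:	Institute of Computational and Modeling Science (計算與建模科學研究所)
Student ID:	106026506
Year of publication (ROC calendar):	109 (2020)
Academic year of graduation (ROC calendar):	108
Language:	Chinese
Number of pages:	37
Keywords (Chinese):	文字探勘、集群分析、K-means、Cascade K-means
Keywords (English):	Text Mining, Cluster Analysis, K-means, Cascade K-means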

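As Internet technology has developed, the way people listen to music has changed, and music purchasing has shifted from physical albums to streaming platforms and song downloads. A physical album mostly carried songs by a single artist and was inconvenient to carry, so both choice and convenience were limited. Today's streaming platforms offer many different artists, languages, and genres; with a phone or a computer, a listener can switch between songs with a tap, so choice and convenience are markedly better than before.
This thesis collects the lyrics of the top 50 songs on each weekly QQ Music Mainland Chart (內地榜) in 2018. After text preprocessing with text-mining techniques, including word segmentation and stop-word removal, we build a bag-of-words model and TF-IDF text vectors and group the lyrics by cluster analysis. Finally, we compare the resulting clusters with the chart rankings to observe which lyric themes QQ Music users favored on the 2018 Mainland Chart and how those preferences trended.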
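The preprocessing and vectorization steps named above (stop-word removal, bag-of-words, TF-IDF) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three token lists, the one-word stop-word list, and the specific weighting (term frequency divided by document length, times log(N / df)) are hypothetical choices made here, not details taken from the thesis, and real lyrics would first need a Chinese word segmenter.

```python
import math
from collections import Counter

# Hypothetical, already-segmented lyrics (English glosses for readability).
# A real pipeline would first run a Chinese word segmenter on the raw text.
docs = [
    ["love", "you", "the", "heart"],
    ["miss", "you", "the", "night"],
    ["free", "the", "wind"],
]
STOPWORDS = {"the"}  # illustrative stop-word list, not the thesis's


def remove_stopwords(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOPWORDS]


def bag_of_words(token_lists):
    """Return (sorted vocabulary, per-document term-count rows)."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    index = {t: i for i, t in enumerate(vocab)}
    rows = []
    for tokens in token_lists:
        row = [0] * len(vocab)
        for term, count in Counter(tokens).items():
            row[index[term]] = count
        rows.append(row)
    return vocab, rows


def tfidf(counts):
    """Weight each count as (count / doc length) * log(N / document frequency)."""
    n_docs = len(counts)
    n_terms = len(counts[0])
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    weighted = []
    for row in counts:
        total = sum(row) or 1  # guard against an empty document
        weighted.append(
            [(c / total) * math.log(n_docs / df[j]) for j, c in enumerate(row)]
        )
    return weighted


clean = [remove_stopwords(d) for d in docs]
vocab, counts = bag_of_words(clean)
weights = tfidf(counts)
```

In this toy corpus, a word that appears in two of the three songs ("you") gets a lower weight than a word unique to one song, which is the property that makes TF-IDF vectors more useful than raw counts as clustering input.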

With advances in technology, the ways of listening to music have diversified. Listeners have also changed their habits, moving from buying CDs to downloading digital copies or streaming music online. In the past, most CDs featured a single singer or group and were neither portable nor convenient for listeners. Streaming platforms now give listeners access to the whole world of music: not only many singers, but also many genres and languages. All a listener needs to do is take out a phone, and a single tap switches to any song they want. Variety and convenience are greater than ever before.
This research collects the lyrics of the top 50 songs on each weekly 2018 “QQ Music Chinese Week Chart”. The lyrics are pre-processed with text-mining techniques, namely word segmentation and stop-word removal. After pre-processing, a bag-of-words model and TF-IDF word vectors are built, and cluster analysis is used to separate the lyrics into categories. Finally, the lyric categories are compared with the chart rankings to examine the listening trends and preferred themes of QQ Music users on the 2018 “QQ Music Chinese Week Chart”.
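As a rough illustration of the clustering step, the sketch below runs K-means for several values of k and keeps the partition with the highest Caliński-Harabasz index, which is how Cascade K-means (listed in the keywords) is commonly described. Everything specific here is an assumption rather than the thesis's setup: the 2-D points stand in for TF-IDF lyric vectors, the range of k is arbitrary, and deterministic farthest-first seeding is used only so the example is reproducible.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly take the
    point farthest from its nearest already-chosen seed (an assumption made
    for reproducibility, not the thesis's initialization)."""
    seeds = [list(points[0])]
    while len(seeds) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, s) for s in seeds))
        seeds.append(list(nxt))
    return seeds


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = centroid(members)
    return labels, centroids


def calinski_harabasz(points, labels, centroids):
    """CH = [B / (k - 1)] / [W / (n - k)]; larger means better separation."""
    n, k = len(points), len(centroids)
    overall = centroid(points)
    within = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    between = sum(
        labels.count(c) * sq_dist(centroids[c], overall) for c in range(k)
    )
    if within == 0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


def cascade_kmeans(points, k_min=2, k_max=5):
    """Try each k in [k_min, k_max]; return (score, k, labels) of the best."""
    best = None
    for k in range(k_min, k_max + 1):
        labels, centroids = kmeans(points, k)
        if len(set(labels)) < k:
            continue  # skip degenerate runs with an empty cluster
        score = calinski_harabasz(points, labels, centroids)
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best


# Toy stand-in for lyric vectors: three well-separated groups in 2-D.
points = [
    (0, 0), (0, 1), (1, 0),
    (10, 10), (10, 11), (11, 10),
    (20, 0), (20, 1), (21, 0),
]
score, best_k, labels = cascade_kmeans(points, k_min=2, k_max=4)
print(best_k, labels)
```

On this toy data the index peaks at k = 3, matching the three visible groups. With real TF-IDF vectors the peak is usually far less sharp, so the chosen k is best treated as a starting point for inspecting and naming the clusters.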

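Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background and Purpose
1.2 Research Scope
1.3 Data Sources
1.4 Thesis Structure
Chapter 2 Literature Review
2.1 Text Mining
2.2 Cluster Analysis
2.3 Topic Models
Chapter 3 Research Methods
3.1 Research Framework
3.1.1 Data Collection
3.1.1.1 Music Chart Data
3.1.1.2 Lyrics Data
3.1.2 Data Consolidation
3.1.2.1 Chinese Word Segmentation
3.1.2.2 Bag-of-Words Model
3.1.3 Clustering Method
3.1.3.1 Cascade K-means Algorithm
3.1.4 Comparative Analysis and Chart Presentation
3.2 System Development Environment and Tools
Chapter 4 Empirical Results and Analysis
4.1 Pre-clustering Processing
4.2 Lyrics Clustering
4.3 Cluster Analysis and Results
4.4 Topic Naming
4.5 Comparative Analysis
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Suggestions and Future Work
References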


(The full text is restricted to internal viewing.)