
Detailed Record

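Author (Chinese): 林玟妤
Author (English): Lin, Wen Yu
Thesis title (Chinese): 基於高階屬性之社群媒體用戶興趣探勘技術
Thesis title (English): High-Level Attributes Based User Interests Mining from Social Media
Advisor (Chinese): 林嘉文
Advisor (English): Lin, Chia Wen
Committee members (Chinese): 陳煥宗; 黃朝宗
Committee members (English): Chen, Hwann Tzong; Huang, Chao Tsung
Degree: Master's
Institution: National Tsing Hua University (國立清華大學)
Department: Department of Electrical Engineering
Student ID: 101061600
Year of publication (ROC calendar): 105 (2016)
Graduation academic year: 104
Language: Chinese
Number of pages: 53
Keywords (Chinese): 主題模型; 興趣發現; 社群網站分析; 高階屬性
Keywords (English): topic model; interest mining; social media analysis; high-level attributes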
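This thesis presents a high-level-attribute-based technique for mining the interests of social media users. Using the image and text content that users share on social pages, it combines textual features with high-level attribute features of images to find a user's interest distribution, which can be used for personalized advertisement recommendation or social content recommendation. The framework consists of three parts: preprocessing and feature extraction of the image and text content, training of a labeled topic space, and user interest discovery.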
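This study uses the image and text content of Facebook fan pages that carry topic labels as training data. Two datasets are used, both covering 10 topics: one with 20 fan pages per topic and one with 30 fan pages per topic. Text content first undergoes word segmentation, stop-word removal, and keyword extraction. For image content, DenseCap generates captions, which then undergo the same word segmentation, stop-word removal, and keyword extraction to yield representative high-level attributes. Passing each fan page through the LDA model gives one topic distribution for its text and one for its images. Next, the dimension with the highest value in each topic distribution is identified. The fan pages that share a topic label then vote, the dimension with the most votes is selected, and the label of those fan pages is assigned to that dimension. The procedure then checks whether every dimension has received a unique label; if not, the LDA hyperparameters are adjusted and the model is retrained until every dimension has a unique label.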
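The voting step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's actual implementation: the function name `assign_labels_by_vote`, its input format (one topic distribution and one known label per fan page), and the tie-breaking behavior are all hypothetical, and LDA training itself is left out.

```python
from collections import Counter, defaultdict


def assign_labels_by_vote(page_topic_dists, page_labels):
    """Map LDA dimensions to topic labels by majority vote (hypothetical helper).

    page_topic_dists: one topic distribution (list of floats) per fan page.
    page_labels: the known topic label of each fan page, in the same order.
    Returns (dim_to_labels, is_unique). is_unique is True only when every
    dimension received exactly one label; otherwise the caller would adjust
    the LDA hyperparameters and retrain.
    """
    # Each page votes for the dimension where its distribution peaks.
    votes = defaultdict(Counter)  # label -> Counter of peak dimensions
    for dist, label in zip(page_topic_dists, page_labels):
        top_dim = max(range(len(dist)), key=dist.__getitem__)
        votes[label][top_dim] += 1

    # For each label, the dimension with the most votes wins.
    dim_to_labels = defaultdict(list)
    for label, counter in votes.items():
        winning_dim, _ = counter.most_common(1)[0]
        dim_to_labels[winning_dim].append(label)

    num_dims = len(page_topic_dists[0])
    is_unique = len(dim_to_labels) == num_dims and all(
        len(labels) == 1 for labels in dim_to_labels.values()
    )
    return dict(dim_to_labels), is_unique


# Toy data: three topics, two fan pages per topic (values are made up).
dists = [
    [0.7, 0.2, 0.1], [0.6, 0.3, 0.1],   # pages labeled "food"
    [0.1, 0.8, 0.1], [0.2, 0.7, 0.1],   # pages labeled "travel"
    [0.1, 0.1, 0.8], [0.3, 0.1, 0.6],   # pages labeled "sports"
]
labels = ["food", "food", "travel", "travel", "sports", "sports"]
mapping, ok = assign_labels_by_vote(dists, labels)
```

When `ok` is false, for example because two labels claim the same dimension, the procedure in the text would retrain LDA with different hyperparameters and vote again.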
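Finally, the method is applied to the social media data of ordinary users. Feeding the image and text content that a user shares into the trained topic model yields the user's interest distribution, in which the value of each dimension represents how strongly the user prefers the corresponding interest. The experimental results demonstrate that this improved labeled LDA topic model framework is feasible in practical applications.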
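As a rough illustration of how a labeled interest distribution might be read off once the topic space is trained, the sketch below mixes a text-side and an image-side topic distribution and attaches the label of each dimension. The abstract does not state how the two distributions are combined, so the weighted average, the `text_weight` parameter, and the function name `interest_profile` are assumptions made only for this example.

```python
def interest_profile(text_dist, image_dist, dim_to_label, text_weight=0.5):
    """Combine text and image topic distributions into a labeled profile.

    Assumption: a simple weighted average stands in for whatever fusion
    rule the thesis actually uses. dim_to_label maps each topic dimension
    index to its topic label.
    """
    if len(text_dist) != len(image_dist):
        raise ValueError("text and image distributions must have equal length")

    mixed = [
        text_weight * t + (1.0 - text_weight) * i
        for t, i in zip(text_dist, image_dist)
    ]
    total = sum(mixed)
    normalized = [m / total for m in mixed]  # renormalize so values sum to 1

    profile = {dim_to_label[d]: p for d, p in enumerate(normalized)}
    # Strongest interests first.
    return dict(sorted(profile.items(), key=lambda kv: kv[1], reverse=True))


# Toy example with made-up numbers and labels.
labels_by_dim = {0: "food", 1: "travel", 2: "sports"}
profile = interest_profile([0.6, 0.3, 0.1], [0.2, 0.6, 0.2], labels_by_dim)
```

Each value in `profile` plays the role of the per-dimension preference described above, with the dictionary ordered from the strongest interest to the weakest.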
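This study makes three main contributions: it effectively addresses the inability of conventional unsupervised LDA to construct a concrete topic space; the high-level image attributes extracted with DenseCap make the image analysis results more reliable; and combining the text and image results lets each compensate for the other's weaknesses, making full use of the advantages of multimedia.
Keywords: topic model; interest discovery; social media analysis; high-level attributes.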
This thesis presents a method that jointly uses textual features and high-level attribute features of user-generated social media data, drawn from the image and text content shared on social pages, to mine a user's interest distribution. The mined distribution can serve personalized advertisement recommendation or social content recommendation.
The framework consists of three steps: preprocessing and feature extraction, labeled topic space learning, and user interest mining. The study uses Facebook fan pages that have topic labels as training data. We selected 10 topics with 20 fan pages each.
First, the text posts undergo text segmentation, stop-word removal, and keyword extraction. For image content, DenseCap generates captions, which undergo the same text segmentation, stop-word removal, and keyword extraction to yield the images' high-level attributes. LDA is then applied, and we check whether each dimension of the topic space has a unique label.
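The caption preprocessing in this step could look roughly like the sketch below. It assumes English captions and uses plain term frequency as a stand-in for keyword extraction, since the abstract does not name the extraction method; the stop-word list and the function name `caption_keywords` are likewise hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = frozenset(
    {"a", "an", "the", "of", "on", "in", "and", "is", "are", "with", "to"}
)


def caption_keywords(captions, top_k=5):
    """Tokenize captions, drop stop words, and keep the most frequent terms."""
    tokens = []
    for caption in captions:
        # Lowercase and split into alphabetic tokens (simple segmentation).
        tokens.extend(re.findall(r"[a-z]+", caption.lower()))
    kept = [t for t in tokens if t not in STOP_WORDS]
    # Frequency ranking is an assumed proxy for keyword extraction.
    return [word for word, _ in Counter(kept).most_common(top_k)]


# Made-up captions standing in for dense-captioning output.
captions = [
    "a plate of food on the table",
    "a white plate with pasta",
    "the table is wooden",
]
keywords = caption_keywords(captions, top_k=2)
```

The resulting keyword lists would serve as the image-side "documents" passed to the topic model, in the same way the keywords from text posts serve on the text side.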
Finally, the method is applied to the social media data of general users: feeding the images and text a user shares into the trained topic model yields the user's interest profile, in which the value of each dimension represents the user's degree of preference for a particular interest. The user's interests can then be analyzed through this distribution.
The experimental results show the effectiveness of the proposed method. In addition, an image recommendation demonstration verifies the feasibility of applying our method to real data.
The main contributions of this study are threefold: it solves the problem that conventional unsupervised LDA cannot reveal the specific meaning of each dimension of the topic space; it uses DenseCap to extract high-level attributes from images, making the image analysis results more reliable; and it combines the text and image results so that each compensates for the other, making full use of the advantages of multimedia.

Keywords: topic model; interest mining; social media analysis; high-level attributes.
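Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Motivation and Objectives
1.3 Thesis Organization
Chapter 2 Literature Review
2.1 Topic Models
2.2 Multimedia Topic Discovery
Chapter 3 Methodology
3.1 Overview of the Method
3.2 Text Feature Extraction
3.3 Image Feature Extraction
3.4 Model Training
3.5 Model Inference
3.6 Image-Document Matching
Chapter 4 Experimental Results and Discussion
4.1 Data Collection
4.2 Probability Distribution Analysis
4.3 Document Matching Analysis
4.4 Most Influential Keywords per Topic
4.5 Comparison
4.6 User Interest Distribution
4.7 Discussion
Chapter 5 Conclusion
References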
 
 
 
 