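Sources Of Hate Speech: A Topic Analysis (基於主題之仇恨語言分析), National Tsing Hua University Master's and Doctoral Theses Full-Text Image System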

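Author (Chinese):	秦邁文
Author (English):	Peignon, Melvyn
Thesis title (Chinese):	基於主題之仇恨語言分析
Thesis title (English):	Sources Of Hate Speech: A Topic Analysis
Advisor (Chinese):	陳宜欣
Advisor (English):	Chen, Yi-Shin
Committee members (Chinese):	蘇豐文、陳朝欽
Committee members (English):	Soo, Von-Wun; Chen, Chaur-Chin
Degree:	Master's
Institution:	National Tsing Hua University
Department:	Institute of Information Systems and Applications
Student ID:	105065433
Year of publication (ROC calendar):	106 (2017)
Academic year of graduation:	105
Language:	English
Number of pages:	41
Keywords (Chinese):	仇恨言詞、Twitter、社群偵測、網路分析、主題建模
Keywords (English):	Hate speech, Twitter, Community detection, Network analysis, Topic modelling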

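Hate speech detection is currently carried out mainly through hate-word filtering and supervised machine learning, yet the results remain poor. These methods all depend on manually labelled data or existing dictionaries, which makes them hard to apply in practice. Hate speech originates in highly dynamic, extremist online communities, and this volatility makes high-quality data hard to collect. Moreover, when people knowledgeable about hate speech volunteer to report such posts, the accounts involved are often banned. For these reasons, collecting hate speech from public online communities is very difficult.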

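This study analyses hate speech at its source and seeks a deeper understanding of the topics behind the major hate speech communities. On that basis, we propose a method for collecting high-quality data and make the resulting up-to-date data public; the data covers not only the Twitter community but also many external sources. Data collection combines the Twitter API with crawlers for extremist hate communities, and uses graph theory to match and combine the information from both in order to improve accuracy.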

Hate speech detection is a difficult task that relies mostly on word filtering and supervised algorithms. To work, these methods need labelled data and dictionaries. A further difficulty is that hate speech relies on a responsive and active extremist community. Moreover, accounts reported for hate speech are often banned, particularly when the person reporting them has some knowledge of hate speech. In practice, this means that hateful communities on Twitter, and the data they produce, are ephemeral. These dynamics make it difficult to collect quality hate speech data.
In this paper, we present an analysis of hate speech directly at its source, with the goal of providing a deeper understanding of the topics that drive these communities. Our methodology consists of collecting high-quality data for the purposes of hate speech analysis. We additionally provide an up-to-date dataset of extremist communities as they exist both inside and outside of Twitter. Our data collection relies on the Twitter API, crawlers that scraped extremist right-wing websites, a mapping that links the two communities, and graph theory to obtain a more accurate representation of hate speech communities.
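The abstract mentions linking Twitter accounts to accounts on external websites and then applying graph theory. The sketch below is only an illustration of that general idea, not the thesis's actual method: it assumes the mapping is available as pairs of account identifiers, groups accounts into connected components as a crude stand-in for community detection, and computes degree centrality. All account names, the `tw:`/`web:` prefixes, and the function names are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency map from (account_a, account_b) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def connected_components(graph):
    """Return connected components as a list of sets, largest first."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects one component.
        queue = deque([start])
        seen.add(start)
        component = {start}
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return sorted(components, key=len, reverse=True)


def degree_centrality(graph):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


# Hypothetical toy data: "tw:" marks a Twitter account, "web:" an account on
# an external website; the first edge stands for a cross-platform mapping.
edges = [
    ("tw:alice", "web:alice88"),
    ("tw:alice", "tw:bob"),
    ("tw:bob", "tw:carol"),
    ("tw:dave", "tw:erin"),
]

graph = build_graph(edges)
communities = connected_components(graph)
centrality = degree_centrality(graph)
```

Connected components are the simplest possible grouping; a real analysis would more likely use a dedicated community detection algorithm and a centrality estimate suited to large networks.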

Abstract (Chinese)
Abstract
Acknowledgement
List of Tables
List of Figures
Introduction
Related Work
Methodology
Collection and Mapping of the data
Data analysis
Experiments
Experimental setup
Data collection results
Data analysis results
Conclusion
References

[1] David M Blei, Andrew Y Ng, and Michael I Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022, 2003.
[2] Ulrik Brandes and Christian Pich. Centrality estimation in large networks. International Journal of Bifurcation and Chaos, 17(07):2303–2318, 2007.
[3] Pete Burnap and Matthew L Williams. Us and them: identifying cyber hate on Twitter across multiple protected characteristics. EPJ Data Science, 5(1):11, 2016.
[4] Val Burris, Emery Smith, and Ann Strahm. White supremacist networks on the internet. Sociological focus, 33(2):215–235, 2000.
[5] Forbes. Europe fine companies for hate speech, 2017.
[6] Ian T Jolliffe. Principal component analysis and factor analysis. In Principal component analysis, pages 115–128. Springer, 1986.
[7] Thomas K Landauer and Susan T Dumais. A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211, 1997.
[8] Jugendschutz net. Löschung rechtswidriger hassbeiträge bei facebook, youtube und twitter, 2017.
[9] Chikashi Nobata, Joel Tetreault, Achint Thomas, Yashar Mehdad, and Yi Chang. Abusive language detection in online user content. In Proceedings of the 25th International Conference on World Wide Web, pages 145–153. International World Wide Web Conferences Steering Committee, 2016.
[10] John T. Nockleby. Hate speech, 2017.
[11] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[12] Amir Razavi, Diana Inkpen, Sasha Uritsky, and Stan Matwin. Offensive language detection using multi-level classification. Advances in Artificial Intelligence, pages 16–27, 2010.
[13] Eldar Sadikov and Maria Montserrat Medina Martinez. Information propagation on Twitter. CS322 Project Report, 2009.
[14] SPLC. Social media platforms’ anti-hate efforts inch ahead, 2017.
[15] Heidi Tworek. How Germany is tackling hate speech, 2017.
[16] William Warner and Julia Hirschberg. Detecting hate speech on the world wide web. In Proceedings of the Second Workshop on Language in Social Media, pages 19–26. Association for Computational Linguistics, 2012.
[17] Zeerak Waseem. Are you a racist or am I seeing things? Annotator influence on hate speech detection on Twitter. In Proceedings of the 1st Workshop on Natural Language Processing and Computational Social Science, pages 138–142, 2016.
[18] Zeerak Waseem and Dirk Hovy. Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. In Proceedings of NAACL-HLT, pages 88–93, 2016.
[19] Zhi Xu and Sencun Zhu. Filtering offensive language in online communities using grammatical relations. In Proceedings of the Seventh Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, pages 1–10, 2010.

(Full text not authorized for public release)