Author: Taylor, Jherez
Thesis title: Detecting contextual hate speech code words within social media
Advisor: CHEN, YI-SHIN
Oral examination committee: SOO, VON-WUN
Keywords: Hate speech; NLP; Twitter; PageRank
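Compared with face-to-face interaction, hate speech has grown rapidly on social media in recent years. Previous research has mostly detected hate speech with word blacklists or word segmentation. However, social media users constantly invent new terms and use code words to allude to or stand in for the groups they attack, so blacklist and segmentation-based methods cannot work effectively. This study develops a graph-based method that combines conventional word-window contexts with syntactic dependency contexts to uncover the hateful meaning hidden in hate speech code words. By using word-usage patterns across different contexts, this study aims to identify hate speech code words, thereby expanding the hate speech lexicon and improving the precision of classification results.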
Although hate speech is relatively rare in face-to-face interactions, its occurrence on social media platforms has recently increased. Most detection methods rely on word blacklists and other text-level features such as n-grams. While this approach is effective for flagging hate speech content, the discourse is not limited to a fixed vocabulary, because users constantly adopt new terms. In this work we develop a graph-based approach that combines conventional word-window contexts with syntactic dependency contexts in order to learn the hidden meaning of hate speech code words whose association with hate speech is not widely known. Our approach exploits the different types of contexts in which words are used, with the goal of identifying new code words, thereby expanding the hate speech lexicon and improving the accuracy of future classification systems.
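The abstract does not spell out how the graph is built, so the following is only a rough illustration under stated assumptions, not the thesis's actual method. It assumes one plausible reading: words are nodes, edges come from both word-window co-occurrence and dependency arcs, and a personalized PageRank (suggested by the PageRank keyword) restarts at known hate terms to score candidate code words. The function names `build_context_graph` and `personalized_pagerank`, the window size, the equal edge weights, the stopword filter, and the placeholder tokens `seedterm` and `newcode` are all hypothetical. The dependency arcs are written by hand here, whereas a real pipeline would obtain them from a parser.

```python
from collections import defaultdict


def build_context_graph(sentences, window=2, dep_weight=1.0):
    """Build an undirected, weighted word graph from two context types (hypothetical design).

    sentences: list of (tokens, arcs); arcs are (head_index, dependent_index) pairs
    from a dependency parse (hand-written in this sketch).
    """
    graph = defaultdict(lambda: defaultdict(float))
    for tokens, arcs in sentences:
        # Word-window contexts: link each token to neighbours within `window` positions.
        # Each pair is visited from both ends, so both directions receive +1.
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    graph[word][tokens[j]] += 1.0
        # Syntactic dependency contexts: link head and dependent in both directions.
        for head, dep in arcs:
            graph[tokens[head]][tokens[dep]] += dep_weight
            graph[tokens[dep]][tokens[head]] += dep_weight
    return graph


def personalized_pagerank(graph, seeds, alpha=0.85, iterations=50):
    """Power-iteration PageRank whose random jumps return only to the seed words."""
    nodes = sorted(set(graph) | {m for nbrs in graph.values() for m in nbrs})
    seed_set = set(seeds) & set(nodes)
    if not seed_set:
        raise ValueError("none of the seed words occur in the graph")
    restart = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        new_rank = {n: (1.0 - alpha) * restart[n] for n in nodes}
        for n in nodes:
            neighbours = graph.get(n, {})
            total = sum(neighbours.values())
            if total == 0.0:
                # Dangling node: send its mass back to the seeds.
                for m in nodes:
                    new_rank[m] += alpha * rank[n] * restart[m]
                continue
            # Spread this node's mass over its neighbours in proportion to edge weight.
            for m, weight in neighbours.items():
                new_rank[m] += alpha * rank[n] * weight / total
        rank = new_rank
    return rank


# Toy corpus with placeholder tokens: "seedterm" stands in for a known hate term,
# "newcode" for an unseen candidate code word (both hypothetical).
corpus = [
    (["deport", "all", "seedterm"], [(0, 2), (2, 1)]),
    (["deport", "all", "newcode"], [(0, 2), (2, 1)]),
    (["watch", "all", "games"], [(0, 2), (2, 1)]),
]
seeds = ["seedterm"]
stopwords = {"all"}  # hypothetical filter for frequent function words

graph = build_context_graph(corpus)
scores = personalized_pagerank(graph, seeds)
candidates = sorted(
    (w for w in scores if w not in stopwords and w not in seeds),
    key=scores.get,
    reverse=True,
)
print(candidates)
```

Because `newcode` shares both its window neighbours and its dependency head (`deport`) with the seed, it receives more seed-biased mass than `games`, which reaches the seed only through the common word `all`. Frequent function words such as `all` also score highly under this scheme, which is why the toy example filters them before ranking.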
1 Introduction
2 Related Work
3 Hate Speech and Context
3.1 Neural Embeddings and Context
3.2 Embedding Types
4 Methodology
4.1 Data Collection and Embedding Creation
4.2 Contextual Graph Expansion
4.3 Contextual Codeword Search
5 Experimental Results
5.1 Training Data
5.2 Experimental Setup
5.3 Annotation Experiment
6 Conclusion and Future Work
