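Author (Chinese): 高振倫
Author (English): Kao, Chen Lun
Thesis title (Chinese): 寫作風格相似性之度量
Thesis title (English): Measurement on Writing Style Similarity
Advisor (Chinese): 陳宜欣
Advisor (English): Chen, Yi Shin
Oral defense committee (Chinese): 陳朝欽, 蘇豐文
Oral defense committee (English): Chen, Chaur Chin; Soo, Von Wun
Degree: Master's
Institution: National Tsing Hua University
Department: Institute of Information Systems and Applications
Student ID: 101065522
Year of publication (ROC calendar): 105 (2016)
Academic year of graduation (ROC calendar): 104 (2015-2016)
Language: English
Number of pages: 58
Keywords (Chinese): 寫作風格, 社群網路, 假分身, 文字探勘
Keywords (English): writing style, social network, sock puppet, text mining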
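Abstract (Chinese, translated): To identify anonymous accounts on social networks that post fake opinions to attack their opponents, we propose a method for measuring writing-style similarity that uncovers latent relationships between accounts and their real identities. We extract a variety of features from academic papers and from text on a multilingual online forum, convert them into vectors that represent each user's writing style, and compare users pairwise to compute their similarity. Finally, we adopt a confidence-based mechanism that preserves the overall precision of the method by expanding the number of users in stages and filtering the writing features. Experimental results confirm the soundness of our feature selection and show that precision decreases only slightly as the number of users grows.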
Abstract (English): In this paper, we introduce a measure of writing-style similarity for uncovering potential relationships between anonymous accounts, which are often used to post fake opinions and attack opponents. Using English academic writing and text crawled from a multilingual online forum, our approach extracts various signatures, forms a representative vector describing each user's writing style, and compares users pairwise by similarity.
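The abstract does not spell out the concrete signatures or the similarity measure (those are in Sections 3 and 4.3, which are not reproduced on this page). As a rough illustration only, the sketch below assumes two kinds of content-free signatures, relative frequencies of a small hypothetical list of English function words and punctuation rates per word, and uses cosine similarity as a stand-in measure for the pairwise comparison and top-k retrieval. All names (`FUNCTION_WORDS`, `PUNCTUATION`, `signature_vector`, `pairwise_similarities`, `top_k_similar`) are hypothetical and not taken from the thesis.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical content-free signatures (an assumption, not the thesis's feature set):
# a few common English function words...
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "for"]
# ...and a few punctuation marks, both measured as rates per word.
PUNCTUATION = [",", ";", "!", "?"]


def signature_vector(text: str) -> list[float]:
    """Map one user's text to a fixed-length style vector (assumed features)."""
    words = re.findall(r"[a-z']+", text.lower())
    n = len(words) or 1  # avoid division by zero on empty text
    counts = Counter(words)
    vec = [counts[w] / n for w in FUNCTION_WORDS]
    vec += [text.count(p) / n for p in PUNCTUATION]
    return vec


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity (stand-in measure); 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_similarities(user_texts: dict[str, list[str]]) -> dict[tuple[str, str], float]:
    """Build one vector per user from all their posts, then compare every pair once."""
    vectors = {user: signature_vector(" ".join(posts)) for user, posts in user_texts.items()}
    return {
        (a, b): cosine(vectors[a], vectors[b])
        for a, b in combinations(sorted(vectors), 2)
    }


def top_k_similar(target: str, sims: dict[tuple[str, str], float], k: int) -> list[tuple[str, float]]:
    """Return the k users whose assumed style vectors are most similar to `target`."""
    scored = [
        (b if a == target else a, s)
        for (a, b), s in sims.items()
        if target in (a, b)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    users = {
        "alice": ["The cat sat on the mat, and it was happy."],
        "alice_alt": ["The dog sat on the rug, and it was happy."],
        "bob": ["Wow! Great stuff? Yes!"],
    }
    sims = pairwise_similarities(users)
    print(top_k_similar("alice", sims, k=1))
```

In the toy data, `alice` and `alice_alt` use the same function words and comma pattern while `bob` writes only short exclamations, so `alice_alt` ranks first for `alice`. Cosine similarity ignores vector magnitude, which keeps accounts with different amounts of text comparable; the measure actually defined in the thesis may differ.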
Table of Contents:
Chinese Abstract
Abstract
Acknowledgement
List of Tables
List of Figures

1 Introduction
2 Related Work
3 Writing Style Signatures
3.1 Content-Free Signatures
3.2 Author-Dependent Signatures
4 Methodology
4.1 Data Preprocessing
4.2 Signature Vector Construction
4.2.1 Word Extraction
4.2.2 Token Vector
4.2.3 Structural Vector
4.2.4 Representative Vector Construction
4.3 Similarity Measurement
4.3.1 Measure Definition
4.3.2 Pairwise Comparison
4.4 Top-k User Retrieval
4.4.1 Signature Combination
4.4.2 User Expansion and Filter
5 Experimental Evaluation
5.1 JVC Dataset
5.1.1 Experimental Setup
5.1.2 Result Validation
5.2 PTT Dataset
5.2.1 Experimental Setup
5.2.2 Result Validation
6 Conclusion and Future Work
References
(The full text of this thesis is not available for open access.)