Author: Kao, Chen Lun
Thesis Title: Measurement on Writing Style Similarity
Advisor: Chen, Yi Shin
Oral Defense Committee: Chen, Chaur Chin; Soo, Von Wun
Keywords: writing style; social network; sock puppet; text mining
In this paper, we introduce a measure of writing-style similarity for identifying potential relationships between anonymous accounts, which are often used to post fake opinions and attack opponents. Using English academic writings and text data crawled from a multilingual online forum, our approach extracts various signatures, forms a representative vector describing each user's writing style, and compares users pairwise by the similarity of these vectors.
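The abstract describes a three-step pipeline: extract signatures from each user's text, build one representative vector per user, and compare users pairwise. The Python sketch below illustrates that flow under assumptions that are ours, not the thesis's: the "token" signature is a small, hand-picked list of English function-word frequencies, the "structural" signature is punctuation marks per word, a user's representative vector is the element-wise mean over their posts, and similarity is cosine similarity averaged over the two signatures with equal weight. The names `token_vector`, `structural_vector`, `representative`, `style_similarity` and `pairwise_similarities` are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical signature sets chosen for illustration only; the thesis's
# actual content-free and author-dependent signatures are not listed here.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "however"]
PUNCTUATION = [",", ".", ";", ":", "!", "?"]


def _words(text):
    """Lower-cased English word tokens (no segmentation for CJK text)."""
    return re.findall(r"[a-z']+", text.lower())


def token_vector(text):
    """Relative frequency of each function word among all word tokens."""
    words = _words(text)
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]


def structural_vector(text):
    """Occurrences of each punctuation mark per word token."""
    total = max(len(_words(text)), 1)
    return [text.count(p) / total for p in PUNCTUATION]


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def representative(posts):
    """One (token, structural) vector pair per user, averaged over their posts."""
    return (
        mean_vector([token_vector(p) for p in posts]),
        mean_vector([structural_vector(p) for p in posts]),
    )


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def style_similarity(rep_a, rep_b):
    """Equal-weight average of per-signature cosine similarities (an assumption)."""
    return sum(cosine(x, y) for x, y in zip(rep_a, rep_b)) / len(rep_a)


def pairwise_similarities(users):
    """Score every unordered pair of users, most similar first."""
    reps = {name: representative(posts) for name, posts in users.items()}
    scores = [
        ((a, b), style_similarity(reps[a], reps[b]))
        for a, b in combinations(sorted(reps), 2)
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data: each user maps to a list of posts (each user needs at least one).
    users = {
        "alice": ["However, the results of the study are clear; the method works."],
        "bob": ["lol it is what it is!!! to be honest?"],
        "carol": ["The data and the model agree, and the error is small."],
    }
    for (a, b), score in pairwise_similarities(users):
        print(f"{a} vs {b}: {score:.3f}")
```

Averaging the per-signature cosines, rather than concatenating all features into one vector, keeps one signature from dominating simply because its features have larger magnitudes; the thesis's own measure definition and signature combination may differ. Word tokenization here is English-only, so text from a multilingual forum would need language-appropriate segmentation first.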
Chinese Abstract
Abstract
Acknowledgement
List of Tables
List of Figures

1 Introduction
2 Related Work
3 Writing Style Signatures
3.1 Content-free Signatures
3.2 Author-dependent Signatures
4 Methodology
4.1 Data Preprocessing
4.2 Signature Vector Construction
4.2.1 Word Extraction
4.2.2 Token Vector
4.2.3 Structural Vector
4.2.4 Representative Vector Construction
4.3 Similarity Measurement
4.3.1 Measure Definition
4.3.2 Pairwise Comparison
4.4 Top-k User Retrieval
4.4.1 Signature Combination
4.4.2 User Expansion and Filter
5 Experimental Evaluation
5.1 JVC Dataset
5.1.1 Experimental Setup
5.1.2 Result Validation
5.2 PTT Dataset
5.2.1 Experimental Setup
5.2.2 Result Validation
6 Conclusion and Future Work
References