
Detailed Record

Author (Chinese): 李蒂卡
Author (English): Ritika Nimje
Title (English): Pronoun Resolution in Fiction Stories Based on Background Information and Discourse Structure
Title (Chinese): 基於背景資訊與篇章結構來解析小說故事中的代名詞
Advisor (Chinese): 蘇豐文
Advisor (English): Soo, Von-Wun
Committee members (Chinese): 陳宜欣; 陳朝欽
Committee members (English): Chen, Yi-Shin; Chen, Chaur-Chin
Degree: Master's
University: National Tsing Hua University (國立清華大學)
Department: Department of Computer Science (資訊工程學系)
Student ID: 103062421
Publication year (ROC): 105 (2016)
Graduation academic year: 105
Language: English
Number of pages: 80
Chinese keywords: 背景資訊 (background information); 故事代名詞 (story pronouns)
English keywords: pronoun resolution; fictional stories; background information
Pronoun resolution is a common task in discourse analysis and an important research topic in natural language processing applications. We tackle the problem of resolving pronouns in text by using the background semantic information of the characters in a story. We also extract discourse rules about the speakers of dialogs from the story in order to split the story text into clusters. More precisely, we focus on the co-referring noun phrases that appear in the text and take on the challenge of resolving pronoun mentions in context to the characters they refer to through their relations. We use the existing Stanford parser's co-reference resolution techniques to extract entities, noun phrases, and co-reference candidates. Because the Stanford parser achieves only about 21% to 35% accuracy on pronoun resolution, we propose an augmented approach that assumes some self-annotated data and background information about the character noun phrases (including character relations such as father, mother, and daughter) can be provided to resolve the pronouns. We use heuristic rules to split the discourse structure into segments and use the background information to improve the precision and recall of pronoun resolution in stories. We experimented on three stories from different domains and styles (The Great Gatsby, A Case of Identity, and Harry Potter) and obtained an improvement of about 50% in precision.
Pronoun resolution is a well-known task in discourse analysis and an important research issue in applications of natural language processing. We tackle the problem of pronoun resolution in text by leveraging background semantic information about the characters in a story together with the discourse structure. Background information includes the relationships between the main characters in the story. We also extracted general discourse rules about the narrator and the speakers of dialogs to split the story text into clusters. Specifically, we focus on noun phrases that co-refer to identifiable entities appearing in the text; the challenge is to improve pronoun co-reference resolution by leveraging the relations through which the mentions can be identified. Our system applies state-of-the-art techniques to extract entities, noun phrases, and candidate co-references, using the Stanford parser's co-reference resolution method. Since the Stanford parser's co-reference resolution achieves only about 21% to 35% accuracy on pronoun resolution (approximately 20.9%, 28.0%, and 35.0% on our three test stories), we propose an augmented approach in which we assume a small amount of manually annotated data (about 10% of the full text) can be provided to the Stanford parser, and we utilize the semantic relatedness of noun phrases to background information about the characters (including interpersonal relations such as "father", "mother", and "daughter") to resolve the co-references. We employ heuristic rules that split the text into segments based on the discourse structure, as well as the background information, to improve the recall and precision of pronoun resolution in stories.
We used three stories with different domains and writing styles in our experiments: The Great Gatsby, A Case of Identity, and Harry Potter. After applying our methods, precision improved by about 50%.
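As a rough illustration of the idea described above (a minimal sketch, not the thesis's actual implementation), the following Python fragment splits a story into narrative and quoted-dialog segments and resolves third-person pronouns to the most recently mentioned gender-compatible character, where gender is looked up in hypothetical background information about the characters; all names and data here are illustrative assumptions:

```python
import re

# Hypothetical background information about characters: gender plus an
# interpersonal relation of the kind the thesis describes (father, mother, ...).
BACKGROUND = {
    "Mary": {"gender": "female", "relations": {"father": "Windibank"}},
    "Holmes": {"gender": "male", "relations": {}},
}

# Gender hints carried by common third-person pronouns.
PRONOUN_GENDER = {
    "he": "male", "him": "male", "his": "male",
    "she": "female", "her": "female", "hers": "female",
}

def split_segments(text):
    """Split text into dialog (quoted) and narrative segments."""
    return [s for s in re.split(r'("[^"]*")', text) if s.strip()]

def resolve_pronouns(text, background):
    """Map each pronoun to the most recent gender-compatible character."""
    resolved = []
    last_mention = {}  # gender -> most recently mentioned character name
    for segment in split_segments(text):
        for token in re.findall(r"[A-Za-z']+", segment):
            if token in background:
                # A named character was mentioned; remember it by gender.
                last_mention[background[token]["gender"]] = token
            gender = PRONOUN_GENDER.get(token.lower())
            if gender and gender in last_mention:
                resolved.append((token, last_mention[gender]))
    return resolved
```

For example, on the sentence `Mary met Holmes. "Can he help?" she asked.` this sketch links "he" to Holmes and "she" to Mary. The real system replaces each of these toy pieces with a stronger component: the Stanford parser's co-reference clusters instead of a recency heuristic, discourse rules for speaker detection inside dialog segments, and semantic relatedness against the character relations.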
Table of Contents

Chapter 1: Introduction 1
1.1 The Task 2
1.1.1 General overview 2
1.1.2 Pronouns tackled 3
1.1.3 Aims of the system 4
1.2 Terminology 4
1.3 Related Work 8
1.3.1 Named Entity Recognition 8
1.3.2 Entity linking 8
1.3.3 Entity types 9
1.3.4 Co-reference and Anaphora 9
1.4 Outline 11
Chapter 2: Methodology 12
2.1 Method 1: Adding self-annotated data to the state-of-the-art method 12
2.1.1 System Input 12
2.1.2 System Overview 13
2.1.3 Preprocessing 15
2.1.4 Adding Self-Annotated Data to the Clusters 16
2.2 Method 2: Semantic Relatedness 17
2.2.1 Stanford Co-Reference Cluster Breaking 17
2.2.2 Semantic Annotation 17
2.2.3 Computing Semantic Annotation for Short Sentences 18
2.2.3.2 Text Similarity Method 19
2.2.3.6 Overall Sentence Similarity 28
2.3 Method 3: Splitting the text using discourse structure 28
2.3.1 Text splitting 28
2.3.2 Procedure for text splitting 31
2.3.3 Text combination 33
2.4 Method 4: Speaker Detection 34
2.4.1 Reflexive and possessive pronoun resolution 38
2.5 Method 5: Combining the personal pronouns resolved with the Stanford parser 39
2.6 Method 6: Semantic relatedness of noun phrases detected by Stanford using character relations 40
2.7 Method 7: Adding self-annotated data to the text splitting result 41
Chapter 3: Experiment Evaluation 42
3.1 Self-Annotated Data Set 42
3.2 Metrics 42
3.3 Case 1: A Case of Identity 44
3.3.1 Pronoun resolution using the state-of-the-art method 45
3.3.2 Method 1: Adding self-annotated data to the state-of-the-art method 45
3.3.3 Method 2: Semantic Relatedness 46
3.3.4 Method 3: Splitting the text using discourse structure 47
3.3.5 Method 5: Combining the personal pronouns resolved with the Stanford parser 47
3.3.6 Method 6: Semantic relatedness of noun phrases detected by Stanford using character relations 48
3.3.7 Method 7: Adding self-annotated data to the text splitting result 49
3.4 Case 2: The Great Gatsby 49
3.4.1 Pronoun resolution using the state-of-the-art method 49
3.4.2 Method 1: Adding self-annotated data to the state-of-the-art method 50
3.4.3 Method 2: Semantic Relatedness 50
3.4.4 Method 3: Splitting the text using discourse structure 51
3.4.5 Method 5: Combining the personal pronouns resolved with the Stanford parser 52
3.4.6 Method 6: Semantic relatedness of noun phrases detected by Stanford using character relations 52
3.4.7 Method 7: Adding self-annotated data to the text splitting result 53
3.5 Case 3: Harry Potter 53
3.5.1 Pronoun resolution using the state-of-the-art method 53
3.5.2 Method 1: Adding self-annotated data to the state-of-the-art method 54
3.5.3 Method 2: Semantic Relatedness 55
3.5.4 Method 3: Splitting the text using discourse structure 55
3.5.5 Method 4: Speaker Detection 55
3.5.6 Method 5: Combining the personal pronouns resolved with the Stanford parser 55
3.5.7 Method 6: Semantic relatedness of noun phrases detected by Stanford using character relations 56
3.5.8 Method 7: Adding self-annotated data to the text splitting result 56
3.6 Error Analysis 56
Chapter 4: Discussion 57
Chapter 5: Conclusion 59
References 61
Appendix 66
Appendix A: Original Text Format 66
Appendix B: Dialog Clusters 68
Appendix C: Dialogs (conversation or narrative conversation of a speaker) 69
Dialog with one-to-one conversation 70
Dialog with itself 70
Appendix D: Pronoun Resolver Using Splitting of Text 71
Appendix E: Stanford Resolver 71

References

1. Mira Ariel. Accessing noun-phrase antecedents. Routledge, 2014.
2. Amit Bagga and Breck Baldwin. Algorithms for scoring co-reference chains. In The first international conference on language resources and evaluation workshop on linguistics coreference, volume 1, pages 563–566. Citeseer, 1998.
3. Breck Baldwin. Cogniac: high precision co-reference with limited knowledge and linguistic resources. In Proceedings of a Workshop on Operational Factors in Practical, Robust Anaphora Resolution for Unrestricted Texts, pages 38–45. Association for Computational Linguistics, 1997.
4. Andrew Borthwick, John Sterling, Eugene Agichtein, and Ralph Grishman. Sixth Workshop on Very Large Corpora, chapter Exploiting Diverse Knowledge Sources via Maximum Entropy in Named Entity Recognition. 1998.
5. Volha Bryl, Claudio Giuliano, Luciano Serafini, and Kateryna Tymoshenko. Using background knowledge to support co-reference resolution. In ECAI, volume 10, pages 759–764, 2010.
6. Xiao Cheng and Dan Roth. Relational Inference for Wikification. Empirical Methods in Natural Language Processing, pages 1787–1796, 2013.
7. Pradheep Elango. Co-reference resolution: A survey. Technical report, University of Wisconsin, Madison, 2005.
8. Jenny Rose Finkel, Trond Grenager, and Christopher Manning. Incorporating non-local information into information extraction systems by Gibbs sampling. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, ACL '05, pages 363–370, Stroudsburg, PA, USA, 2005.
9. Aldo Gangemi, Andrea Giovanni Nuzzolese, Valentina Presutti, Francesco Draicchio, Alberto Musetti, and Paolo Ciancarini. Automatic typing of DBpedia entities. In The Semantic Web–ISWC 2012, pages 65–81. Springer, 2012.
10. Niyu Ge, John Hale, and Eugene Charniak. A statistical approach to anaphora resolution. In Proceedings of the sixth workshop on very large corpora, volume 71, 1998.
11. Barbara J Grosz et al. The representation and use of focus in a system for understanding dialogs. In IJCAI, volume 67, page 76, 1977.
12. Barbara J Grosz, Scott Weinstein, and Aravind K Joshi. Centering: A framework for modeling the local coherence of discourse. Computational linguistics, 21(2):203– 225, 1995.
13. Aria Haghighi and Dan Klein. Simple co-reference resolution with rich syntactic and semantic features. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 - Volume 3, EMNLP ’09, pages 1152–1161, Stroudsburg, PA, USA, 2009.
14. Michael A.K. Halliday and Ruqaiya Hasan. Cohesion in English. Longman, London, 1976.
15. Sanda M Harabagiu, Rzvan C Bunescu, and Steven J Maiorano. Text and knowledge mining for co-reference resolution. In Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies, pages 1–8, 2001.
16. Jerry R. Hobbs. Resolving pronoun references. In Readings in Natural Language Processing, pages 339–352. Morgan Kaufmann Publishers Inc., 1986.
17. Neil Houlsby and Massimiliano Ciaramita. A scalable Gibbs sampler for probabilistic entity linking. In Advances in Information Retrieval, pages 335–346. Springer, 2014.
18. Shalom Lappin and Herbert J Leass. An algorithm for pronominal anaphora resolution. Computational linguistics, 20(4):535–561, 1994.
19. Heeyoung Lee, Yves Peirsman, Angel Chang, Nathanael Chambers, Mihai Surdeanu, and Dan Jurafsky. Stanford’s multi-pass sieve co-reference resolution system at the conll-2011 shared task. In Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task, pages 28–34, 2011.
20. Xiaoqiang Luo. On co-reference resolution performance metrics. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 25–32, 2005.
21. Edgar Meij, Krisztian Balog, and Daan Odijk. Entity linking and retrieval. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’13, pages 1127–1127, New York, NY, USA, 2013. ACM.
22. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119, 2013.
23. Ruslan Mitkov. Robust pronoun resolution with limited knowledge. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics-Volume 2, pages 869– 875, 1998.
24. Ndapandula Nakashole, Tomasz Tylenda, and Gerhard Weikum. Fine-grained semantic typing of emerging entities. In ACL (1), pages 1488–1497, 2013.
25. Vincent Ng. Machine learning for co-reference resolution: Recent successes and future challenges. Technical report, Cornell University, 2003.
26. Vincent Ng. Semantic class induction and co-reference resolution. In Association of Computational Linguistics, pages 536–543, 2007.
27. Vincent Ng. Supervised noun phrase co-reference research: The first fifteen years. In Proceedings of the 48th annual meeting of the association for computational linguistics, pages 1396–1411, 2010.
28. Heiko Paulheim and Christian Bizer. Improving the Quality of Linked Data Using Statistical Distributions. Int. J. Semantic Web Inf. Syst., 10(2):63–86, January 2014.
29. Simone Paolo Ponzetto and Michael Strube. Exploiting semantic role labeling, wordnet and wikipedia for co-reference resolution. In Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, pages 192–199, 2006.
30. Sameer Pradhan, Alessandro Moschitti, Nianwen Xue, Olga Uryupina, and Yuchen Zhang. Conll-2012 shared task: Modeling multilingual unrestricted co-reference in ontonotes. In Joint Conference on EMNLP and CoNLL - Shared Task, CoNLL ’12, pages 1–40, Stroudsburg, PA, USA, 2012.
31. William M. Rand. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(336):846–850, 1971.
32. M. Recasens and E. Hovy. Blanc: Implementing the rand index for co-reference evaluation. Nat. Lang. Eng., 17(4):485–510, October 2011.
33. Giuseppe Rizzo and Raphaël Troncy. NERD: A Framework for Evaluating Named Entity Recognition Tools in the Web of Data. In Proceedings of the International Semantic Web Conference (ISWC 2011), pages 1–4, 2011.
34. Candace Sidner. Focusing in the comprehension of definite anaphora. In Readings in Natural Language Processing, pages 363–394. Morgan Kaufmann Publishers Inc., 1986.
35. Candace Lee Sidner. Towards a computational theory of definite anaphora comprehension in english discourse. Technical report, DTIC Document, 1979.
36. Wee Meng Soon, Hwee Tou Ng, and Daniel Chung Yong Lim. A machine learning approach to co-reference resolution of noun phrases. Computational linguistics, 27(4):521–544, 2001.
37. Michael Strube and Simone Paolo Ponzetto. Wikirelate! computing semantic relatedness using wikipedia. In AAAI, volume 6, pages 1419–1424, 2006.
38. Alberto Tonon, Michele Catasta, Gianluca Demartini, Philippe Cudré-Mauroux, and Karl Aberer. Trank: Ranking entity types using the web of data. In The Semantic Web – ISWC 2013, volume 8218 of Lecture Notes in Computer Science, pages 640–656. Springer Berlin Heidelberg, 2013.
39. Tomasz Tylenda, Mauro Sozio, and Gerhard Weikum. Einstein: Physicist or vegetarian? summarizing semantic type graphs for knowledge discovery. In Proceedings of the 20th International Conference Companion on World Wide Web, WWW ’11, pages 273–276, New York, NY, USA, 2011. ACM.
40. Olga Uryupina, Massimo Poesio, Claudio Giuliano, and Kateryna Tymoshenko. Disambiguation and filtering methods in using web knowledge for co-reference resolution. In FLAIRS Conference, pages 317–322, 2011.
41. Kees Van Deemter and Rodger Kibble. On coreferring: Co-reference in muc and related annotation schemes. Computational linguistics, 26(4):629–637, 2000.
42. NLTK: Natural Language Toolkit. http://www.nltk.org/
43. Yuhua Li, David McLean, Zuhair A. Bandar, James D. O'Shea, and Keeley Crockett. Sentence similarity based on semantic nets and corpus statistics. IEEE Transactions on Knowledge and Data Engineering, 18(8):1138–1150, 2006.
44. Jerry R. Hobbs. Pronoun resolution. Research Report 76-1, Department of Computer Sciences, City College, City University of New York, 1976.
45. Ido Dagan, John S. Justeson, Shalom Lappin, Herbert J. Leass, and A. Ribak. Syntax and lexical statistics in anaphora resolution. Applied Artificial Intelligence, 9(6):633–644, 1995.
46. Christopher Kennedy and Branimir Boguraev. Anaphora for everyone: pronominal anaphora resolution without a parser. In Proceedings of the 16th Conference on Computational Linguistics, pages 113–118, Morristown, NJ, USA. Association for Computational Linguistics, 1996.
47. Claire Cardie. Corpus-based acquisition of relative pronoun disambiguation heuristics. In Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics, pages 216–223, Morristown, NJ, USA. Association for Computational Linguistics, 1992.
48. M. Denber. Automatic resolution of anaphora in English. Technical report, Eastman Kodak Co.
49. Ido Dagan and Alon Itai. Automatic processing of large corpora for the resolution of anaphora references. In Proceedings of the 13th Conference on Computational Linguistics, pages 330–332, Morristown, NJ, USA. Association for Computational Linguistics, 1990.
 
 
 
 