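Learning to Extract Aliases of Named Entities on the Web | National Tsing Hua University Theses and Dissertations Full-Text Image System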

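Author (Chinese):	謝泓廷
Author (English):	Hsieh, Hung-Ting
Thesis title (Chinese):	學習於網路擷取專有名詞的別名
Thesis title (English):	Learning to Extract Aliases of Named Entities on the Web
Advisor (Chinese):	張俊盛
Advisor (English):	Chang, Jason S.
Committee members (Chinese):	陳信希、張嘉惠
Committee members (English):	Chen, Hsin-Hsi; Chang, Chia-Hui
Degree:	Master's
University:	National Tsing Hua University
Department:	Institute of Information Systems and Applications
Student ID:	100065513
Year of publication (ROC calendar):	102
Academic year of graduation (ROC calendar):	101
Language:	English
Number of pages:	60
Keywords (Chinese):	關係抽取、別名辭典、專有名詞、網路語料庫、條件隨機域
Keywords (English):	Relation Extraction, Alias Lexicon, Named Entity, Web as Corpus, Conditional Random Field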

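In documents, many aliases may refer to the same named entity. Because aliases are so common in documents, the ability to recognize the aliases of a named entity is important for many applications, such as search engines (e.g., Google and Bing) and intelligent dialog systems (e.g., Siri). In this thesis, we propose a method for automatically finding aliases of named entities on the Web. In the training stage, the method automatically generates applicable lexical patterns that guide a search engine to return relevant snippets containing aliases of a given named entity. In addition, we treat determining the boundaries of the aliases of a given named entity as a sequence labeling problem and train a machine-learning model for it. At run time, we first expand the given named entity into a set of new queries that guide the search engine to return snippets containing relevant aliases, and then use the trained sequence labeling model to determine the alias boundaries. Finally, we present a prototype system (AliasFinder) that applies this method to acquire aliases from the Web. Experimental results show that the proposed method significantly outperforms the baseline. In summary, this study proposes an effective method for finding the aliases of a given named entity, one that is applicable to many specialized domains.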

A named entity (NE) can be referred to by many aliases in documents. Because aliases are so prevalent, recognizing the aliases of an NE is an essential part of many applications, such as search engines (e.g., Google, Bing) and intelligent dialog systems (e.g., Siri). In this paper, we propose an approach for learning to extract aliases of a given NE from the Web automatically. The method involves automatically generating applicable lexical patterns so as to bias the search engine toward returning relevant documents that contain aliases. Furthermore, we treat the process of identifying the boundaries of aliases of a given NE as a sequence labeling problem and train a machine-learning model. At run time, we bias the search engine to retrieve relevant snippets by transforming the given NE into a set of queries, and then identify the boundaries of aliases with the trained model. We present a prototype, AliasFinder, which applies the method to find aliases on the Web. Experimental results show that the proposed method yields better performance than the baselines and provides an efficient way to find aliases of NEs.
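The two run-time steps described in the abstract (expanding an NE into pattern-based queries, then reading alias boundaries off a sequence labeler's output) might be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the patterns in `EXAMPLE_PATTERNS`, the function names `expand_queries` and `decode_aliases`, and the B/I/O tag set are all hypothetical, since this record does not state which patterns AliasFinder learns or which label scheme its model uses. The labels are supplied by hand here in place of a trained model's predictions.

```python
from typing import List

# Hypothetical lexical patterns for illustration only; the thesis learns
# its patterns automatically from training data.
EXAMPLE_PATTERNS = [
    '"{ne} also known as"',
    '"{ne} aka"',
    '"better known as {ne}"',
]


def expand_queries(ne: str, patterns: List[str]) -> List[str]:
    """Turn one named entity into a set of search-engine queries."""
    return [pattern.format(ne=ne) for pattern in patterns]


def decode_aliases(tokens: List[str], labels: List[str]) -> List[str]:
    """Join tokens tagged B (begin) / I (inside) into alias strings.

    Assumes a B/I/O tag set; any other label closes the current span.
    An I tag that does not follow a B or I tag is ignored.
    """
    aliases: List[str] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        if label == "B":
            if current:
                aliases.append(" ".join(current))
            current = [token]
        elif label == "I" and current:
            current.append(token)
        else:
            if current:
                aliases.append(" ".join(current))
            current = []
    if current:
        aliases.append(" ".join(current))
    return aliases


if __name__ == "__main__":
    # Illustrative input; the labels stand in for a trained model's output.
    print(expand_queries("William Gates", EXAMPLE_PATTERNS))
    tokens = ["William", "Gates", "also", "known", "as", "Bill", "Gates", "."]
    labels = ["O", "O", "O", "O", "O", "B", "I", "O"]
    print(decode_aliases(tokens, labels))
```

In a full system the labels would come from the trained sequence labeling model applied to each retrieved snippet, and the decoded aliases would then be aggregated across snippets.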

Abstract (Chinese)
Abstract
Acknowledgements
List of Figures
List of Tables
CHAPTER 1 INTRODUCTION
CHAPTER 2 RELATED WORKS
CHAPTER 3 METHOD
3.1 Problem Statement
3.2 Collect Training Data
3.3 Generate Lexical Patterns
3.4 Learn Alias Extraction
3.5 Run-Time Stage
CHAPTER 4 EXPERIMENTAL SETTING
4.1 Training AliasFinder
4.2 Systems Compared
4.3 Evaluation Metrics
4.4 Evaluation NEs and Relevance Judgments
CHAPTER 5 EVALUATION RESULTS
5.1 Hit Rate and Coverage
5.2 Mean Reciprocal Rank (MRR)
5.3 Influence of Training Sizes
5.4 Compare against Search Engines
CHAPTER 6 FUTURE WORKS AND SUMMARY
