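Author (Chinese): 劉郁蘭
Author (English): Liu, Yu-Lan
Thesis title (Chinese): 利用網路搜尋流行語定義
Thesis title (English): Finding Definitions of Neologisms on the Web
Advisor (Chinese): 張俊盛
Advisor (English): Chang, Jason S.
Committee members (Chinese): 柯淑津
林慶隆
Committee members (English): Sue J. Ker
Lin, Ching-Lung
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 100062528
Year of publication (ROC calendar): 103
Academic year of graduation: 103
Language: English
Number of pages: 43
Chinese keywords: 定義抽取; 資訊檢索; 文本分群
English keywords: Definition Extraction; Information Retrieval; Text Clustering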
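Abstract (translated from Chinese): With information technology now highly developed, the popularity of social networking sites and Internet forums has produced a large number of Internet terms and neologisms, which quickly enter everyday life through the mass media. This has created a demand for a reliable service that provides up-to-date definitions of neologisms. However, traditional manually edited dictionaries can no longer provide the latest neologism definitions in a timely manner, so this thesis proposes an automated method for finding definitions of Chinese neologisms on the Internet. The method first expands the query term with a series of definition-sentence pattern rules in order to obtain candidate definition sentences for the term through a search engine. We use Wikipedia to produce training data that requires no manual annotation and use it to train a definition-sentence classifier. In addition, we cluster the definition sentences by the meaning they refer to; if the term has multiple meanings, the retrieved definition sentences can be divided into several clusters, one per meaning. We collected nearly 150 Chinese neologisms for our experiments and used the proposed system to find their definitions on the Web. The experimental results show that the proposed method is effective in helping to obtain neologism definitions from the Web.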
Nowadays, newly coined terms and new usages of existing terms are flourishing due to the prevalence of social networking sites and Internet forums. Therefore, there is a pressing need for updated and reliable definitions. However, traditional manually edited dictionaries have fallen behind in providing definitions of neologisms in time. In this paper, we present a method for automatically finding Chinese neologism definitions on the Web. In our approach, we use lexical patterns to bias the search engine towards retrieving snippets containing the definition of the given term. We use Wikipedia as training data to build a definition classifier without human-annotated training data. Furthermore, we cluster the definition candidates by their meanings in order to distinguish existing and new meanings. In our experiments, we applied the proposed system to find definitions for about 150 Chinese neologisms on the Web. The experimental results show that the proposed methods are reasonably accurate and provide an efficient way to mine definitions on the Web.
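The query-expansion step mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the pattern strings, the name `DEFINITION_PATTERNS`, and the function `expand_queries` are all hypothetical, since the actual pattern list does not appear in this record. Each pattern wraps the term in a common Chinese defining phrase, and the result is quoted so that a search engine would treat it as an exact phrase.

```python
# Hypothetical definition patterns (assumed for illustration only;
# the patterns actually used in the thesis are not listed in this record).
DEFINITION_PATTERNS = [
    "{term}是指",      # "<term> refers to"
    "{term}是一種",    # "<term> is a kind of"
    "{term}的意思是",  # "the meaning of <term> is"
    "所謂{term}",      # "the so-called <term>"
]


def expand_queries(term, patterns=DEFINITION_PATTERNS):
    """Return one exact-phrase query string per pattern for the given term."""
    return ['"' + pattern.format(term=term) + '"' for pattern in patterns]


if __name__ == "__main__":
    # Print the expanded queries for one example term.
    for query in expand_queries("小確幸"):
        print(query)
```

The expanded queries would then be sent to a search engine, and the returned snippets would serve as candidate definition sentences for the classifier and clustering stages described above.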
Abstract i
Acknowledgments iii
Contents vi
List of Figures viii
List of Tables ix
1 Introduction 1
2 Related Work 4
3 Method 7
3.1 Problem Statement 8
3.2 Maximum Entropy Modeling 9
3.3 The Definition Patterns 10
3.4 Training Phase 11
3.4.1 Retrieve Positive Data from Wikipedia 12
3.4.2 Retrieve Negative Data from the Web 12
3.4.3 Preprocessing the Training Data 13
3.4.4 Generate Features for Maximum Entropy Classifier 14
3.5 Run-time Phase 15
3.5.1 Retrieve Candidate Sentences via Search Engine 15
3.5.2 Filtering Non-definition 17
3.5.3 Definition Clustering 17
4 Experimental Setting and Results 20
4.1 Experimental Setting 20
4.2 Evaluation of the Definition Classifier 22
4.3 Evaluation of the Clustering Result 24
4.4 Evaluation of Definition Mining System 25
5 Conclusion and Future Work 28
References 30
Appendices 33
A. Sample of System Output 34
B. Definition Results 40