Author: Liu, Yu-Lan
Thesis Title: Finding Definitions of Neologisms on the Web
Advisor: Chang, Jason S.
Oral Defense Committee: Sue J. Ker
Lin, Ching-Lung
Keywords: Definition Extraction; Information Retrieval; Text Clustering
Nowadays, newly coined terms and new usages of existing terms are proliferating, driven by the popularity of social networking sites and Internet forums. As a result, there is a pressing need for up-to-date and reliable definitions. However, traditional manually edited dictionaries have fallen behind in providing timely definitions of neologisms. In this paper, we present a method for automatically finding definitions of Chinese neologisms on the Web. In our approach, we use lexical patterns to bias the search engine toward retrieving snippets that contain a definition of the given term. We use Wikipedia as training data to build a definition classifier, so no human-annotated training data is required. Furthermore, we cluster the definition candidates by meaning in order to distinguish existing senses from new ones. In our experiments, we applied the proposed system to find definitions for about 150 Chinese neologisms on the Web. The experimental results show that the proposed methods are reasonably accurate and provide an efficient way to mine definitions on the Web.
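The pattern-biased retrieval step described above can be illustrated with a minimal sketch. It assumes a small set of hypothetical definition patterns (the actual pattern list is not given in this abstract) and a list of snippets already returned by some search engine. The names `DEFINITION_PATTERNS`, `build_queries`, and `extract_candidates` are illustrative only and are not taken from the thesis.

```python
# Hypothetical definition patterns ("X refers to", "X means", "X is a kind of").
# These are assumptions for illustration; the thesis's own pattern list may differ.
DEFINITION_PATTERNS = ["{term}是指", "{term}的意思是", "{term}是一種"]


def build_queries(term, patterns=DEFINITION_PATTERNS):
    """Return one exact-phrase query per pattern for the given term."""
    # Quoting the phrase asks the search engine to match it as a whole.
    return ['"' + p.format(term=term) + '"' for p in patterns]


def extract_candidates(term, snippets, patterns=DEFINITION_PATTERNS):
    """Keep snippets containing an instantiated pattern.

    For each matching snippet, return the text from the start of the
    pattern up to and including the first sentence-final punctuation mark.
    """
    candidates = []
    for snippet in snippets:
        for p in patterns:
            phrase = p.format(term=term)
            start = snippet.find(phrase)
            if start == -1:
                continue
            end = len(snippet)
            for mark in "。!?":
                pos = snippet.find(mark, start)
                if pos != -1:
                    end = min(end, pos + 1)
            candidates.append(snippet[start:end])
            break  # one candidate per snippet is enough for this sketch
    return candidates
```

In the pipeline outlined in the abstract, candidates like these would then be passed to the definition classifier and clustered by meaning; this sketch covers only query construction and candidate extraction.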
Abstract
Acknowledgments
Contents
List of Figures
List of Tables
1 Introduction
2 Related Work
3 Method
3.1 Problem Statement
3.2 Maximum Entropy Modeling
3.3 The Definition Patterns
3.4 Training Phase
3.4.1 Retrieve Positive Data from Wikipedia
3.4.2 Retrieve Negative Data from the Web
3.4.3 Preprocessing the Training Data
3.4.4 Generate Features for Maximum Entropy Classifier
3.5 Run-time Phase
3.5.1 Retrieve Candidate Sentences via Search Engine
3.5.2 Filtering Non-definition
3.5.3 Definition Clustering
4 Experimental Setting and Results
4.1 Experimental Setting
4.2 Evaluation of the Definition Classifier
4.3 Evaluation of the Clustering Result
4.4 Evaluation of Definition Mining System
5 Conclusion and Future Work
References
Appendices
A. Sample of System Output
B. Definition Results
