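Author (Chinese): 陳臆淳
Author (English): Chen, Yi-Chun
Title (Chinese): 基於圖形自動抽取改述片語演算法
Title (English): A Graph-based Automatic Paraphrase Extraction Algorithm
Advisor (Chinese): 張俊盛
Advisor (English): Chang, Jason S.
Oral examination committee (Chinese): 陳信希; 張嘉惠
Oral examination committee (English): Chen, Hsin-Hsi; Chang, Chia-Hui
Degree: Master's
University: National Tsing Hua University
Department: Department of Computer Science
Student ID: 100062567
Year of publication (ROC calendar): 102 (2013)
Academic year of graduation (ROC calendar): 101
Language: English
Number of pages: 54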
Keywords: Paraphrase generation; Graph-based method; Weighted PageRank algorithm; Linguistically motivated feature
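Abstract (Chinese, translated): A paraphrase expresses the same meaning with different wording. Automatic paraphrase generation can be applied to many natural language processing tasks. We propose a paraphrase generation method whose output phrases preserve the meaning and syntax of the original phrase. The method recasts paraphrase generation as a graph-based problem, in which the graph contains both direct and indirect paraphrase relations. It uses multiple linguistic features to measure the similarity between paraphrase candidates, and applies the Weighted PageRank algorithm to evaluate how relevant each candidate is to the original phrase. We evaluate the method on a set of phrases commonly used in academic writing; human evaluation shows that it outperforms current state-of-the-art methods, with significant gains in both grammatical and semantic precision.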
Abstract (English): Paraphrases are alternative ways to express the same meaning. Automatic paraphrase generation can be applied in many Natural Language Processing tasks. We propose a method for generating paraphrases that preserve both the meaning and the syntax of a given phrase. In our approach, the paraphrasing problem is transformed into a graph representing direct and indirect paraphrase relations. The method incorporates various linguistically motivated features to reflect the similarities between paraphrase candidates, and uses the Weighted PageRank algorithm to evaluate the relevance of each candidate to the given phrase. Evaluation on a set of phrases commonly used in research articles shows that our method significantly outperforms state-of-the-art methods in both semantic and syntactic terms.
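The abstract describes ranking paraphrase candidates with Weighted PageRank over a graph of direct and indirect paraphrase relations. The Python sketch below illustrates that idea under assumptions that are not taken from the thesis: the graph is undirected; each edge carries one precomputed similarity weight, standing in for the combined linguistically motivated features; transition probabilities are proportional to edge weights; and the random walk restarts at the input phrase (a personalized-PageRank variant) so that scores reflect relevance to that phrase. The functions `weighted_pagerank` and `rank_paraphrases`, the damping value 0.85, and the phrases and weights in `EXAMPLE_EDGES` are all hypothetical illustrations, not the thesis's implementation, tuned parameters, or data.

```python
from collections import defaultdict


def weighted_pagerank(edges, source, damping=0.85, tol=1e-10, max_iter=200):
    """Edge-weighted PageRank whose random walk restarts at `source`.

    edges: dict mapping (phrase_u, phrase_v) -> nonnegative similarity weight.
           Edges are treated as undirected (an assumption of this sketch).
    Returns a dict mapping each phrase to a score; the scores sum to 1.
    """
    # Symmetric weighted adjacency; self-loops and non-positive weights are ignored.
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        if w <= 0 or u == v:
            continue
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w

    nodes = sorted(set(adj) | {source})
    out_weight = {n: sum(adj[n].values()) for n in nodes}

    # Start with all probability mass on the input phrase.
    score = {n: (1.0 if n == source else 0.0) for n in nodes}
    for _ in range(max_iter):
        new = {n: 0.0 for n in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                # A node with no edges sends its mass back to the source.
                new[source] += damping * score[u]
                continue
            # Spread mass to neighbours in proportion to edge weight.
            for v, w in adj[u].items():
                new[v] += damping * score[u] * w / out_weight[u]
        # Restart step: with probability 1 - damping, jump back to the input phrase.
        new[source] += 1.0 - damping
        delta = sum(abs(new[n] - score[n]) for n in nodes)
        score = new
        if delta < tol:
            break
    return score


def rank_paraphrases(edges, source, top_k=3, **kwargs):
    """Return the top_k (phrase, score) pairs, excluding the input phrase itself."""
    scores = weighted_pagerank(edges, source, **kwargs)
    candidates = [(p, s) for p, s in scores.items() if p != source]
    # Highest score first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda ps: (-ps[1], ps[0]))
    return candidates[:top_k]


# Hypothetical toy graph: phrases and weights are illustrative only.
EXAMPLE_EDGES = {
    ("in order to", "so as to"): 0.9,
    ("in order to", "to"): 0.4,
    ("in order to", "with the aim of"): 0.3,
    ("so as to", "with the aim of"): 0.6,  # indirect route from the input phrase
    ("to", "towards"): 0.5,
}

if __name__ == "__main__":
    for phrase, s in rank_paraphrases(EXAMPLE_EDGES, "in order to"):
        print(f"{phrase}\t{s:.3f}")
```

With these toy weights the ranking is "so as to", then "with the aim of", then "to": "with the aim of" outranks "to" even though its direct edge to "in order to" is weaker, because it is also reached through "so as to". This is the kind of indirect relation a graph formulation can exploit. Replacing the restart-at-source step with a uniform restart would give plain edge-weighted PageRank instead.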
Table of contents:
ABSTRACT (CHINESE) i
ABSTRACT ii
ACKNOWLEDGEMENTS iii
TABLE OF CONTENTS iv
LIST OF FIGURES v
LIST OF TABLES vi
CHAPTER 1 INTRODUCTION 1
CHAPTER 2 RELATED WORK 5
CHAPTER 3 METHOD 9
3.1 Problem Statement 9
3.2 Graph Construction 10
3.3 Paraphrase Generation Framework 13
3.4 Linguistically Motivated Feature 15
CHAPTER 4 EXPERIMENTAL SETTING 20
4.1 Experimental Setting and Tuning 20
4.2 Paraphrase Generation Methods Compared 23
4.3 Evaluation Data Sets and their Judgments 26
4.4 Evaluation Metrics 27
CHAPTER 5 EVALUATION RESULTS 32
CHAPTER 6 CONCLUSION AND FUTURE WORK 42
REFERENCES 44
Appendix A – Development data set 49
Appendix B – Test phrases 50
Appendix C – Sample Output and Judgments 53