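Author (Chinese): 謝毅霖
Author (English): Xie, Yi-Lin
Thesis title (Chinese): 語言模型的文法改錯中之遺漏詞處理
Thesis title (English): Dealing with Missing Words in Grammatical Error Correction based on Language Model
Advisor (Chinese): 張俊盛
Advisor (English): Chang, Jason S.
Oral defense committee (Chinese): 張智星; 陳浩然
Oral defense committee (English): Jang, Jyh-Shing; Chen, Hao-Jan
Degree: Master's
University: National Tsing Hua University (國立清華大學)
Department: Institute of Information Systems and Applications (資訊系統與應用研究所)
Student ID: 106065523
Year of publication (ROC calendar): 109 (2020)
Academic year of graduation: 108
Language: English
Number of pages: 38
Keywords (Chinese): 文法改錯; 語言模型
Keywords (English): Grammatical Error Correction; Language Model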
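Abstract (translated from Chinese): This thesis proposes a method for English grammatical error correction that corrects potential grammatical errors in an input sentence without relying on annotated data. We follow the line of research that applies a language model (LM): because sentences with higher LM probability are less likely to contain grammatical errors, the LM can be used to find the correct revision. The method involves building confusion sets, recovering missing words, and training an inference model in order to generate and rank candidate corrections, raising the LM probability as far as possible and thereby correcting the grammatical errors. Experimental results show that our method outperforms previous work and yields a marked improvement in the handling of missing words.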
Abstract (English): We present a grammatical error correction system that automatically corrects potential grammatical errors in a given sentence without using parallel data. Our approach relies on a language model (LM), based on the idea that low-probability sentences are more likely to contain grammatical errors than high-probability sentences. The method involves generating and ranking correction candidates using confusion sets and missing word recovery, and training an inference model, in order to maximize the LM probability and correct grammatical errors. Preliminary evaluation shows that our approach outperforms previous work, especially on insertion errors.
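The idea in the abstract (propose candidate edits, score them with an LM, and keep the edit that raises the probability most) can be illustrated with a minimal sketch. This is not the thesis implementation: the toy corpus, the add-one smoothed bigram LM, the confusion sets, the fixed list of insertable words, the length normalization, and the threshold value are all assumptions made for illustration, and the inference-model filtering step is omitted.

```python
import math
from collections import Counter

# Toy corpus standing in for a large LM training corpus (assumption).
CORPUS = [
    "i am looking forward to seeing you",
    "she is looking forward to the trip",
    "we look forward to the meeting",
    "i am going to the meeting",
    "she is going to see you",
]

# Hypothetical confusion sets; the empty string stands for deletion.
CONFUSION_SETS = [{"see", "seeing", "seen"}, {"to", "for", "at", ""}, {"a", "the", ""}]
# Hypothetical list of words that may be missing from a sentence.
INSERTABLE = ["to", "the", "a"]


def train_bigram(corpus):
    """Collect context and bigram counts plus the vocabulary size."""
    contexts, bigrams, vocab = Counter(), Counter(), {"<s>", "</s>"}
    for line in corpus:
        tokens = ["<s>"] + line.split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocab)


def score(tokens, model):
    """Length-normalized log-probability under an add-one smoothed bigram LM."""
    contexts, bigrams, vocab_size = model
    padded = ["<s>"] + tokens + ["</s>"]
    pairs = list(zip(padded, padded[1:]))
    logp = sum(math.log((bigrams[p] + 1) / (contexts[p[0]] + vocab_size)) for p in pairs)
    return logp / len(pairs)


def candidates(tokens):
    """Generate replacement, deletion, and insertion candidates."""
    out = []
    for i, tok in enumerate(tokens):
        for cset in CONFUSION_SETS:
            if tok in cset:
                for alt in cset - {tok}:
                    out.append(tokens[:i] + ([alt] if alt else []) + tokens[i + 1:])
    for i in range(len(tokens) + 1):
        for word in INSERTABLE:
            out.append(tokens[:i] + [word] + tokens[i:])
    return out


def correct(sentence, model, threshold=0.02, max_iters=5):
    """Greedily apply the best-scoring edit until the gain falls below the threshold."""
    tokens = sentence.split()
    for _ in range(max_iters):
        best = max(candidates(tokens), key=lambda c: score(c, model))
        if score(best, model) - score(tokens, model) <= threshold:
            break
        tokens = best
    return " ".join(tokens)


if __name__ == "__main__":
    lm = train_bigram(CORPUS)
    print(correct("i am looking forward seeing you", lm))
```

With this toy corpus, the sketch restores the missing "to" in "i am looking forward seeing you" and leaves the corrected sentence unchanged on the next iteration. The threshold keeps the loop from accepting edits that barely change the score; a real system would tune it on development data and use a far stronger LM.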
Abstract
Abstract (Chinese)
Acknowledgements
Contents
List of Figures
List of Tables

1 Introduction

2 Related Work

3 Methodology
3.1 Problem Statement
3.2 Training Phase
3.2.1 Training a Language Model
3.2.2 Training a Missing Word Insertion Model
3.2.3 Training a Natural Language Inference Model
3.3 Correcting Errors based on Language Model
3.3.1 Generating Replacement and Deletion Candidates
3.3.2 Generating Insertion Candidates
3.3.3 Assessing Grammaticality with LM
3.3.4 Filtering out Inappropriate Candidates
3.3.5 Iterative Correction with Generated Candidates

4 Experiment and Evaluation
4.1 Datasets and Tools
4.2 Threshold Tuning
4.3 Model Implementation
4.3.1 Language Model
4.3.2 Missing Word Insertion Model
4.3.3 Natural Language Inference Model
4.4 Systems Compared
4.5 Evaluation Metrics
4.5.1 MaxMatch
4.5.2 GLEU
4.6 Evaluation Results

5 Conclusion and Future Work

References