
Record Details

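Author (Chinese): 陳志杰
Author (English): Chen, Jhih-Jie
Thesis title (Chinese): 統計式與類神經機器翻譯應用於英文文法改錯
Thesis title (English): Grammatical Error Correction Using Statistical and Neural Machine Translation
Advisor (Chinese): 張俊盛
Advisor (English): Chang, Jason S.
Committee members (Chinese): 陳浩然; 馬偉雲
Committee members (English): Chen, Hao-Jan; Ma, Wei-Yun
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 103062702
Year of publication: 2019 (ROC year 108)
Academic year of graduation: 107 (ROC calendar)
Language: English
Number of pages: 33
Keywords (Chinese): 自動文法改錯; 機器翻譯; 混合系統
Keywords (English): Automatic Grammatical Error Correction; Machine Translation; Hybrid System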
Abstract: This thesis investigates hybrid approaches to grammatical error correction (GEC). We develop statistical machine translation (SMT) and neural machine translation (NMT) correction models and build a series of hybrid systems that combine them. The SMT method involves preprocessing annotated learner corpora, constructing an error translation model, training a language model, and finally generating corrections with a decoder. For the NMT models, the annotated sentences are converted into parallel pairs of original and corrected sentences, which serve as training data. We integrate the SMT and NMT models using re-scoring, voting, and pipeline techniques. Experiments on public test sets indicate that the hybrid systems effectively exploit the strengths of both SMT and NMT models and achieve the best performance in our experiments. Finally, we discuss the results and the challenges facing the GEC field.
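The three integration techniques named in the abstract (pipeline, re-scoring, and voting) can be illustrated with a minimal Python sketch. This record does not describe how the thesis implements them, so everything here is an assumption made for illustration: the correctors `toy_smt` and `toy_nmt` and the scorer `toy_lm_score` are hypothetical stand-ins for trained models, and the combination is done at the level of whole sentences.

```python
from collections import Counter
from typing import Callable, List

Corrector = Callable[[str], str]


# Hypothetical stand-ins for trained systems (not the thesis models).
def toy_smt(sentence: str) -> str:
    """Pretend SMT system that only fixes one article error."""
    return sentence.replace("a apple", "an apple")


def toy_nmt(sentence: str) -> str:
    """Pretend NMT system that only fixes one agreement error."""
    return sentence.replace("He go ", "He goes ")


def toy_lm_score(sentence: str) -> float:
    """Pretend language model: counts bigrams from a tiny 'fluent' list."""
    good = {("an", "apple"), ("He", "goes")}
    tokens = sentence.split()
    return float(sum((a, b) in good for a, b in zip(tokens, tokens[1:])))


def pipeline(sentence: str, correctors: List[Corrector]) -> str:
    """Pipeline: feed the output of each system into the next one."""
    for correct in correctors:
        sentence = correct(sentence)
    return sentence


def rescore(candidates: List[str], lm_score: Callable[[str], float]) -> str:
    """Re-scoring: keep the candidate the language model scores highest."""
    return max(candidates, key=lm_score)


def vote(sentence: str, correctors: List[Corrector]) -> str:
    """Voting: accept an output only if a strict majority of systems agree."""
    outputs = [correct(sentence) for correct in correctors]
    winner, count = Counter(outputs).most_common(1)[0]
    return winner if count > len(outputs) / 2 else sentence


source = "He go to buy a apple ."
piped = pipeline(source, [toy_smt, toy_nmt])
best = rescore([source, toy_smt(source), toy_nmt(source), piped], toy_lm_score)
agreed = vote(source, [toy_smt, toy_smt, toy_nmt])
```

In this toy setup the pipeline applies both fixes in sequence, re-scoring selects the candidate with the most "fluent" bigrams, and voting leaves the sentence unchanged whenever the systems fail to reach a majority. A real system would likely vote on individual edits rather than whole sentences and would use a trained n-gram or neural language model for scoring.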
Table of contents:
Abstract
Abstract (in Chinese)
Acknowledgements
Contents
List of Figures
List of Tables

1 Introduction

2 Related Work
2.1 Grammatical Error Correction
2.1.1 Traditional Approaches
2.1.2 SMT Approaches
2.1.3 NMT Approaches
2.2 Learner Corpora
2.2.1 The NUCLE Dataset
2.2.2 The CLC FCE Dataset
2.2.3 The JFLEG Corpus
2.2.4 The EF-Cambridge Open Language Database
2.3 Evaluation Metrics and Tool
2.3.1 MaxMatch
2.3.2 GLEU
2.3.3 ERRANT

3 Methodology
3.1 Traditional Methods
3.1.1 Spell Error Correction
3.1.2 Rule-based Article Error Correction
3.2 Statistical Machine Translation
3.2.1 Language Model
3.2.2 Translation Model
3.2.3 Decoder
3.3 Neural Machine Translation
3.3.1 Encoder-decoder
3.4 Hybrid GEC Models
3.4.1 Re-scoring with LM
3.4.2 Pipeline
3.4.3 Voting

4 Experiments and Evaluation
4.1 Dataset
4.1.1 Development data
4.1.2 Training data
4.1.3 Testing data
4.2 Preprocess
4.2.1 The diff+ format
4.2.2 Error tagging schemes
4.2.3 Automatic error annotation
4.3 Hyperparameters
4.3.1 Hyperparameters for LM
4.3.2 Hyperparameters for SMT models
4.3.3 Hyperparameters for NMT models
4.4 Models for comparison
4.5 Evaluation

5 Discussion
5.1 The difference between GEC and MT
5.2 Larger data or cleaner data

6 Conclusion

Reference
 
 
 
 