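Author (Chinese): 李巧雯
Author (English): Li, Chiao-Wen
Thesis title (Chinese): 類神經機器翻譯為本的中文拼字改錯系統
Thesis title (English): Chinese Spelling Check based on Neural Machine Translation
Advisor (Chinese): 張俊盛
Advisor (English): Chang, Jyun-Sheng
Oral defense committee (Chinese): 許永真; 柯淑津
Oral defense committee (English): Hsu, Yung-Jen; Ker, Sue-Jin
Degree: Master's
Institution: National Tsing Hua University (國立清華大學)
Department: Institute of Information Systems and Applications (資訊系統與應用研究所)
Student ID: 105065504
Year of publication (ROC calendar): 107 (2018)
Academic year of graduation: 106
Language: English
Number of pages: 44
Keywords (Chinese): 中文拼字改錯 (Chinese spelling check); 生成人造錯誤 (artificial error generation); 類神經機器翻譯 (neural machine translation); 改稿紀錄 (revision log); 編輯紀錄 (edit log)
Keywords (English): Chinese Spelling Check; Chinese Error Correction; Artificial Error Generation; Neural Machine Translation; Edit Log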
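Abstract (translated from Chinese): This thesis proposes a method for Chinese spelling check that automatically learns to correct potential spelling errors in a sentence. We apply a neural machine translation (NMT) model to Chinese spelling check, that is, we translate a sentence that may contain spelling errors into a correct sentence. We train the NMT spelling-correction model on right-and-wrong sentence pairs extracted from newspaper edit logs and from artificially generated error data. In the training stage, we first extract the sentences involved in spelling-error corrections from the newspaper edit logs. To expand the training data, we use a confusion set to generate sentences containing spelling errors, and then train the model on these data. Experimental results show that the model trained on edit logs plus artificial error data performs better.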
Abstract (English): We present a method for Chinese spelling check that automatically learns to correct a sentence with potential spelling errors. In our approach, a character-based neural machine translation (NMT) model is trained to translate a potentially misspelled sentence into a correct one, using right-and-wrong sentence pairs from newspaper edit logs and artificially generated data. The method involves extracting sentences that contain spelling-correction edits from the edit logs, using commonly confused right-and-wrong word pairs to generate artificial right-and-wrong sentence pairs that expand the training data, and training the NMT model. The evaluation on the United Daily News (UDN) Edit Logs and the SIGHAN-7 Shared Task shows that adding artificial error data can significantly improve the performance of the Chinese spelling check system.
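The artificial error generation step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the confusion-set entries, the function names `make_error_pair` and `to_char_tokens`, and the `max_errors` parameter are all assumptions made for illustration. The sketch replaces characters in a correct sentence with commonly confused ones to produce a (wrong, right) pair, then splits each sentence into space-separated characters, which is one common input format for a character-based NMT model.

```python
import random

# Hypothetical confusion set: correct character -> commonly confused characters.
# These entries are illustrative only and are not taken from the thesis.
CONFUSION_SET = {
    "在": ["再"],
    "的": ["得", "地"],
    "已": ["己"],
}


def make_error_pair(sentence, confusion_set, rng, max_errors=1):
    """Return (wrong_sentence, right_sentence), or None if no character
    in the sentence appears in the confusion set."""
    chars = list(sentence)
    # Positions whose character has at least one known confusable form.
    candidates = [i for i, ch in enumerate(chars) if ch in confusion_set]
    if not candidates:
        return None
    # Corrupt up to max_errors randomly chosen positions.
    for i in rng.sample(candidates, min(max_errors, len(candidates))):
        chars[i] = rng.choice(confusion_set[chars[i]])
    return "".join(chars), sentence


def to_char_tokens(sentence):
    """Split a sentence into space-separated characters."""
    return " ".join(sentence)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    pair = make_error_pair("我已經在家", CONFUSION_SET, rng)
    if pair is not None:
        wrong, right = pair
        print("source:", to_char_tokens(wrong))
        print("target:", to_char_tokens(right))
```

Pairs produced this way would be added to the pairs extracted from the edit logs, with the wrong sentence as the source side and the correct sentence as the target side of the translation model.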
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
1 Introduction
2 Related Work
3 Methodology
4 Experimental Setting
5 Results and Discussion
6 Conclusion and Future Work
Reference