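Author (Chinese): 楊坤儒
Author (English): Yang, Kun-Ju
Thesis title (Chinese): 運用網路語料庫之雙語詞彙對應
Thesis title (English): Bilingual Word Alignment Using Web as Corpus
Advisor (Chinese): 張俊盛
Advisor (English): Chang, Jason S.
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 95625360
Year of publication (ROC calendar): 98 (2009)
Academic year of graduation: 97
Language: Chinese
Number of pages: 41
Keywords (Chinese): 詞彙對應 (word alignment); 網路語料庫 (Web corpus)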
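In this thesis, we propose a new method that uses Web resources to align words in bilingual parallel sentences. The method first converts the words of a parallel sentence pair into expanded queries and sends them to a search engine to retrieve multiple snippets, then gathers statistics from the snippets to identify the words and phrases most likely to be translations of each other. We also compute the feature values needed to identify the correct word alignments more precisely. Finally, we use these feature values to filter, score, and rank the candidates found in the retrieved snippets, and select the words and phrases most likely to be mutual translations. Experimental results show that the method is markedly effective for word alignment and for overcoming the data sparseness problem.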
We introduce a method for aligning words and phrases in a given pair of bilingual sentences using the Web as a corpus. In our approach, each Chinese word, together with all the English words, is transformed into a query and sent to a search engine to retrieve mixed-code snippets. We use the returned snippets to align words and phrases so that each source word and its aligned word or phrase are likely to be translation counterparts. The method involves calculating features of the alignment candidates, then filtering, scoring, and ranking them. Finally, we select the most likely word or phrase alignment for each source word. The results show that the method reaches 53.2% recall and 88.6% precision.
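The pipeline described in the abstract (build a query, collect snippets, then filter, score, and rank candidates) can be illustrated with a small Python sketch. It is not the thesis's implementation: this record does not give the actual features or formulas, so the query syntax, the n-gram candidate generation, the relative-frequency score, the `min_count` threshold, and the entropy measure are all assumptions made for illustration. The snippets are made-up toy data, and no search engine is called.

```python
import math
import re
from collections import Counter


def build_query(zh_word, en_words):
    """Combine one Chinese word with all English words (hypothetical query syntax)."""
    return '"{}" {}'.format(zh_word, " OR ".join(en_words))


def ngrams(words, max_n=3):
    """All contiguous word sequences of length 1..max_n, as lowercase tuples."""
    lowered = [w.lower() for w in words]
    return [tuple(lowered[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(lowered) - n + 1)]


def contains(tokens, gram):
    """True if the token list contains gram as a contiguous subsequence."""
    n = len(gram)
    return any(tuple(tokens[i:i + n]) == gram
               for i in range(len(tokens) - n + 1))


def rank_candidates(zh_word, en_words, snippets, max_n=3, min_count=2):
    """Filter, score, and rank English candidates for one Chinese word.

    Assumed scoring: the share of snippet hits each candidate receives.
    """
    candidates = set(ngrams(en_words, max_n))
    counts = Counter()
    for snippet in snippets:
        if zh_word not in snippet:
            continue  # use only snippets that mention the source word
        tokens = re.findall(r"[a-z]+", snippet.lower())
        for gram in candidates:
            if contains(tokens, gram):
                counts[gram] += 1
    kept = {g: c for g, c in counts.items() if c >= min_count}  # filtering
    total = sum(kept.values())
    scored = [(" ".join(g), c / total) for g, c in kept.items()]  # scoring
    return sorted(scored, key=lambda item: (-item[1], item[0]))  # ranking


def entropy(scored):
    """Shannon entropy (bits) of the candidate distribution; lower means less ambiguity."""
    return -sum(p * math.log2(p) for _, p in scored if p > 0)


# Toy mixed-code snippets (invented for this example).
snippets = [
    "網路語料庫 (Web corpus) 的研究",
    "語料庫 corpus 語言學",
    "大型語料庫 (corpus) 建置",
    "網路 Web 搜尋",
]
en_words = ["Web", "as", "corpus"]
print(build_query("語料庫", en_words))
ranked = rank_candidates("語料庫", en_words, snippets, min_count=1)
print(ranked)
print(round(entropy(ranked), 3))
```

In this toy run, "corpus" appears in three of the snippets that mention 語料庫 and "web" in one, so "corpus" is ranked first. A real system would need many more snippets per query, plus whatever additional features the thesis actually uses.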
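Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
Chapter 1 Introduction
Chapter 2 Related Work
Chapter 3 Method
3.1 Problem Statement
3.2 Runtime Process
3.2.1 Generating Expanded Queries
3.2.2 Computing Translation Probability and Entropy
3.2.3 CLA and Chinese-English Word Alignment
Chapter 4 Experimental Results and Discussion
4.1 Experimental Setup
4.2 Experimental Results
4.3 Discussion
Chapter 5 Conclusion and Future Work
References
Appendix A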