
Detailed Record

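Author (Chinese): 黃思茹
Author (English): Huang, Szu-Ju
Thesis title (Chinese): 針對口頭演講自動推薦演講停頓點
Thesis title (English): Automatic Determination of Speech Pause in Oral Presentation
Advisors (Chinese): 張智星
張俊盛
Advisors (English): Jang, Jyh-Shing
Chang, Jason S.
Oral defense committee (Chinese): 徐嘉連
呂仁園
Oral defense committee (English): Hsu, Jia-Lien
Lyu, Ren-yuan
Degree: Master's
Institution: National Tsing Hua University (國立清華大學)
Department: Institute of Information Systems and Applications (資訊系統與應用研究所)
Student ID: 101065515
Year of publication (ROC calendar): 103 (2014)
Academic year of graduation (ROC calendar): 102
Languages: English, Chinese
Number of pages: 49
Keyword (Chinese): 停頓點推薦
Keyword (English): Pause suggestion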
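Abstract (translated from the Chinese): We often treat punctuation marks as the places in a sentence where a speaker can pause for breath. However, not every pause occurs at a punctuation mark, and not every punctuation mark is accompanied by a pause. In this thesis, we introduce a system that automatically suggests appropriate pause points for speech scripts submitted by learners of English. Our method removes the punctuation marks from the script and generates suitable features. It involves automatically generating training data annotated with pause points, automatically generating textual feature values for the training data, and automatically training a classifier to help identify pause points. The final evaluation shows that the proposed method performs quite well at labeling pause points.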
Abstract (English): Punctuation marks in text are usually taken as breath pauses. However, not all pauses occur at punctuation marks and, in fact, not all punctuation marks are meant to be pauses. In this paper, we introduce a method for suggesting speech pauses for a given script submitted by English language learners. In our approach, a text is transformed into a non-punctuated text with features aimed at suggesting appropriate pauses in speech. The method involves automatically generating training data annotated with pauses, automatically transforming the training data into linguistic features, and automatically training a discriminative classifier. Evaluation shows that the proposed method achieves satisfactory performance in suggesting pauses for a given speech script.
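The data-preparation step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the word-window features, and the window size are assumptions made for illustration, and the pause labels are taken as given here, whereas the thesis derives them automatically (its contents list speech-text forced alignment and a threshold step for that purpose).

```python
import re


def tokenize(script):
    """Split a script into word tokens, discarding punctuation marks."""
    return re.findall(r"[A-Za-z0-9']+", script)


def boundary_features(words, i, window=2):
    """Build features for the boundary that follows words[i].

    Hypothetical feature set: the lowercased words in a small window
    around the boundary, plus the position of the boundary in the script.
    """
    feats = {}
    for offset in range(-window + 1, window + 1):
        j = i + offset
        token = words[j].lower() if 0 <= j < len(words) else "<pad>"
        feats[f"w[{offset}]"] = token
    feats["words_since_start"] = i + 1
    return feats


def make_examples(words, pause_after):
    """Pair each word boundary with a pause / no-pause label.

    pause_after is a set of word indices after which a pause was observed
    (assumed to come from an external annotation or alignment step).
    """
    return [
        (boundary_features(words, i), i in pause_after)
        for i in range(len(words) - 1)
    ]


if __name__ == "__main__":
    words = tokenize("Hello, world! How are you?")
    for feats, label in make_examples(words, pause_after={1}):
        print(label, feats)
```

Each (features, label) pair could then be passed to any discriminative classifier that accepts sparse categorical features; which classifier and which features the thesis actually uses is not stated in this record's abstract.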
Abstract
Acknowledgments
Contents
List of Figures
List of Tables
1 Introduction
2 Related Work
3 Method
3.1 Problem Statement
3.2 Learning to Suggest Appropriate Pauses
3.2.1 Speech-Text Alignment (Forced Alignment)
3.2.2 Pause Candidate Selection
3.2.3 Feature Generation
3.2.4 Training a Machine Learning Classifier
3.3 Run-Time Pauses Suggesting
4 Experimental Setting
4.1 Resources
4.2 Evaluation Metrics
4.2.1 Evaluation Metrics
4.2.2 Consistency of Manually Annotated Pauses
4.3 Detail Setting of the Proposed Method
4.3.1 Acoustic Models Training Setting
4.3.2 Threshold Determination
4.3.3 Feature Generation
4.4 Machine Learning Classifiers Compared
5 Evaluation
5.1 Classifiers Comparison
5.2 Feature Set Comparison
5.3 Overall Performance
6 Conclusion and Future Work
References
[1] Berger, A. L., Pietra, V. J. D., & Pietra, S. A. D. (1996, March). A maximum entropy approach to natural language processing. Comput. Linguist., 22(1), 39–71. Retrieved from http://dl.acm.org/citation.cfm?id=234285.234289

[2] Bosker, H. R., Pinget, A.-F., Quene, H., Sanders, T., & de Jong, N. H. (2013, April). What makes speech sound fluent? the contributions of pauses, speed and repairs. Language Testing, 30(2), 159-175.

[3] Chiang, C.-Y., Wang, Y.-R., & Chen, S.-H. (2012, March). Punctuation generation inspired linguistic features for Mandarin prosodic boundary prediction. In Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on (p. 4597-4600). doi: 10.1109/ICASSP.2012.6288942

[4] Derwing, T. M., Rossiter, M. J., Munro, M. J., & Thomson, R. I. (2004, December). Second language fluency: Judgments on different tasks. Language Learning, 54, 655-679.

[5] Hirschberg, J., & Prieto, P. (1996). Training intonational phrasing rules automatically for English and Spanish text-to-speech. Speech Communication, 18(3), 281-290.

[6] Hosom, J.-P. (2002). Automatic phoneme alignment based on acoustic-phonetic modeling. In Interspeech.

[7] Huang, J., & Zweig, G. (2002). Maximum entropy model for punctuation annotation from speech.

[8] Koehn, P., Abney, S., Hirschberg, J., & Collins, M. (2000). Improving intonational phrasing with syntactic information. In Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on (Vol. 3, p. 1289-1290). doi: 10.1109/ICASSP.2000.861813

[9] Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Departmental Papers (CIS).

[10] Leeser, M. J. (2004, 12). The effects of topic familiarity, mode, and pausing on second language learners’ comprehension and focus on form. Studies in Second Language Acquisition, 26, 587–615. Retrieved from http://journals.cambridge.org/article_S0272263104040033 doi: 10.1017/S0272263104040033

[11] Lu, W., & Ng, H. T. (2010). Better punctuation prediction with dynamic conditional random fields. In Proceedings of the 2010 conference on empirical methods in natural language processing (pp. 177–186). Stroudsburg, PA, USA: Association for Computational Linguistics. Retrieved from http://dl.acm.org/citation.cfm?id=1870658.1870676

[12] McCallum, A., Freitag, D., & Pereira, F. (2000). Maximum entropy Markov models for information extraction and segmentation. In (p. 591-598).

[13] Beeferman, D., Berger, A., & Lafferty, J. (1998). Cyberpunc: A lightweight punctuation annotation system for speech. In (Vol. 2).

[14] Kim, J.-H., & Woodland, P. C. (2001). The use of prosody in a combined system for punctuation generation and speech recognition.

[15] Raupach, M. (1980). Temporal variables in first and second language speech production. In Temporal variables in speech (p. 263-270).

[16] Riazantseva, A. (2001, 12). Second language proficiency and pausing: A study of Russian speakers of English. Studies in Second Language Acquisition, 23, 497–526. Retrieved from http://journals.cambridge.org/article_S027226310100403X

[17] Sajavaara, K. (1987). Second language speech production: Factors affecting fluency. In Psycholinguistic models of production (p. 45-65).

[18] Tavakoli, P. (2011). Pausing patterns: Differences between L2 learners and native speakers. ELT Journal, 65(1), 71-79.

[19] Tsuruoka, Y., Tateishi, Y., Kim, J.-D., Ohta, T., McNaught, J., Ananiadou, S., & Tsujii, J. (2005). Developing a robust part-of-speech tagger for biomedical text. In (p. 382-392).

[20] Viola, I. C., & Madureira, S. (2008). The roles of pause in speech expression.

[21] Wang, M. Q., & Hirschberg, J. (1991). Predicting intonational phrasing from text. In Proceedings of the 29th annual meeting on association for computational linguistics (pp. 285–292). Stroudsburg, PA, USA: Association for Computational Linguistics. Retrieved from http://dx.doi.org/10.3115/981344.981381 doi: 10.3115/981344.981381.

[22] Wennerstrom, A. (2000). The role of intonation in second language fluency. Perspectives on fluency, 102-127.
 
 
 
 