
Detailed Record

Author (Chinese): 張雨
Author (English): Zhang, Yu
Title (Chinese): 透過表演逐字稿之互動特徵以改善中文戲劇表演資料庫情緒辨識系統
Title (English): Improving Chinese Performance Corpus Emotion Recognition System by Using Transcripts Interactive Features
Advisor (Chinese): 李祈均
Advisor (English): Lee, Chi-Chun
Committee (Chinese): 古倫維、曹昱、賴穎暉
Committee (English): Ku, Lun-Wei; Tsao, Yu; Lai, Ying-Hui
Degree: Master's
University: National Tsing Hua University
Department: Department of Electrical Engineering
Student ID: 104061468
Year of publication (ROC era): 107 (2018)
Graduation academic year: 106
Language: Chinese
Pages: 43
Keywords (Chinese): 行為訊號處理、情緒辨識、自然語言處理、支持向量機、長短時記憶網路
Keywords (English): behavior signal processing, emotion recognition, NLP, SVM, long short-term memory
In recent years, researchers have frequently extracted features from external human behavior signals to build machine-learning models that recognize emotion along the activation/arousal and valence dimensions. These behavior signals, including text, audio, and video, all contribute to accurate emotion prediction. Text features are one of the important signals; however, using only a single speaker's dialogue text discards part of the emotional information, which limits recognition performance. This thesis therefore adds time-series features of the dialogue text and fuses the interaction information between the two speakers to strengthen emotion recognition.
In the first part of this work, Word2vec produces word embeddings for the segmented raw transcripts, and a long short-term memory (LSTM) architecture extracts high-level textual features that carry temporal information. Fisher vector encoding then aggregates the feature vectors, and a support vector machine performs ternary classification, achieving unweighted average recalls (UAR) of 0.413 for arousal and 0.489 for valence. These results improve on traditional TF-IDF and vectorization features, showing that the added time-series information benefits dialogue text features. In the second part, a deep-learning autoencoder model incorporates the textual interaction information between the two actors; the resulting automatic model reaches 0.458 for arousal and 0.556 for valence. Compared with the first part, this modeling approach better reflects each speaker's emotional state within the dialogue, demonstrating that fusing interaction information helps the system achieve better automatic emotion classification.
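The first-part pipeline relies on the LSTM to carry word-order information through each transcript. As a minimal illustrative sketch only (a single scalar-valued unit with made-up weights, not the network actually trained in the thesis), the gating recurrence can be written as:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W):
    """One LSTM step for a single scalar unit. W maps gate names to weights."""
    i = sigmoid(W["wi"] * x + W["ui"] * h_prev + W["bi"])    # input gate
    f = sigmoid(W["wf"] * x + W["uf"] * h_prev + W["bf"])    # forget gate
    o = sigmoid(W["wo"] * x + W["uo"] * h_prev + W["bo"])    # output gate
    g = math.tanh(W["wg"] * x + W["ug"] * h_prev + W["bg"])  # candidate cell
    c = f * c_prev + i * g    # cell state mixes old memory with new input
    h = o * math.tanh(c)      # hidden state exposed to later layers
    return h, c

def encode_sequence(xs, W):
    """Fold the cell over a word-embedding sequence; the final hidden state
    is a fixed-length feature that depends on word order."""
    h, c = 0.0, 0.0
    for x in xs:
        h, c = lstm_step(x, h, c, W)
    return h

# Toy usage: identical weights everywhere, a short "embedded" utterance.
W = {k: 0.5 for k in ("wi", "ui", "bi", "wf", "uf", "bf",
                      "wo", "uo", "bo", "wg", "ug", "bg")}
print(encode_sequence([0.1, -0.2, 0.3], W))
```

The final hidden state changes when the same words appear in a different order, which is exactly the temporal sensitivity that bag-of-words features such as TF-IDF lack.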
In recent years, researchers have often extracted machine-learning features from human behavior signals in order to recognize emotion along the arousal and valence dimensions. These behavior signals, including text, audio, and video, can all contribute to emotion prediction. Text features are among the most important of these signals; if we use only an individual speaker's text features, some emotional information is lost and the results are less accurate. The purpose of this thesis is therefore to add time-series features of the dialogue text and to fuse the interaction information between the two actors, in order to improve the results of emotion recognition.
The first part of the thesis performs word embedding on the segmented text using Word2vec. We then use a long short-term memory network (LSTM) to extract high-level features with temporal information. Next, we encode the feature vectors with Fisher vector encoding and model them with an SVM for ternary classification. The UARs for arousal and valence are 0.413 and 0.489, an improvement over traditional TF-IDF features, showing that time-series features are helpful for dialogue text. In the second part, we add textual interaction information between the two actors through a deep-learning autoencoder model, obtaining a new automatic model whose arousal and valence results reach 0.458 and 0.556. Compared with the first part, this modeling method better reflects each individual's emotional state in the dialogue, showing that fusing interaction information helps the system achieve better automatic emotion classification results.
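The scores above are unweighted average recalls (UAR). A minimal sketch with toy labels (not the thesis's data) shows how UAR weights each emotion class equally, regardless of how frequent the class is:

```python
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    """UAR: recall computed per class, then averaged with equal weight,
    so rare emotion classes count as much as frequent ones."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)

# Toy ternary labels, mirroring a 3-class (low/mid/high) arousal setup.
y_true = ["low", "low", "mid", "mid", "high", "high"]
y_pred = ["low", "mid", "mid", "mid", "high", "low"]
print(round(unweighted_average_recall(y_true, y_pred), 4))  # → 0.6667
```

With imbalanced emotion classes, UAR is preferred over plain accuracy because a classifier that always predicts the majority class gets only 1/K under UAR for K classes.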
Acknowledgments
Chinese Abstract
Abstract
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Preface
1.2 Research Motivation
1.3 Thesis Organization
Chapter 2 Corpus
2.1 The NNIME Corpus
2.2 Transcripts and Emotion Labels
2.2.1 Chinese Transcripts
2.2.2 Emotion Label Design
Chapter 3 Methods
3.1 Vector Space Model (VSM)
3.2 Distributed Representations
3.2.1 Word2vec Word Vectors
3.3 Neural Network Models
3.3.1 Recurrent Neural Networks
3.3.2 Long Short-Term Memory
3.3.3 Sequence-to-Sequence Autoencoder
3.4 Fisher Vector Encoding
3.5 Support Vector Machines
Chapter 4 Experimental Design and Result Analysis
4.1 Preliminary Experiment: Features for Text Classification
4.1.1 Experimental Design
4.1.2 Results and Discussion
4.2 Experiment 1: Extracting Time-Series Features with LSTM
4.2.1 Experimental Design
4.2.2 Results and Discussion
4.3 Experiment 2: Fusing Dialogue Interaction Features
4.3.1 Experimental Design
4.3.2 Results and Discussion
Chapter 5 Conclusion and Future Work
References


 
 
 
 
