Author (Chinese name): 張 雨
Author (English name): Zhang, Yu
Thesis Title (English): Improving Chinese Performance Corpus Emotion Recognition System by Using Transcripts Interactive Features
Advisor (English name): Lee, Chi-Chun
Oral Defense Committee (English names): Ku, Lun-Wei
Tsao, Yu
Lai, Ying-Hui
Keywords (English): behavior signal processing; emotion recognition; NLP; SVM; long short-term memory
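In the first part of this work, Word2vec is used to produce word embeddings for the segmented raw transcripts, and a long short-term memory (LSTM) network extracts high-level text-modality features that carry temporal information. These feature vectors are aggregated with Fisher vector encoding, and a support vector machine (SVM) is trained for ternary classification. Arousal and valence reach unweighted average recalls (UAR) of 0.413 and 0.489, respectively, an improvement over the conventional TF-IDF features and vectorization-based features, which suggests that the added time-series features provide some gain for the dialogue-text features. In the second part, a deep-learning autoencoder model incorporates the textual interaction information between the two actors, and the resulting automatic model reaches 0.458 for arousal and 0.556 for valence. Compared with the first part, this modeling approach better reflects each person's emotional state within the two-person dialogue, showing that fusing interaction information helps the system achieve better automatic emotion classification.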
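Fisher vector encoding is the step that turns a variable-length sequence of LSTM outputs into one fixed-length vector that an SVM can consume. The sketch below implements the standard "improved" Fisher vector (gradients of the log-likelihood with respect to the means and diagonal variances of a Gaussian mixture model, followed by signed square-root and L2 normalization) in plain Python. It assumes the LSTM emits one feature vector per word and that a diagonal GMM has already been fitted; the function names, the toy GMM parameters, and the choice of mean-plus-variance gradients are illustrative assumptions, not the thesis's implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def log_gaussian_diag(x: Vector, mean: Vector, var: Vector) -> float:
    """Log-density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def fisher_vector(frames: Sequence[Vector], weights: Vector,
                  means: Sequence[Vector], variances: Sequence[Vector]) -> List[float]:
    """Encode T frames of dimension D into one fixed-length 2*K*D vector.

    Uses gradients of the log-likelihood w.r.t. the means and diagonal
    variances of a K-component GMM, then power and L2 normalization.
    """
    K, D, T = len(weights), len(means[0]), len(frames)
    g_mu = [[0.0] * D for _ in range(K)]
    g_var = [[0.0] * D for _ in range(K)]
    for x in frames:
        # Soft assignment (posterior) of this frame to each component, via log-sum-exp.
        logp = [math.log(weights[k]) + log_gaussian_diag(x, means[k], variances[k])
                for k in range(K)]
        top = max(logp)
        unnorm = [math.exp(lp - top) for lp in logp]
        total = sum(unnorm)
        for k in range(K):
            gamma = unnorm[k] / total
            for d in range(D):
                z = (x[d] - means[k][d]) / math.sqrt(variances[k][d])  # whitened deviation
                g_mu[k][d] += gamma * z
                g_var[k][d] += gamma * (z * z - 1.0)
    fv: List[float] = []
    for k in range(K):
        fv.extend(g / (T * math.sqrt(weights[k])) for g in g_mu[k])
    for k in range(K):
        fv.extend(g / (T * math.sqrt(2.0 * weights[k])) for g in g_var[k])
    # "Improved" Fisher vector: signed square root, then L2 normalization.
    fv = [math.copysign(math.sqrt(abs(v)), v) for v in fv]
    norm = math.sqrt(sum(v * v for v in fv))
    return [v / norm for v in fv] if norm > 0 else fv


# Toy 2-component GMM over 2-dimensional frames (made-up numbers).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [1.0, 1.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
short_seq = [[0.1, 0.2], [0.9, 1.1]]
long_seq = [[0.0, 0.1], [0.5, 0.4], [1.2, 0.8], [0.3, 0.9]]
print(len(fisher_vector(short_seq, weights, means, variances)))  # 8
print(len(fisher_vector(long_seq, weights, means, variances)))   # 8
```

Both sequences map to vectors of length 2 × K × D = 8 regardless of how many frames they contain, which is what lets utterances of different lengths share one SVM feature space.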
In recent years, researchers have often extracted machine-learning features from human behavior signals in order to recognize emotion in terms of arousal and valence. Behavior signals such as text, audio, and video can all contribute to emotion prediction, and text is one of the most important of them. Relying on individual text features alone, however, discards some emotional information and makes the results less accurate. The purpose of this thesis is therefore to add time-series features of the dialogue text and to fuse the interaction information between the two actors in order to improve emotion recognition.
In the first part of the thesis, the segmented text is embedded with Word2vec, and a long short-term memory (LSTM) neural network extracts high-level features that carry temporal information. The feature vectors are then encoded with Fisher vector encoding and modeled by an SVM for ternary classification. The unweighted average recall (UAR) is 0.413 for arousal and 0.489 for valence, better than with the traditional TF-IDF features, which shows that time-series features are helpful for the dialogue-text representation. In the second part, text interaction information between the two actors is added through a deep-learning autoencoder model, yielding a new automatic model that reaches 0.458 for arousal and 0.556 for valence. Compared with the first part, this modeling approach better reflects each individual's emotional state in the dialogue, which shows that fusing interaction information helps the system achieve better automatic emotion classification.
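The UAR figures above are unweighted average recalls: the recall of each class is computed separately and the recalls are averaged with equal weight, so a minority class counts as much as a majority class. The following Python sketch computes the metric; the function name and the toy labels are illustrative and not taken from the thesis.

```python
from collections import Counter
from typing import Hashable, Sequence


def unweighted_average_recall(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Average of per-class recalls; every class present in y_true counts equally."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    totals = Counter(y_true)                                     # samples per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy ternary example (made-up labels): 8 samples, imbalanced classes.
y_true = ["low"] * 4 + ["mid"] * 2 + ["high"] * 2
y_pred = ["low"] * 4 + ["low"] * 2 + ["low", "high"]
print(unweighted_average_recall(y_true, y_pred))  # 0.5
```

Plain accuracy on the same predictions would be 5/8 = 0.625; UAR is lower because the "mid" class is never recognized. Because each class is weighted equally, a classifier that always predicts a single class scores exactly 1/3 UAR on a three-class task, which is a useful reference point when reading the reported values.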
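Acknowledgements
Chinese Abstract
Abstract
Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Preface
1.2 Research Motivation
1.3 Thesis Organization
Chapter 2 Database
2.1 NNIME Database
2.2 Transcripts and Emotion Labels
2.2.1 Chinese Transcripts
2.2.2 Emotion Label Design
Chapter 3 Methodology
3.1 Vector Space Model (VSM)
3.2 Distributed Representation
3.2.1 Word2vec Word Vectors
3.3 Neural Network Models
3.3.1 Recurrent Neural Networks
3.3.2 Long Short-Term Memory Networks
3.3.3 Sequence-to-Sequence Autoencoder
3.4 Fisher Vector Encoding
3.5 Support Vector Machines
Chapter 4 Experimental Design and Result Analysis
4.1 Preliminary Experiment: Features for Text Classification
4.1.1 Experimental Design
4.1.2 Results and Discussion
4.2 Experiment 1: Extracting Temporal Features with LSTM
4.2.1 Experimental Design
4.2.2 Results and Discussion
4.3 Experiment 2: Fusing Dialogue Interaction Features
4.3.1 Experimental Design
4.3.2 Results and Discussion
Chapter 5 Conclusions and Future Work
References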

