
Detailed Record

Author (Chinese): 陳瑋晨
Author (English): Chen, Wei-Chen
Title (Chinese): 一個多模態連續情緒辨識系統與其應用於全域情感辨識之研究
Title (English): A Study on Automatic Multimodal Continuous Emotion Tracking and Its Application in Global Affect Recognition
Advisor (Chinese): 李祈均
Advisor (English): Lee, Chi-Chun
Committee members (Chinese): 冀泰石、曹昱、李宏毅
Committee members (English): Chi, Tai-shih; Tsao, Yu; Lee, Hung-Yi
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Electrical Engineering
Student ID: 103061612
Publication year (ROC): 105 (2016)
Graduation academic year: 105
Language: Chinese, English
Number of pages: 41
Keywords (Chinese): 行為訊號處理、多模態情緒辨識、薄片擷取
Keywords (English): Behavior Signal Processing, Multimodal Emotion Recognition, Sequence to Sequence, Thin-slicing
Human communication takes place through voice, body, and facial expression. It is highly complex and multimodal, involving cues such as utterances, intonation, gaze direction, gestures, and body movements, and these cues interact with one another: the same utterance delivered with a different intonation or different body movements carries a different meaning. When people express emotion, there is not only a global affect but, at a finer level, a continuous emotional trajectory; the emotion at each moment differs and carries rich information, and both kinds of emotion, global affect and continuous emotion, are important. This thesis therefore applies behavior signal processing to extract two behavioral modalities, body language and speech, and systematically encodes the behavior features over different time spans. Continuous emotion recognition is performed on the three emotion dimensions of activation, valence, and dominance using two machine learning algorithms: support vector regression and sequence-to-sequence learning. The experimental results show that enriching the features with temporal information effectively helps continuous emotion recognition and yields a significant improvement. The continuous emotion estimates are then applied to global affect recognition, and the results show that, even after passing through this intermediate continuous-emotion encoding, they still effectively help global affect recognition. Finally, we exploit the thin-slicing affect perception mechanism from human psychology to further raise the correlation of global affect recognition. We also observe an interesting phenomenon: the two machine learning algorithms produce comparable continuous emotion recognition results, yet behave very differently on global affect.
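To make the first stage more concrete, the following is a minimal Python sketch, not the thesis's actual implementation, of per-dimension continuous emotion regression with support vector regression; the feature matrix, annotation traces, split ratio, and hyperparameters are placeholders standing in for the time-encoded body-language/audio features and the continuous annotations used in the study.

import numpy as np
from sklearn.svm import SVR

# Stand-in data: per-frame behavior features and continuous emotion annotations.
rng = np.random.default_rng(0)
n_frames, n_features = 500, 48
X = rng.normal(size=(n_frames, n_features))      # placeholder time-encoded features
traces = {                                       # placeholder per-frame ratings in [-1, 1]
    "activation": rng.uniform(-1, 1, n_frames),
    "valence": rng.uniform(-1, 1, n_frames),
    "dominance": rng.uniform(-1, 1, n_frames),
}

# One regressor per emotion dimension, a plausible setup for the SVR branch described above.
split = int(0.8 * n_frames)
predicted = {}
for dim, y in traces.items():
    model = SVR(kernel="rbf", C=1.0, epsilon=0.1)
    model.fit(X[:split], y[:split])
    predicted[dim] = model.predict(X[split:])    # continuous emotion trace on held-out frames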
Human communication and interaction are multimodal, carried through voice, facial expression, and body motion, all of which convey a person's affect. At different moments of an interaction people feel different emotions, and the expression of emotion can be annotated as global affect or as continuous emotion. Continuous emotion is complex and informative, and both continuous and global emotion are important. We therefore apply a time-encoding framework to behavior signals, namely body language and audio, to track continuous emotion along three dimensions: activation, valence, and dominance. For training and testing we use two machine learning algorithms, support vector regression and sequence-to-sequence learning. We extend the framework in two directions: first, continuous emotion tracking itself; second, using the continuous emotion estimates to predict global affect. Compared with previous studies, our system achieves better continuous emotion tracking performance, and the tracked emotion also effectively helps global affect recognition. Interestingly, combining the continuous emotion tracking results with the human thin-slicing perception mechanism further improves the global affect correlation.
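As a rough, hypothetical illustration of the second stage, the sketch below turns a predicted continuous emotion trace into a single global affect estimate by keeping only a few "emotion-rich" windows before averaging. Ranking windows by variance is only a simple stand-in for the thin-slicing selection studied in the thesis and in [18], and the function name, window length, and slice count are invented for this example.

import numpy as np

def global_affect_from_trace(trace, win=50, n_slices=3):
    # Split the continuous trace into non-overlapping windows.
    windows = [trace[i:i + win] for i in range(0, len(trace) - win + 1, win)]
    # Rank windows by variance as a crude proxy for "emotion-rich" behavior,
    # then average only the top slices (thin-slice-style aggregation).
    ranked = sorted(windows, key=np.var, reverse=True)[:n_slices]
    return float(np.mean([w.mean() for w in ranked]))

# Example with a synthetic activation trace in place of a model prediction.
rng = np.random.default_rng(1)
trace = np.clip(np.cumsum(rng.normal(0.0, 0.05, 600)), -1.0, 1.0)
print(global_affect_from_trace(trace))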
Acknowledgements i
Abstract (Chinese) ii
Abstract (English) iii
Table of Contents iv
List of Figures vi
List of Tables vii
Chapter 1 Introduction 1
Chapter 2 Methodology 4
2.1 Research Framework 4
2.2 Database 5
2.3 Emotion Annotation Processing 7
2.4 Behavior Feature Extraction 8
2.4.1 Body Language Feature Processing 8
2.4.2 Audio Feature Processing 10
2.5 Support Vector Regression 10
2.6 Sequence to Sequence 11
2.6.1 Recurrent Neural Network 11
2.6.2 Bidirectional Recurrent Neural Networks 13
2.6.3 Long Short-Term Memory 15
2.6.4 Sequence to Sequence 16
Chapter 3 Experimental Design, Results, and Analysis 20
3.1 Continuous Emotion Recognition 21
3.1.1 Support Vector Regression Results 22
3.1.2 Sequence to Sequence Results 25
3.2 Global Affect Recognition 28
3.2.1 Baseline 29
3.2.2 Global Affect Recognition from Continuous Emotion 29
3.2.3 Thin-Slicing 30
Chapter 4 Conclusion and Discussion 33
References 35
[1] D. Bone, M. S. Goodwin, M. P. Black, C. C. Lee, K. Audhkhasi, and S. Narayanan, “Applying machine learning to facilitate autism diagnostics: Pitfalls and promises,” Journal of autism and developmental disorders, vol. 45, no. 5, pp. 1121-1136, 2015.
[2] Shan-Wen Hsiao, Hung-Ching Sun, Ming-Chuan Hsieh, Ming-Hsueh Tsai, Hsin-Chih Lin, Chi-Chun Lee, “A multimodal approach for automatic assessment of school principals' oral presentation during pre-service training program,” in Proceedings of the International Speech Communication Association (Interspeech), 2015.
[3] M. P. Black, et al., “Toward automating a human behavioral coding system for married couples’ interactions using speech acoustic features,” Speech Communication, vol. 55, no. 1, pp. 1-21, 2013.
[4] Fu-Sheng Tsai, Ya-Ling Hsu, Wei-Chen Chen, Yi-Ming Weng, Chip-Jin Ng, Chi-Chun Lee, “Toward development and evaluation of pain level-rating scale for emergency triage based on vocal characteristics and facial expressions,” in Proceedings of the International Speech Communication Association (Interspeech), 2016.
[5] Z. Zeng, M. Pantic, G. I. Roisman, and T. S. Huang, “A survey of affect recognition methods: Audio, visual, and spontaneous expressions,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 1, pp. 39-58, 2009.
[6] Y. Kim, H. Lee, and E. M. Provost, “Deep learning for robust feature generation in audiovisual emotion recognition,” IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3687-3691, 2013.
[7] C. E. Osgood, G. J. Suci, and P. H. Tannenbaum, The measurement of meaning. Urbana, IL: University of Illinois Press, 1957.
[8] S. Ebrahimi Kahou, V. Michalski, K. Konda, R. Memisevic, and C. Pal, “Recurrent neural networks for emotion recognition in video,” Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, pp. 467-474, 2015.
[9] K. S. Tai, R. Socher, and C. D. Manning, “Improved semantic representations from tree-structured long short-term memory networks,” Association for Computational Linguistics (ACL), 2015.
[10] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks.” Advances in neural information processing systems, pp. 3104-3112, 2014.
[11] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” arXiv preprint arXiv:1406.1078, 2014.
[12] O. Vinyals, and Q. Le, “A neural conversational model,” ICML Deep Learning Workshop, 2015.
[13] A. Metallinou, Z. Yang, C. C. Lee, C. Busso, S. Carnicke, and S. Narayanan, “The USC CreativeIT database of multimodal dyadic interactions: from speech and full body motion capture to continuous emotional annotations,” Language Resources and Evaluation, vol. 50, no. 3, pp. 497-521, 2016.
[14] R. Cowie, E. Douglas-Cowie, S. Savvidou, E. McMahon, M. Sawey, and M. Schröder, “'FEELTRACE': An instrument for recording perceived emotion in real time,” ISCA tutorial and research workshop (ITRW) on speech and emotion, 2000.
[15] N. Ambady, and R. Rosenthal, “Thin slices of expressive behavior as predictors of interpersonal consequences: A meta-analysis,” Psychological bulletin, vol. 111, no. 2, pp. 256-274, 1992.
[16] N. Ambady, and R. Rosenthal, “Half a minute: Predicting teacher evaluations from thin slices of nonverbal behavior and physical attractiveness,” Journal of personality and social psychology, vol. 64, no. 3, pp. 431-441, 1993.
[17] J. Harrigan, and R. Rosenthal, New handbook of methods in nonverbal behavior research. Oxford University Press, 2008.
[18] W. C. Lin, and C. C. Lee, “A thin-slice perception of emotion? An information theoretic-based framework to identify locally emotion-rich behavior segments for global affect recognition,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5790-5794, 2016.
[19] P. Boersma et al., “Praat, a system for doing phonetics by computer,” Glot international, vol. 5, no. 9/10, pp. 341–345, 2002.
[20] B. McFee et al., “librosa: Audio and music signal analysis in python,” Proceedings of the 14th Python in Science Conference. 2015.
[21] M. Schuster, and K. K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673-2681, 1997.
[22] S. Hochreiter, and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[23] R. J. Williams, and D. Zipser, “A learning algorithm for continually running fully recurrent neural networks,” Neural computation, vol. 1, no. 2, pp. 270-280. 1989.
[24] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” arXiv preprint arXiv:1409.0473, 2014.
[25] T. Tieleman, and G. Hinton, Lecture 6.5-rmsprop, COURSERA: Neural Networks for Machine Learning, 2012.
[26] A. Graves, Supervised Sequence Labelling with Recurrent Neural Networks. Springer Berlin Heidelberg, 2012.
[27] A. Metallinou, A. Katsamanis, and S. Narayanan, “Tracking continuous emotional trends of participants during affective dyadic interactions using body language and speech information,” Image and Vision Computing, vol. 31, no. 2, pp. 137-152, 2013.
[28] A. Kleinsmith, N. Bianchi-Berthouze, and A. Steed, “Automatic recognition of non-acted affective postures,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 41, no. 4, pp. 1027-1038, 2011.

[29] N. Sebe, I. Cohen, T. Gevers, and T. S. Huang, “Emotion recognition based on joint visual and audio cues,” 18th International Conference on Pattern Recognition (ICPR), vol. 1, pp. 1136-1139, 2006.
[30] F. Ringeval, et al., “Prediction of asynchronous dimensional emotion ratings from audiovisual and physiological data,” Pattern Recognition Letters, vol. 66, pp. 22-30, 2015.
[31] H. Gunes, and M. Pantic, “Automatic, dimensional and continuous emotion recognition,” International Journal of Synthetic Emotions, vol. 1, no. 1, 2010.
[32] N. Malandrakis, A. Potamianos, G. Evangelopoulos, and A. Zlatintsi, “A supervised approach to movie emotion tracking,” IEEE international conference on acoustics, speech and signal processing (ICASSP), pp. 2376-2379, 2011.
[33] A. Metallinou, A. Katsamanis, Y. Wang, and S. Narayanan, “Tracking changes in continuous emotion states using body language and prosodic cues,” IEEE international conference on acoustics, speech and signal processing (ICASSP), pp. 2288-2291, 2011.
[34] A. Hanjalic, and L. Q. Xu, “Affective video content representation and modeling,” IEEE transactions on multimedia, vol. 7, no. 1, pp. 143-154, 2005.
[35] F. Eyben, G. L. Salomão, J. Sundberg, K. R. Scherer, and B. W. Schuller, “Emotion in the singing voice—a deeper look at acoustic features in the light of automatic classification,” EURASIP Journal on Audio, Speech, and Music Processing, vol. 2015, pp. 1-9, 2015.
[36] F. Eyben, M. Wöllmer, and B. Schuller, “Opensmile: the munich versatile and fast open-source audio feature extractor,” Proceedings of the 18th ACM international conference on Multimedia, pp. 1459-1462, 2010.
[37] C. H. Wu, Z. J. Chuang, and Y. C. Lin, “Emotion recognition from text using semantic labels and separable mixture models,” ACM transactions on Asian language information processing (TALIP), vol. 5, no. 2, pp. 165-183, 2006.
[38] F. Rosenblatt, “The perceptron: a probabilistic model for information storage and organization in the brain,” Psychological review, vol. 65, no. 6, pp. 386-408, 1958.
[39] D. E. Rumelhart, G. E. Hinton and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, pp.533-536, 1986.
[40] G. E. Hinton, and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504-507, 2006.
[41] Y. Bengio, P. Simard, and P. Frasconi, “Learning long-term dependencies with gradient descent is difficult,” IEEE transactions on neural networks, vol. 5, no. 2, pp. 157-166, 1994.
[42] D. Ververidis, and C. Kotropoulos, “Emotional speech recognition: Resources, features, and methods,” Speech Communication, vol. 48, no. 9, pp. 1162-1181, 2006.
[43] M. A. Nicolaou, H. Gunes, and M. Pantic, “Continuous prediction of spontaneous affect from multiple cues and modalities in valence-arousal space,” IEEE Transactions on Affective Computing, vol. 2, no. 2, pp. 92-105, 2011.
[44] M. Wöllmer, M. Kaiser, F. Eyben, B. Schuller, and G. Rigoll, “LSTM-Modeling of continuous emotions in an audiovisual affect recognition framework,” Image and Vision Computing, vol. 31, no. 2, pp. 153-163, 2013.
[45] M. Wöllmer, F. Weninger, T. Knaup, B. Schuller, C. Sun, K. Sagae, and L. P. Morency, “Youtube movie reviews: Sentiment analysis in an audio-visual context,” IEEE Intelligent Systems, vol. 28, no. 3, pp. 46-53, 2013.
(Full text available for internal viewing only)

Related Theses

1. A New Framework for Global Emotion Recognition Built on the Human Thin-Slice Emotion Perception Mechanism
2. An Automatic Scoring System for Couple Interaction Behavior Scales in Marital Therapy Based on Stacked Sparse Autoencoders Using Speech Features
3. A Study on Stroke Prediction from National Health Insurance Data Using Hadoop as a Fast Feature Extraction Tool
4. Building an Automatic Scoring System for Pre-Service Principals' Speeches Using Multi-Task and Multimodal Fusion Techniques
5. Building an Automatic Scoring System for Pre-Service Principal Evaluation by Analyzing the Relation Between Samples and Labels with Multimodal Active Learning
6. Improving Speech Emotion Recognition by Incorporating fMRI BOLD Signals
7. Improving Speech Emotion Recognition Using Multi-Level Convolutional Neural Network Features from fMRI
8. Developing a Behavior-Measurement-Based Assessment System for Children with Autism Using an Embodied Conversational Interface
9. Integrating Multi-Level Text Representations and Speech-Attribute Embeddings for Robust Automatic Scoring of Pre-Service Principals' Speeches
10. Using Joint Factor Analysis to Study Temporal Effects in Brain MRI for Improving Emotion Recognition
11. Building an LSTM-Based Assessment System for Identifying Children with Autism from ADOS Interviews
12. Automatic Detection of Emergency Patients' Pain Levels Using a Multimodal Model Combining CNN and LSTM Audio-Visual Features
13. Improving Automatic Behavior Scoring in Marital Therapy with a Bidirectional LSTM Architecture Mixing Multi-Granularity Text Modalities
14. Improving Emotion Recognition on a Chinese Theatrical Performance Database Using Interaction Features from Performance Transcripts
15. An fMRI Feature Extractor Based on Resting-State Convolutional Autoencoders
 