Detailed Record
Author (Chinese): 陳主霖
Author (English): Chen, Chu-Ling
Title (Chinese): 以標籤共同偏移估計與標籤信賴先驗之連續表情識別研究
Title (English): Continuous Emotion Recognition by Estimating Common Label Bias with Label Confidence Prior
Advisor (Chinese): 許秋婷
Advisor (English): Hsu, Chiou-Ting
Committee Members (Chinese): 王聖智、李祈均
Committee Members (English): Wang, Sheng-Jyh; Lee, Chi-Chun
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 102062581
Publication Year (ROC calendar): 105 (2016)
Graduation Academic Year: 105
Language: English
Number of Pages: 47
Keywords (Chinese): 連續情緒識別、機器學習、人臉表情識別
Keywords (English): Continuous Emotion Recognition, Machine Learning, Facial Expression Recognition
Abstract (Chinese, translated): Continuous emotion recognition aims to identify human emotion from audio-visual sequences. Continuous emotion labels often contain noise, because annotators cannot accurately estimate continuous emotion while watching the audio-visual sequences in real time. Most existing continuous emotion recognition methods remove label noise and compensate for the discrepancy between true emotion and the learning model through manual procedures, which do not help automate emotion recognition. This thesis aims to remove label noise and to compensate for the gap in emotion understanding between annotators and machines automatically. We propose a jointly optimized model that recognizes emotion while estimating a common label bias; label noise is measured through feature-label relationships, and the model is further strengthened with deep emotion features. The proposed method achieves emotion recognition, label-noise removal, and label-bias compensation with minimal human intervention. Experimental results show that, under fair comparisons, our method outperforms various models and is comparable to state-of-the-art continuous emotion recognition methods on the AVEC 2012 dataset.
Abstract (English): Continuous emotion recognition aims to recognize human emotion from audio-visual sequences. Continuous emotion labels are noisy because annotators cannot accurately estimate continuous emotion while watching the audio-visual sequences in real time. Most existing methods exclude the label noise and bridge the gap between real emotion and their models with manual, hand-crafted designs. However, these manual designs do not benefit the automation of emotion understanding. The purpose of this work is to purify the noisy emotion labels and to bridge the gap between emotion annotators and emotion regressors automatically. We propose a jointly optimized model of an emotion regressor and common label bias estimation, with a label noise measurement based on feature-label relationships. We also empower this model with deep emotion features. The proposed method jointly performs emotion recognition, label purification, and label bias compensation with minimal human intervention. Our results outperform various models under fair comparisons and are comparable to the state of the art on the well-known AVEC 2012 dataset.
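To make the joint-optimization idea in the abstract more concrete, the following is a minimal sketch rather than the thesis implementation: a linear emotion regressor and a per-sequence common label bias are estimated by alternating minimization, with crude per-frame confidence weights standing in for the label confidence prior. The function and parameter names (fit_joint, lam, n_iters) and the specific update rules are illustrative assumptions, not taken from the thesis.

```python
# Illustrative sketch only: jointly estimate a linear emotion regressor and a
# per-sequence common label bias by alternating minimization. Not the thesis
# model; names and update rules are assumptions for illustration.
import numpy as np

def fit_joint(X_seqs, y_seqs, lam=1.0, n_iters=10):
    """X_seqs: list of (T_i, d) feature arrays; y_seqs: list of (T_i,) noisy labels."""
    d = X_seqs[0].shape[1]
    w = np.zeros(d)
    biases = [0.0] * len(X_seqs)          # one common label bias per sequence
    for _ in range(n_iters):
        # Step 1: fix the biases and fit a confidence-weighted ridge regressor
        # on the bias-corrected labels.
        XtX = lam * np.eye(d)
        Xty = np.zeros(d)
        for X, y, b in zip(X_seqs, y_seqs, biases):
            r = y - b
            # Crude confidence weight: down-weight frames whose current
            # prediction disagrees strongly with the bias-corrected label.
            conf = 1.0 / (1.0 + (X @ w - r) ** 2)
            Xw = X * conf[:, None]
            XtX += Xw.T @ X
            Xty += Xw.T @ r
        w = np.linalg.solve(XtX, Xty)
        # Step 2: fix the regressor and re-estimate each sequence's common
        # label bias as the confidence-weighted mean residual.
        for i, (X, y) in enumerate(zip(X_seqs, y_seqs)):
            resid = y - X @ w
            conf = 1.0 / (1.0 + resid ** 2)
            biases[i] = float(np.sum(conf * resid) / np.sum(conf))
    return w, biases

# Toy usage on synthetic data: each sequence carries a different constant
# annotation offset, which should be absorbed into the bias estimates.
rng = np.random.default_rng(0)
X_seqs = [rng.normal(size=(200, 8)) for _ in range(3)]
w_true = rng.normal(size=8)
y_seqs = [X @ w_true + 0.5 * k + 0.1 * rng.normal(size=200)
          for k, X in enumerate(X_seqs)]
w_est, biases = fit_joint(X_seqs, y_seqs)
print(np.round(biases, 2))   # roughly [0.0, 0.5, 1.0]
```

At test time only X @ w_est would be used: the estimated biases model an annotation artifact rather than the emotion itself, so they are not added back to the predictions.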
Table of Contents
Chinese Abstract 2
Abstract 3
1. Introduction 6
1.1 Background 6
1.2 Common Steps of CER 7
1.3 Issues of Continuous Emotion Datasets 7
1.4 Organization 8
2. Related Work and Motivation 9
2.1 The State-of-the-Art Methods 9
2.1.1 Multi-scale Dynamic Cues 9
2.1.2 3D Shape Model 10
2.1.3 Bayesian Fusion 10
2.2 Solutions of Calibration Stage Issue 11
2.3 Motivation 12
3. Proposed Method 14
3.1 Overview 14
3.1.1 Preliminaries 14
3.1.2 Assumptions 15
3.1.3 Single-modal Framework 17
3.2 Basic temporal bias estimation model 17
3.2.1 Loss Functions and Solutions 19
3.2.2 Robustness of the temporal bias estimation model 20
3.3 Label confidence prior 21
3.4 Prediction 24
3.4.1 Single modality prediction 24
3.4.2 Fusion of predictions 25
3.5 Robust Emotion Features 26
3.5.1 Visual features 26
3.5.2 Audio features 27
4. Experimental Results 30
4.1 Datasets 30
4.2 Labels and Evaluation Metrics 31
4.3 Feature Settings 32
4.4 Model Evaluations 33
4.5 Comparison with existing methods 35
5. Discussions 39
5.1 Limitations 39
5.1.1 Formulation and Convergence 39
5.1.2 Failure cases 40
5.2 Measurement of correlation coefficient 40
6. Conclusions 42
7. References 43
[1] R. Cowie, and R. R. Cornelius, “Describing the emotional states that are expressed in speech,” Speech Communication, vol. 40, no. 1-2, pp. 5-32, 2003.
[2] L. F. Barrett, “Discrete Emotions or Dimensions? The Role of Valence Focus and Arousal Focus,” Cognition and Emotion, vol. 12, no. 4, pp. 579-599, 1998.
[3] B. Schuller, M. Valster, F. Eyben, R. Cowie, and M. Pantic, “AVEC 2012: the continuous audio/visual emotion challenge,” In Proc. of the 14th ACM International Conference on Multimodal interaction, pp. 449-456, 2012.
[4] M. Valstar, B. Schuller, K. Smith, T. Almaev, F. Eyben, J. Krajewski, R. Cowie, and M. Pantic, “AVEC 2014: 3D Dimensional Affect and Depression Recognition Challenge,” In Proc. of the 4th International Workshop on Audio/Visual Emotion Challenge, pp. 3-10, 2014.
[5] P. Ekman, and W. Friesen, “Emotion in the Human Face,” Prentice Hall, New Jersey, 1975.
[6] P. Ekman, and W. Friesen, “Facial Action Coding System: A Technique for the Measurement of Facial Movement,” Consulting Psychologists Press, Palo Alto, 1978.
[7] P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews, “The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression,” In Proc. of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 94-101, 2010.
[8] A. C. Cruz, B. Bhanu, and N. S. Thakoor, “Vision and Attention Theory Based Sampling for Continuous Facial Emotion Recognition,” IEEE Transactions on Affective Computing, vol. 5, no. 4, pp. 418-431, 2014.
[9] J. Nicolle, V. Rapp, K. Bailly, L. Prevost, and M. Chetouani, “Robust Continuous Prediction of Human Emotions Using Multiscale Dynamic Cues,” In Proc. of the 14th ACM International Conference on Multimodal interaction, pp. 501-508, 2012.
[10] D. Ozkan, S. Scherer, and L.-P. Morency, “Step-wise emotion recognition using concatenated-HMM,” In Proc. of the 14th ACM International Conference on Multimodal interaction, pp. 477-484, 2012.
[11] C. Soladié, H. Salam, C. Pelachaud, N. Stoiber, and R. Séguier, “A Multimodal Fuzzy Inference System Using a Continuous Facial Expression Representation for Emotion Detection,” In Proc. of the 14th ACM International Conference on Multimodal interaction, pp. 493-500, 2012.
[12] A. Savran, H. Cao, A. Nenkova, and R. Verma, “Temporal Bayesian Fusion for Affect Sensing: Combining Video, Audio, and Lexical Modalities,” IEEE Transactions on Cybernetics, vol. 45, no. 9, pp. 1927-1941, 2015.
[13] M. Kächele, M. Schels, and F. Schwenker, “Inferring Depression and Affect from Application Dependent Meta Knowledge,” In Proc. of the 4th International Workshop on Audio/Visual Emotion Challenge, pp. 41-48, 2014.
[14] H. Chen, J. Li, F. Zhang, Y. Li, and H. Wang, “3D model-based continuous emotion recognition,” In Proc. of the 2015 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 1836-1845, 2015.
[15] T. Baltrušaitis, N. Banda, and P. Robinson, “Dimensional affect recognition using Continuous Conditional Random Fields,” In Proc. of the 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), pp. 1-8, 2013.
[16] S. Kaltwang, S. Todorovic, and M. Pantic, “Doubly Sparse Relevance Vector Machine for Continuous Facial Behavior Estimation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 9, pp. 1748-1761, 2016.
[17] I. J. Goodfellow, D. Erhan, P. L. Carrier, A. Courville, M. Mirza, B. Hamner, W. Cukierski, Y. Tang, D. Thaler, and D.-H. Lee, et al, “Challenges in representation learning: a report on three machine learning contests,” In Proc. of the 2013 ICML Workshop on Representation Learning, 2013.
[18] Z. Liu, P. Luo, X. Wang, and X. Tang, “Deep Learning Face Attributes in the Wild,” In Proc. of the 2015 IEEE International Conference on Computer Vision, pp. 3730-3738, 2015.
[19] K. Simonyan, and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv:1409.1556, 2014.
[20] O.M. Parkhi, A. Vedaldi, and A. Zisserman, “Deep Face Recognition,” In Proc. of the 26th British Machine Vision Conference, pp. 41.1-41.12, 2015.
[21] Y. Tang, “Deep learning using linear support vector machines,” In Proc. of the 2013 ICML Workshop on Representation Learning, 2013.
[22] https://github.com/senecaur/caffe-rta
[23] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional Architecture for Fast Feature Embedding,” In Proc. of the 22nd ACM International Conference on Multimedia, pp. 675-678, 2014.
[24] A. Asthana, S. Zafeiriou, S. Cheng, and M. Pantic, “Incremental Face Alignment in the Wild,” In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1859-1866, 2014.
[25] F. Eyben, M. Wöllmer, and B. Schuller, “openSMILE – The Munich Versatile and Fast Open-Source Audio Feature Extractor,” In Proc. of the 18th ACM International Conference on Multimedia, pp. 1459-1462, 2010.
[26] G. Mckeown, M. F. Valstar, R. Cowie, M. Pantic, and M. Schroeder, “The SEMAINE database: Annotated multimodal records of emotionally coloured conversations between a person and a limited agent,” IEEE Transactions on Affective Computing, vol. 3, no.1, pp. 5-17, 2012.
[27] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei, “Large-Scale Video Classification with Convolutional Neural Networks,” In Proc. of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1725-1732, 2014.
[28] S. Chen, and Q. Jin, “Multi-modal Dimensional Emotion Recognition using Recurrent Neural Networks,” In Proc. of the 5th International Workshop on Audio/Visual Emotion Challenge, pp. 49-56, 2015.
[29] J. Domke, “Learning Graphical Model Parameters with Approximate Marginal Inference,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 10, pp. 2454-2467, 2013.
[30] P. Fewzee, and F. Karray, “Continuous Emotion Recognition: Another Look at the Regression Problem,” In Proc. of the 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, pp. 197-202, 2013.
[31] G. Andrew, R. Arora, J. Bilmes, and K. Livescu, “Deep canonical correlation analysis,” In Proc. of the 2013 International Conference on Machine Learning, pp. 1247-1255, 2013.
[32] M. Nicolaou, S. Zafeiriou, and M. Pantic, “Correlated-Spaces Regression for Learning Continuous Emotion Dimensions,” In Proc. of the 21st ACM International Conference on Multimedia, pp. 773-776, 2013.
[33] H. Meng, N. Bianchi-Berthouze, Y. Deng, J. Cheng, and J. P. Cosmas, “Time-Delay Neural Network for Continuous Emotional Dimension Prediction From Facial Expression Sequences,” IEEE Transactions on Cybernetics, vol. 46, no. 4, pp. 916-929, 2016.
[34] L. Chao, J. Tao, M. Yang, Y. Li, and Z. Wen, “Multi-scale Temporal Modeling for Dimensional Emotion Recognition in Video,” In Proc. of the 4th International Workshop on Audio/Visual Emotion Challenge, pp. 11-18, 2014.
[35] R. Gupta, N. Malandrakis, B. Xiao, T. Guha, M. V. Segbroeck, M. Black, A. Potamianos, and S. Narayanan, “Multimodal Prediction of Affective Dimensions and Depression in Human-Computer Interactions,” In Proc. of the 4th International Workshop on Audio/Visual Emotion Challenge, pp. 33-40, 2014.
[36] S. Mariooryad, and C. Busso, “Correcting Time-Continuous Emotional Labels by Modeling the Reaction Lag of Evaluators,” IEEE Transactions on Affective Computing, vol. 6, no. 2, pp. 97-108, 2015.
[37] Y. Song, L.-P. Morency, and R. Davis, “Learning a sparse codebook of facial and body microexpressions for emotion recognition,” In Proc. of the 15th ACM International Conference on Multimodal Interaction, pp. 237-244, 2013.