Detailed Record

Author (Chinese): 林怡均
Author (English): Lin, Yi-Chun
Title (Chinese): 應用於連續情緒識別之時空特徵學習
Title (English): Disentangled and Spatiotemporal Feature Learning for Continuous Emotion Recognition
Advisor (Chinese): 許秋婷
Advisor (English): Hsu, Chiou-Ting
Committee members (Chinese): 王聖智、彭文孝
Committee members (English): Wang, Sheng-Jyh; Peng, Wen-Hsiao
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 105062518
Year of publication (ROC calendar): 107 (2018)
Graduation academic year: 107
Language: English
Number of pages: 39
Keywords (Chinese): 連續情感識別、時空特徵學習、對抗式網路學習、捲積神經網路、長短期記憶
Keywords (English): Continuous Emotion Recognition, Convolutional Neural Network, Disentangled feature learning GAN, Long Short Term Memory, Emotion dimension
Usage statistics:
  • Recommendations: 0
  • Views: 252
  • Rating: *****
  • Downloads: 14
  • Bookmarks: 0
Abstract (Chinese): The goal of continuous facial emotion recognition is to recognize a person's emotion in every frame of a video sequence. In this thesis, we focus on the two emotion dimensions widely used in psychology: arousal and valence. We describe emotional changes from three aspects: global appearance, local appearance, and facial motion information. First, we off-line train three encoders to capture these three aspects, designing three generative adversarial networks, the global and local disentangled feature learning GANs and the motion GAN, to learn discriminative features for each. Next, we propose a three-stream convolutional neural network that adopts the three encoders as feature extractors; it combines global appearance, local appearance, and motion information to handle the inconsistency between facial appearance and the emotion annotations. In addition, we propose a temporal fusion network to handle temporally delayed labels and to model long-term emotional evolution. Experimental results show that our method not only predicts in real time, at an average of 42.9 frames per second, but also competes with other continuous emotion recognition methods on the AVEC2012 and RECOLA datasets when using only visual information.
Abstract (English): The goal of continuous facial emotion recognition is to recognize the emotional values of each frame in a video sequence. In this thesis, we focus on predicting the two widely used dimensional emotions: arousal and valence. We describe emotional changes from three aspects: global appearance, local appearance, and motion information. First, we off-line train three encoders to learn discriminative features with three GANs: the Global and Local Disentangled Feature Learning GANs and the Motion GAN. Second, we adopt the extracted features and develop a three-stream ConvNet to incorporate appearance features with motion features. Third, we propose a temporal fusion network to model long-term emotional evolution. Experimental results show that our approach not only processes in real time, i.e., 42.9 frames per second, but also achieves results comparable with previous work on the AVEC2012 and RECOLA datasets when using the visual modality alone.
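The pipeline the abstracts describe — three off-line-trained encoders (GDR-GAN, LDR-GAN, and M-GAN encoders), a three-stream ConvNet that fuses their per-frame features, and a temporal fusion LSTM that regresses arousal and valence — can be sketched roughly as follows. This is only an illustrative sketch, not the thesis implementation: the module names, layer sizes, and fusion scheme below are assumptions, since the record includes no code.

```python
# Minimal PyTorch-style sketch of the described pipeline.
# All module names, layer sizes, and the fusion scheme are
# illustrative assumptions; the thesis does not release code here.
import torch
import torch.nn as nn

class ThreeStreamEmotionNet(nn.Module):
    def __init__(self, feat_dim=256, hidden_dim=128):
        super().__init__()
        # Stand-ins for the three off-line-trained encoders
        # (global appearance, local appearance, motion).
        def encoder(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, 2, 1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, feat_dim))
        self.global_enc = encoder(3)   # whole face frames
        self.local_enc = encoder(3)    # local facial regions
        self.motion_enc = encoder(2)   # e.g., optical-flow fields
        # Three-stream fusion of appearance and motion features.
        self.fuse = nn.Linear(3 * feat_dim, hidden_dim)
        # Temporal model over the fused per-frame features.
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        # Per-frame arousal and valence regression.
        self.head = nn.Linear(hidden_dim, 2)

    def forward(self, frames, crops, flow):
        # frames, crops: (B, T, 3, H, W); flow: (B, T, 2, H, W)
        B, T = frames.shape[:2]
        def per_frame(enc, x):
            return enc(x.flatten(0, 1)).view(B, T, -1)
        f = torch.cat([per_frame(self.global_enc, frames),
                       per_frame(self.local_enc, crops),
                       per_frame(self.motion_enc, flow)], dim=-1)
        h, _ = self.lstm(torch.relu(self.fuse(f)))
        return self.head(h)  # (B, T, 2): arousal and valence per frame
```

The sketch only mirrors the inference-time data flow stated in the abstracts; the adversarial (GAN-based) off-line training of the three encoders is not shown.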
Chinese Abstract I
Abstract II
1. Introduction 1
2. Related Work 4
2.1 Discriminative Feature Learning 4
2.2 Temporal Delayed Labels Modeling 5
3. Proposed Method 8
3.1 Disentangled Feature Learning GANs 9
3.1.1 GDR-GAN 10
3.1.2 LDR-GAN 14
3.2 M-GAN 15
3.3 Three-stream ConvNet 16
3.4 Temporal Fusion LSTM 18
4. Experimental Results 20
4.1 Datasets 20
4.2 Implementation Details 21
4.3 Experimental Results 23
4.4 Discussion 31
5. Conclusions 34
6. References 35
[1] L. Tran, X. Yin, and X. Liu, “Disentangled Representation Learning GAN for Pose-Invariant Face Recognition," in Proc. IEEE Computer Vision and Pattern Recognition (CVPR 2017 Oral).
[2] J. Posner, J. A. Russell, and B. S. Peterson, “The circumplex model of affect: An integrative approach to affective neuroscience, cognitive development, and psychopathology," Development and Psychopathology, 17(3):715-734, 2005.
[3] S. H. Wang and C. T. Hsu, “AST-Net: An Attribute-based Siamese Temporal Network for Real-Time Emotion Recognition," in Proc. BMVC, London, Sep. 2017.
[4] J. Bao, D. Chen, F. Wen, H. Li, and G. Hua, “Toward Open-set Identity Preserving Face Synthesis," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2018.
[5] H. Ding, K. Sricharan, and R. Chellappa, “ExprGAN: Facial Expression Editing with Controllable Expression Intensity," in Proc. Association for the Advancement of Artificial Intelligence. 2018.
[6] A. Dosovitskiy, P. Fischer, E. Ilg, P. Hausser, C. Hazırbas, and V. Golkov, “FlowNet: Learning Optical Flow with Convolutional Networks," in Proc. IEEE International Conference on Computer Vision. 2015.
[7] P. Isola, J. Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-Image Translation with Conditional Adversarial Networks," in Proc. IEEE Conference on Computer Vision and Pattern Recognition. 2017.
[8] O. M. Parkhi, A. Vedaldi, and A. Zisserman, “Deep Face Recognition," in Proc. British Machine Vision Conference, 2015.
[9] J. Nicolle, V. Rapp, K. Bailly, L. Prevost, and M. Chetouani, “Robust continuous prediction of human emotions using multiscale dynamic cues," In Proc. 14th ACM International Conference on Multimodal interaction, pages 501–508. ACM, 2012.
[10] H. Chen, J. Li, F. Zhang, Y. Li, and H. Wang, “3d model-based continuous emotion recognition," In Proc. IEEE Conference on Computer Vision and Pattern Recognition, pages 1836–1845, 2015.
[11] X. Sun, M. Lv, C. Quan, and F. Ren, “Improved Facial Expression Recognition Method Based on ROI Deep Convolutional Neural Network,” in IEEE Affective Computing and Intelligent Interaction (ACII), pp. 256-261, 2017.
[12] C. Shan, S. Gong, and P. W. McOwan, “Facial expression recognition based on local binary patterns: A comprehensive study,” Image Vis. Comput., vol. 27, no. 6, pp. 803–816, 2009.
[13] T. Jabid, M. H. Kabir, and O. Chae, “Facial expression recognition using local directional pattern (LDP),” in Proc. IEEE Int Conf. Image Process., Sep. 2010, pp. 1605–1608.
[14] A. Ramirez Rivera, R. Castillo, and O. Chae, “Local directional number pattern for face analysis: Face and expression recognition,” in IEEE Trans. Image Process., vol. 22, no. 5, pp. 1740–1752, May 2013.
[15] J. B. Tenenbaum and W. T. Freeman, “Separating style and content,” In Advances in Neural Information Processing Systems, pages 662–668, 1997.
[16] B. Cheung, J. A. Livezey, A. K. Bansal, and B. A. Olshausen, “Discovering hidden factors of variation in deep networks,” arXiv preprint arXiv:1412.6583, 2014.
[17] D. P. Kingma, S. Mohamed, D. J. Rezende, and M. Welling, “Semi-supervised learning with deep generative models,” In Advances in Neural Information Processing Systems, pages 3581–3589, 2014
[18] A. Makhzani, J. Shlens, N. Jaitly, I. Goodfellow, and B. Frey, “Adversarial autoencoders,” arXiv preprint arXiv:1511.05644, 2015
[19] M. F. Mathieu, J. J. Zhao, J. Zhao, A. Ramesh, P. Sprechmann, and Y. LeCun, “Disentangling factors of variation in deep representation using adversarial training,” In Advances in Neural Information Processing Systems, pages 5040–5048, 2016
[20] L. Tran, X. Yin, and X. Liu, “Disentangled representation learning GAN for pose-invariant face recognition,” In Proc. IEEE Conference on Computer Vision and Pattern Recognition, volume 4, page 7, 2017
[21] B. Schuller, M. Valster, F. Eyben, R. Cowie, and M. Pantic, “Avec 2012: the continuous audio/visual emotion challenge,” In Proc. 14th ACM International Conference on Multimodal interaction, pages 449–456. ACM, 2012
[22] F. Ringeval, A. Sonderegger, J. Sauer, and D. Lalanne, “Introducing the recola multimodal corpus of remote collaborative and affective interactions,” In Automatic Face and Gesture Recognition (FG), 2013 10th IEEE International Conference and Workshops on, pages 1–8. IEEE, 2013.
[23] P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, and Z. Ambadar, “The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression," In Proc. IEEE Conference on Computer Vision and Pattern Recognition - Workshops, pages 94–101, 2010.
[24] I. J. Goodfellow, D. Erhan, P. L. Carrier, A. Courville, M. Mirza, B. Hamner, and Y. Zhou, “Challenges in representation learning: A report on three machine learning contests,” In International Conference on Neural Information Processing, pages 117– 124. Springer, 2013.
[25] G. Zhao, X. Huang, M. Taini, S. Z. Li, and M. Pietikainen, “Facial expression recognition from near-infrared videos,” in Image and Vision Computing 29(9):607–619, 2011.
[26] D. Ozkan, S. Scherer, and L. P. Morency, “Step-wise emotion recognition using concatenated-hmm,” In Proc. 14th ACM International Conference on Multimodal interaction, pages 477–484. ACM, 2012.
[27] C. Soladié, H. Salam, C. Pelachaud, N. Stoiber, and R. Séguier, “A multimodal fuzzy inference system using a continuous facial expression representation for emotion detection,” In Proc. 14th ACM International Conference on Multimodal Interaction, pages 493–500. ACM, 2012.
[28] J. Nicolle, V. Rapp, K. Bailly, L. Prevost, and M. Chetouani, “Robust continuous prediction of human emotions using multiscale dynamic cues,” In Proc. 14th ACM International Conference on Multimodal interaction, pages 501–508. ACM, 2012.
[29] M. A. Nicolaou, S. Zafeiriou, and M. Pantic, “Correlated-spaces regression for learning continuous emotion dimensions,” In Proc. 21st ACM International Conference on Multimedia, pages 773–776. ACM, 2013.
[30] T. Baltrušaitis, N. Banda, and P. Robinson, “Dimensional affect recognition using continuous conditional random fields,” In Automatic Face and Gesture Recognition (FG), 2013 10th IEEE International Conference and Workshops on, pages 1–8. IEEE, 2013.
[31] H. Meng, N. Bianchi-Berthouze, Y. Deng, J. Cheng, and J. P. Cosmas, “Time-delay neural network for continuous emotional dimension prediction from facial expression sequences,” IEEE transactions on cybernetics, 46(4):916–929, 2016.
[32] A. Savran, H. Cao, A. Nenkova, and R. Verma, “Temporal bayesian fusion for affect sensing: Combining video, audio, and lexical modalities,” IEEE Transactions on Cybernetics, 45(9):1927–1941, 2015.
[33] F. Ringeval, B. Schuller, M. Valstar, S. Jaiswal, E. Marchi, D. Lalanne, R. Cowie, and M. Pantic, “AV+EC 2015: The first affect recognition challenge bridging across audio, video, and physiological data,” In Proc. 5th International Workshop on Audio/Visual Emotion Challenge, pages 3–8. ACM, 2015.
[34] P. Cardinal, N. Dehak, A. L. Koerich, J. Alam, and P. Boucher, “ETS system for AVEC 2015 challenge,” In Proc. 5th International Workshop on Audio/Visual Emotion Challenge, pages 17–23. ACM, 2015.
[35] L. Chao, J. Tao, M. Yang, Y. Li, and Z.Wen, “Long short term memory recurrent neural network based multimodal dimensional emotion recognition,” In Proc. 5th International Workshop on Audio/Visual Emotion Challenge, pages 65–72. ACM, 2015.
[36] S. Chen and Q. Jin, “Multi-modal dimensional emotion recognition using recurrent neural networks,” In Proc. 5th International Workshop on Audio/Visual Emotion Challenge, pages 49–56. ACM, 2015.
[37] M. Kächele, P. Thiam, G. Palm, F. Schwenker, and M. Schels, “Ensemble methods for continuous affect recognition: Multi-modality, temporality, and challenges,” In Proc. 5th International Workshop on Audio/Visual Emotion Challenge, pages 9–16. ACM, 2015.
[38] F. Ringeval, F. Eyben, E. Kroupi, A. Yuce, J.P. Thiran, T. Ebrahimi, D. Lalanne, and B. Schuller, “Prediction of asynchronous dimensional emotion ratings from audiovisual and physiological data,” Pattern Recognition Letters, 66:22–30, 2015.
[39] Z. Zhang, Y. Song, and H. Qi, “Age progression/regression by conditional adversarial autoencoder,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition. 2017.
[40] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, 9(8):1735–1780, 1997.
 
 
 
 