Author (Chinese): 林文智
Author (English): Robles Coulter, Luis M.
Title (Chinese): 透過聲階加權矩陣強化情緒表達之模型
Title (English): Enriching Pattern-based Emotion Representations through Acoustic-level Weighting Matrices
Advisor (Chinese): 陳宜欣
Advisor (English): Chen, Yi-Shin
Committee members (Chinese): 彭文志; 陳朝欽
Committee members (English): Peng, Wen-Chih; Chen, Chaur-Chin
Degree: Master's
Institution: National Tsing Hua University (國立清華大學)
Department: Institute of Information Systems and Applications (資訊系統與應用研究所)
Student ID: 105065431
Year of publication (ROC calendar): 107 (2018)
Graduating academic year: 106 (2017-2018)
Language: English
Number of pages: 26
Keywords (Chinese): 情緒識別; 多模態識別; 卷積神經網絡
Keywords (English): emotion recognition; multimodal recognition; CNN
Abstract:
In this thesis, we propose an innovative approach for sentence-level emotion recognition that combines acoustic features with textual patterns extracted from transcripts by a graph-based pattern extraction model. The acoustic features are extracted from word-level segments using Librosa, a tool for audio and music signal analysis. These acoustic features are fused with textual patterns derived from emotion-rich syntactic patterns. The combined feature set is used to train a simple naive vector model and a convolutional neural network (CNN) for emotion classification. We demonstrate the efficacy of our approach by performing four-way (anger, happiness, sadness, neutral) emotion recognition on the University of Southern California's Interactive Emotional Dyadic Motion Capture (USC-IEMOCAP) corpus. Our experiments show that the fusion of acoustic features and emotional patterns delivers an emotion recognition accuracy of 85%, with an 11% improvement in precision over our baseline (73%) and a 7% improvement in recall over our baseline (67%), outperforming the previous best result on this dataset.
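As a rough illustration of the word-level acoustic feature extraction step described in the abstract, the sketch below uses Librosa to compute frame-level features (MFCCs, zero-crossing rate, RMS energy) for a word-aligned audio segment and averages them into a fixed-length vector. The feature set, segment timestamps, and function name are illustrative assumptions, not the exact configuration used in the thesis.

# Minimal sketch: word-level acoustic features with Librosa (assumed setup).
import numpy as np
import librosa

def word_segment_features(wav_path, start_sec, end_sec, n_mfcc=13):
    """Return one fixed-length acoustic vector for a word-level segment
    by averaging frame-level features over the segment."""
    # Load only the word-aligned portion of the recording.
    y, sr = librosa.load(wav_path, sr=None, offset=start_sec,
                         duration=end_sec - start_sec)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    zcr = librosa.feature.zero_crossing_rate(y)             # (1, frames)
    rms = librosa.feature.rms(y=y)                          # (1, frames)
    # Average over frames -> one vector of length n_mfcc + 2 per word.
    return np.vstack([mfcc, zcr, rms]).mean(axis=1)

# Hypothetical usage: word boundaries would come from forced alignment of
# the corpus transcripts; the filename and timestamps here are made up.
# vec = word_segment_features("Ses01F_impro01.wav", 1.20, 1.55)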
Table of contents:
1 Introduction.......................................1
2 Related Work.......................................4
2.1 Overview of Emotional Representations............4
2.2 Speech-based Classification......................6
2.3 Acoustic Features................................6
3 Methodology........................................7
3.1 Emotion Pattern Scoring..........................8
3.2 Acoustic-based Weighting.........................9
3.3 Models...........................................11
3.3.1 Acoustic Channels Convolutional Neural Network.11
3.3.2 Model Variations...............................12
4 Experiments........................................13
4.1 Experimental Setup...............................13
4.2 Experimental Results.............................14
4.2.1 Analysis on Acoustic Influence.................16
4.2.2 What have the Acoustic weights learned?........16
5 Conclusions and Future Works.......................19
6 References.........................................21