
Detailed Record

Author (Chinese): 陳亘宇
Author (English): Chen, Hsuan Yu
Thesis Title (Chinese): 透過結合fMRI大腦血氧濃度相依訊號以改善語音情緒辨識系統
Thesis Title (English): Improving Categorical Emotion Recognition by Fusing Audio Features with Generated fMRI Brain Responses
Advisor (Chinese): 李祈均
Advisor (English): Lee, Chi Chun
Committee Members (Chinese): 郭立威、劉弈汶、曹昱
Committee Members (English): Kuo, Li Wei; Liu, Yi Wen; Tsao, Yu
Degree: Master's
University: National Tsing Hua University
Department: Department of Electrical Engineering
Student ID: 103061611
Publication Year (ROC calendar): 105 (2016)
Graduation Academic Year: 105
Language: Chinese
Number of Pages: 36
Keywords (Chinese): 人類行為訊號處理、情緒辨識、情緒正負向、功能性磁振造影、高斯混合回歸模型
Keywords (English): behavioral signal processing (BSP); emotion recognition; valence; fMRI; Gaussian Mixture Regression
Understanding how the human neuro-perceptual system decodes emotion from speech is an important research direction. Emotion is expressed in speech most readily along the activation/arousal dimension, for example through raised pitch or changes in speaking rate; even without semantic content, the arousal level of an utterance can still be recognized from basic acoustic features. Applying the same framework to emotional valence (positivity versus negativity) is considerably more difficult: judging the valence of speech usually requires semantic context, and when that context is absent it is hard to represent valence from acoustic features alone. In this thesis, we investigate whether blood oxygenation level dependent (BOLD) signals measured by fMRI can help improve a speech-based emotion recognition system, especially for valence differences when subjects listen to vocal stimuli that carry no semantic content. We aim to use fMRI-derived features to strengthen the representation of valence in the speech signal and to explore the characteristics underlying this affective perception.
However, fMRI studies demand substantial cost and time for data collection, and recruiting suitable, cooperative subjects who satisfy the experimental requirements is often difficult. This thesis therefore builds a statistical generative model, Gaussian mixture regression (GMR), to describe the joint relationship between speech features and fMRI-derived brain features; with this model, fMRI features can be simulated when fMRI data are unavailable. Through a series of experiments, we find that the GMR-generated fMRI features can likewise yield a clear improvement when combined with speech features for emotion recognition.
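For reference, the generation step described above corresponds to the standard GMR conditional-mean formulation, sketched here with generic notation (a joint Gaussian mixture over speech features x and fMRI features y; the symbols are illustrative and not taken from the thesis):

p(x, y) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left( \begin{bmatrix} x \\ y \end{bmatrix} ; \begin{bmatrix} \mu_k^{x} \\ \mu_k^{y} \end{bmatrix} , \begin{bmatrix} \Sigma_k^{xx} & \Sigma_k^{xy} \\ \Sigma_k^{yx} & \Sigma_k^{yy} \end{bmatrix} \right)

\hat{y}(x) = \mathbb{E}[y \mid x] = \sum_{k=1}^{K} h_k(x) \left( \mu_k^{y} + \Sigma_k^{yx} (\Sigma_k^{xx})^{-1} (x - \mu_k^{x}) \right), \qquad h_k(x) = \frac{\pi_k \, \mathcal{N}(x; \mu_k^{x}, \Sigma_k^{xx})}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x; \mu_j^{x}, \Sigma_j^{xx})}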
Understanding the underlying neuro-perceptual mechanism of humans' ability to decode emotional content in vocal signals is an important research direction. However, it is well known that obtaining valence from speech features is much more difficult than obtaining arousal. Arousal can be accurately identified, and automatically recognized, from the speech signal alone without semantic context. On the other hand, it is much more difficult to recognize valence when speech carries no semantic context. In this thesis, we obtain fMRI-derived features from blood oxygenation level dependent (BOLD) signals recorded while subjects are exposed to various vocal emotion stimuli. We observe that using these fMRI-derived features to predict valence is beneficial to a speech-based emotion recognition system. Furthermore, because fMRI scanning is costly and time-consuming, we integrate audio features and fMRI-derived features to learn a joint representation using Gaussian mixture regression (GMR). Finally, the proposed framework demonstrates that categorical emotion recognition can be improved by fusing audio features with the simulated, vocal-stimulus-induced fMRI-derived features generated by the GMR model.
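To make the described pipeline concrete, the following is a minimal, hypothetical Python sketch (not the author's code): it fits a joint Gaussian mixture over concatenated audio and fMRI feature vectors, uses the GMR conditional mean to generate pseudo-fMRI features from audio alone, and feeds the fused features to an SVM classifier. The feature dimensions, number of mixture components, and SVM settings are illustrative assumptions.

import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

def fit_joint_gmm(audio, fmri, n_components=4, seed=0):
    # Fit a full-covariance GMM on concatenated [audio | fMRI] feature vectors.
    joint = np.hstack([audio, fmri])
    return GaussianMixture(n_components=n_components, covariance_type="full",
                           random_state=seed).fit(joint)

def gmr_generate_fmri(gmm, audio, d_audio):
    # GMR conditional mean E[fMRI | audio] under the joint GMM.
    d_fmri = gmm.means_.shape[1] - d_audio
    preds = np.zeros((audio.shape[0], d_fmri))
    for i, x in enumerate(audio):
        # Responsibility of each component given the audio block only.
        resp = np.array([
            w * multivariate_normal.pdf(x, mean=m[:d_audio], cov=c[:d_audio, :d_audio])
            for w, m, c in zip(gmm.weights_, gmm.means_, gmm.covariances_)
        ])
        resp /= resp.sum()
        # Mix the per-component conditional means by responsibility.
        for r, m, c in zip(resp, gmm.means_, gmm.covariances_):
            cond = m[d_audio:] + c[d_audio:, :d_audio] @ np.linalg.solve(
                c[:d_audio, :d_audio], x - m[:d_audio])
            preds[i] += r * cond
    return preds

# Illustrative usage with synthetic data (shapes and labels are assumptions).
rng = np.random.default_rng(0)
n_train, n_test, d_audio, d_fmri = 80, 20, 10, 6
audio_tr = rng.normal(size=(n_train, d_audio))
fmri_tr = rng.normal(size=(n_train, d_fmri))      # real fMRI available only for training
labels_tr = rng.integers(0, 2, size=n_train)      # categorical emotion labels
audio_te = rng.normal(size=(n_test, d_audio))     # at test time only audio is observed

gmm = fit_joint_gmm(audio_tr, fmri_tr)
fmri_te_hat = gmr_generate_fmri(gmm, audio_te, d_audio)           # generated fMRI features
clf = SVC(kernel="linear").fit(np.hstack([audio_tr, fmri_tr]), labels_tr)
pred = clf.predict(np.hstack([audio_te, fmri_te_hat]))            # fused audio + generated fMRI

In the actual thesis the audio features, fMRI-derived features, and classifier configuration come from the experimental setup described in the chapters below; this sketch only illustrates the GMR-based generation-and-fusion idea.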
Oral Defense Committee Approval Form
Acknowledgements
Chinese Abstract
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation and Objectives
1.2 Research Overview
1.3 Thesis Organization
Chapter 2 Methodology
2.1 Design of Vocal Emotion Stimuli
2.2 fMRI Data Acquisition
2.2.1 Introduction to fMRI
2.2.2 fMRI Data Collection
2.2.3 fMRI Data Preprocessing
2.3 Machine Learning Models
2.3.1 Support Vector Machine
2.3.2 Gaussian Mixture Regression
Chapter 3 Experimental Setup and Results
3.1 Experimental Preparation
3.1.1 Audio Feature Extraction
3.1.2 fMRI Feature Extraction
3.1.3 Machine Learning Modeling
3.2 Results and Discussion of Experiment 1
3.3 Results and Discussion of Experiment 2
3.4 Results and Discussion of Experiment 3
Chapter 4 Conclusion and Future Work
4.1 Conclusion
4.2 Future Work
References
 
 
 
 

Related Theses

1. Using Joint Factor Analysis to Model Temporal Effects in Brain MR Neuroimaging for Improving Emotion Recognition Systems
2. Combining Multi-level Convolutional Neural Network Features of fMRI to Improve Speech Emotion Recognition Systems
3. An Automated Scoring System for Couple Interaction Behavioral Codes in Marital Therapy Based on Stacked Sparse Autoencoders Using Speech Features
4. A Study on Stroke Prediction from National Health Insurance Data Using Hadoop as a Fast Feature Extraction Tool
5. A New Framework for Full-Duration Emotion Recognition Models Built on Human Thin-Slice Emotion Perception
6. Applying Multi-task and Multimodal Fusion Techniques to Build an Automatic Scoring System for Reserve Principals' Speeches
7. Building an Automated Scoring System for Reserve Principal Evaluation by Analyzing Sample-Label Relationships with Multimodal Active Learning
8. Developing a Behavior-Measurement-Based Assessment System for Children with Autism Using an Embodied Conversational Interface
9. A Multimodal Continuous Emotion Recognition System and Its Application to Global Affect Recognition
10. Integrating Multi-level Text Representations and Embedded Speech Attributes for Robust Automated Scoring of Reserve Principals' Speeches
11. Building an Assessment System to Identify Children with Autism from Autism Diagnostic Observation Schedule (ADOS) Interviews Using LSTM
12. Automatic Detection of Pain Levels in Emergency Patients Using a Multimodal Model Combining CNN and LSTM Audio-Visual Features
13. Improving Automated Behavioral Scoring in Marital Therapy with a Bidirectional LSTM Architecture Mixing Multi-Granularity Text Modalities
14. Improving Emotion Recognition on a Chinese Theatrical Performance Database Using Interaction Features from Performance Transcripts
15. An fMRI Feature Extractor Based on Resting-State Convolutional Autoencoders
 