
Detailed Record

Author (Chinese): 陳柏軒
Author (English): Chen, Po Hsuan
Title (Chinese): 透過語音特徵建構基於堆疊稀疏自編碼器演算法之婚姻治療中夫妻互動行為量表自動化評分系統
Title (English): Automating Behavior Coding for Distressed Couple Interactions Based on Stacked Sparse Autoencoder Framework using Speech-acoustic Features
Advisor (Chinese): 李祈均
Advisor (English): Lee, Chi Chun
Committee members (Chinese): 劉奕汶, 曹昱, 李宏毅
Committee members (English): Liu, Yi Wen; Tsao, Yu; Lee, Hung Yi
Degree: Master's
University: National Tsing Hua University (國立清華大學)
Department: Department of Electrical Engineering
Student ID: 101061613
Year of publication (ROC calendar): 104 (2015)
Graduating academic year: 104
Language: Chinese
Pages: 33
Keywords (Chinese): 深度學習, 堆疊稀疏自編碼器, 婚姻治療, 人類行為分析, 情緒分析
Keywords (English): Deep Learning, Stacked Autoencoders, Couple Therapy, Human Behavior Analysis, Emotion Recognition
Usage statistics:
  • Recommendations: 0
  • Views: 111
  • Downloads: 0
  • Bookmarks: 0
Human behavior analysis has traditionally relied on manual observation: behaviors are recorded, the records are analyzed and diagnosed, and decisions are then made from the extracted behavioral information. This process is highly time-consuming and costly, and automating it would save a great deal of time. Behavioral Signal Processing (BSP), proposed in 2013, extracts relevant features from recorded data, such as motion signals, lexical features, or acoustic features, and applies machine learning to classify them and derive the underlying behavioral information. Such automation not only saves time but, combined with traditional clinical care, can also support decision making further upstream. Current application domains include psychology (fMRI, affective computing), mental health (autism treatment, couple therapy), and educational research. In couple therapy, for instance, raters watch recorded sessions and score the behaviors each spouse exhibits over an entire conversation. Quantifying the degree of each expressed behavior in this way enables further study of therapy outcomes, but the procedure is very time-consuming, and the raters' subjective biases affect the accuracy of the final scores. Automating this recognition with machine learning would save substantial human effort and improve objectivity. Deep learning is currently a very active topic in machine learning. This thesis proposes using a stacked sparse autoencoder (SSAE) to reduce the dimensionality of acoustic features and identify the key higher-level features, followed by logistic regression (LR) for classification. The method achieves an overall accuracy of 75% (an average of 74.9% for husband behaviors and 75% for wife behaviors), a 0.9% improvement over the 74.1% of prior work (75% for husbands, 73.2% for wives) [1]. The proposed method effectively improves behavior recognition accuracy while using much lower-dimensional acoustic features.
The traditional way of analyzing human behavior is manual observation. In couple therapy studies, for example, human raters observe interaction sessions between distressed couples and manually annotate the behaviors of each spouse using established coding manuals. Clinicians then analyze these annotated behaviors to understand the effectiveness of the treatment each couple receives. However, this manual approach is very time-consuming, and the subjective nature of the annotation process can make the annotations unreliable. Our work aims to automate this process with machine learning and, through signal processing techniques, to bring quantitative evidence to the study of human behavior. Deep learning is the current state-of-the-art machine learning technique. This thesis proposes using a stacked sparse autoencoder (SSAE) to reduce the dimensionality of acoustic-prosodic features and to identify the key higher-level features. Logistic regression (LR) is then used to classify high versus low ratings of six behavioral codes. The method achieves an overall accuracy of 75% across the six codes (husband's average accuracy of 74.9%, wife's average accuracy of 75%), compared to 74.1% in a previously published study (husband's average accuracy of 75%, wife's average accuracy of 73.2%) [1], an improvement of 0.9%. Our method achieves this higher classification rate using far fewer features (10 times fewer than the previous work).
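The pipeline described in the abstract (greedy, layer-wise training of a stacked sparse autoencoder to compress acoustic features, then logistic regression on the learned codes) can be sketched in a few dozen lines. The following is a minimal NumPy illustration on synthetic two-class data, not the thesis's actual system: the layer sizes (20 to 10 to 5), learning rates, sparsity target, and toy features are all illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def train_sparse_autoencoder(X, n_hidden, epochs=300, lr=0.5, rho=0.05, beta=0.1):
    """Train one autoencoder layer to reconstruct X through a narrow hidden
    layer, with a KL-style sparsity penalty pushing mean hidden activations
    toward the target rho. Returns only the encoder weights."""
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)              # hidden code
        Xhat = sigmoid(H @ W2 + b2)           # reconstruction
        d_out = (Xhat - X) * Xhat * (1.0 - Xhat)
        rho_hat = np.clip(H.mean(axis=0), 1e-3, 1 - 1e-3)
        sparse = beta * (-rho / rho_hat + (1.0 - rho) / (1.0 - rho_hat))
        d_hid = (d_out @ W2.T + sparse) * H * (1.0 - H)
        W2 -= lr * H.T @ d_out / n; b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * X.T @ d_hid / n; b1 -= lr * d_hid.mean(axis=0)
    return W1, b1

def train_logistic_regression(F, y, epochs=500, lr=1.0):
    """Plain batch gradient descent on the logistic loss."""
    w = np.zeros(F.shape[1]); b = 0.0
    for _ in range(epochs):
        g = sigmoid(F @ w + b) - y
        w -= lr * F.T @ g / len(y); b -= lr * g.mean()
    return w, b

# Synthetic stand-in for acoustic-prosodic features: two behavior "ratings"
# (low/high) as well-separated Gaussian clusters in 20 dimensions.
n_per = 100
X = np.vstack([np.clip(rng.normal(0.3, 0.08, (n_per, 20)), 0, 1),
               np.clip(rng.normal(0.7, 0.08, (n_per, 20)), 0, 1)])
y = np.r_[np.zeros(n_per), np.ones(n_per)]

# Greedy layer-wise stacking: 20 -> 10 -> 5, each layer trained unsupervised
# on the codes produced by the layer below it.
layers, H = [], X
for n_hidden in (10, 5):
    W, b = train_sparse_autoencoder(H, n_hidden)
    layers.append((W, b))                     # trained encoders, bottom-up
    H = sigmoid(H @ W + b)

codes = H                                     # 5-dimensional learned features
w, b = train_logistic_regression(codes, y)
acc = float(((sigmoid(codes @ w + b) > 0.5) == y).mean())
print(f"training accuracy on 5-d codes: {acc:.2f}")
```

Each autoencoder layer is trained only to reconstruct its own input, so the dimensionality reduction is unsupervised; the class labels are used only by the final logistic-regression classifier, mirroring the two-stage design described in the abstract.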
Acknowledgements
Chinese Abstract
Abstract
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Overview
1.2 Motivation and Objectives
1.3 Related Work
1.4 Thesis Organization
Chapter 2 The Couple Therapy Corpus
Chapter 3 Methodology
3.1 Problem Definition
3.2 Neural Networks
3.3 Deep Learning
3.3.1 Autoencoders
3.3.2 Stacked Sparse Autoencoders
3.4 Other Methods
3.4.1 K-means Algorithm
3.4.2 Principal Component Analysis
Chapter 4 Experiments and Analysis
4.1 Features
4.2 Data
4.3 System Architecture
4.4 Experimental Results and Parameter Analysis
4.4.1 Experimental Setup
4.4.2 Experimental Results
Chapter 5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work
References
[1] Matthew P Black, Athanasios Katsamanis, Brian R Baucom, Chi-Chun Lee, Adam C Lammert, Andrew Christensen, Panayiotis G Georgiou, and Shrikanth S Narayanan. Toward automating a human behavioral coding system for married couples' interactions using speech acoustic features. Speech Communication, 55(1):1–21, 2013.
[2] M O'Brien, R S John, G Margolin, and O Erel. Reliability and diagnostic efficacy of parents' reports regarding children's exposure to marital aggression. Violence and Victims, 9(1):45–62, 1994.
[3] Gian C Gonzaga, Belinda Campos, and Thomas Bradbury. Similarity, convergence, and relationship satisfaction in dating and married couples. Journal of Personality and Social Psychology, 93(1):34–48, 2007.
[4] Gian C Gonzaga, Belinda Campos, and Thomas Bradbury. Similarity, convergence, and relationship satisfaction in dating and married couples. Journal of Personality and Social Psychology, 93(1):34–48, 2007.
[5] Björn Schuller, Anton Batliner, Dino Seppi, Stefan Steidl, Thurid Vogt, Johannes Wagner, Laurence Devillers, Laurence Vidrascu, Noam Amir, Loic Kessous, and Vered Aharonson. The relevance of feature type for the automatic classification of emotional user states: Low level descriptors and functionals. In Proceedings of INTERSPEECH, volume 2, pages 881–884, 2007.
[6] Shrikanth Narayanan and Panayiotis G Georgiou. Behavioral signal processing: Deriving human behavioral informatics from speech and language. Proceedings of the IEEE, 101(5):1203–1233, 2013.
[7] Alessandro Vinciarelli, Maja Pantic, and Hervé Bourlard. Social signal processing: Survey of an emerging domain. Image and Vision Computing, 27(12):1743–1759, 2009.
[8] Björn Schuller, Stefan Steidl, and Anton Batliner. The INTERSPEECH 2009 emotion challenge. In Proceedings of INTERSPEECH, pages 312–315, 2009.
[9] Video Game. Essential Facts About the Computer and Video Game. Computer, 2009:16, 2010.
[10] Dan Jurafsky, R Ranganath, and D McFarland. Extracting social meaning: Identifying interactional style in spoken conversation. In NAACL '09 Proceedings of Human Language Technologies, pages 638–646, 2009.
[11] Angeliki Metallinou, Martin Wöllmer, Athanasios Katsamanis, Florian Eyben, Björn Schuller, and Shrikanth Narayanan. Context-sensitive learning for enhanced audiovisual emotion classification. IEEE Transactions on Affective Computing, 3(2):184–198, 2012.
[12] Yelin Kim, Honglak Lee, and Emily Mower Provost. Deep learning for robust feature generation in audiovisual emotion recognition. In Proceedings of ICASSP, pages 3687–3691, 2013.
[13] Stephan Steidl, Fathima Razik, and Adam K Anderson. Emotion enhanced retention of cognitive skill learning. Emotion, 11(1):12–19, 2011.
[14] Kahni Clements and Amy Holtzworth-Munroe. Aggressive cognitions of violent versus nonviolent spouses. Cognitive Therapy and Research, 32(3):351–369, 2008.
[15] Catherine Rice. Prevalence of autism spectrum disorders: Autism and Developmental Disabilities Monitoring Network, United States, 2006. MMWR, National Center on Birth Defects and Developmental Disabilities, CDC, 58:1–20, 2009.
[16] American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders. 4th edition, text revision, 2000.
[17] Maxine Eskenazi. An overview of spoken language technology for education. Speech Communication, 51(10):832–844, 2009.
[18] John A Ross. The reliability, validity, and utility of self-assessment. Practical Assessment, Research & Evaluation, 11(10):1–13, 2006.
[19] Andrew Christensen, David C Atkins, Jean Yi, Donald H Baucom, and William H George. Couple and individual adjustment for 2 years following a randomized clinical trial comparing traditional versus integrative behavioral couple therapy. Journal of Consulting and Clinical Psychology, 74(6):1180–1191, 2006.
[20] Paul Boersma and David Weenink. Praat, a system for doing phonetics by computer. 2001.
[21] Florian Eyben. openSMILE: The Munich versatile and fast open-source audio feature extractor. In Proceedings of ACM Multimedia, pages 1459–1462, 2010.
[22] G E Hinton and R R Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.
[23] Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara Sainath, and Brian Kingsbury. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6):82–97, 2012.
[24] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1–9, 2012.
[25] Tao Wang, David J Wu, Adam Coates, and Andrew Y Ng. End-to-end text recognition with convolutional neural networks. In Pattern Recognition (ICPR), 2012 21st International Conference on, pages 3304–3308, 2012.
[26] D Ververidis, C Kotropoulos, and I Pitas. Automatic emotional speech classification. In Proceedings of ICASSP, volume 1, 2004.
[27] Dan-Ning Jiang and Lian-Hong Cai. Speech emotion classification with the combination of statistic features and temporal features. In Proceedings of IEEE ICME, volume 3, 2004.
[28] Xuan Hung Le, G Quenot, and E Castelli. Recognizing emotions for the audio-visual document indexing. In Proceedings of ISCC, 2004.
[29] Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. Speech recognition with deep recurrent neural networks. In Proceedings of ICASSP, pages 6645–6649, 2013.
[30] Richard Socher, Brody Huval, Christopher D Manning, and Andrew Y Ng. Semantic compositionality through recursive matrix-vector spaces. In Proceedings of EMNLP-CoNLL, pages 1201–1211, 2012.
[31] Yelin Kim, Honglak Lee, and Emily Mower Provost. Deep learning for robust feature generation in audiovisual emotion recognition. In Proceedings of ICASSP, pages 3687–3691, 2013.
[32] Angeliki Metallinou, Martin Wöllmer, Athanasios Katsamanis, Florian Eyben, Björn Schuller, and Shrikanth Narayanan. Context-sensitive learning for enhanced audiovisual emotion classification. IEEE Transactions on Affective Computing, 3(2):184–198, 2012.
[33] N S Jacobson, A Christensen, S E Prince, J Cordova, and K Eldridge. Integrative behavioral couple therapy: An acceptance-based, promising new treatment for couple discord. Journal of Consulting and Clinical Psychology, 68(2), 2000.
[34] J Jones and A Christensen. Couples Interaction Study: Social support interaction rating system. Technical report, University of California, Los Angeles, 1998.
[35] C Heavey, D Gill, and A Christensen. Couples Interaction Rating System 2 (CIRS2). Technical report, University of California, Los Angeles, 2002.
[36] Yoshua Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1–127, 2009.
[37] K Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4):193–202, 1980.
[38] P J Werbos. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. PhD thesis, Harvard University, 1974.
[39] Nickolai S Rubanov. The layer-wise method and the backpropagation hybrid approach to learning a feedforward neural network. IEEE Transactions on Neural Networks, 11(2):295–305, 2000.
[40] Galen Andrew and Jianfeng Gao. Scalable training of L1-regularized log-linear models. In International Conference on Machine Learning, pages 33–40, 2007.
[41] Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828, 2013.
[42] J B MacQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 281–297, 1967.
[43] Adam Coates, Honglak Lee, and Andrew Y Ng. An analysis of single-layer networks in unsupervised feature learning. In Proceedings of AISTATS, pages 215–223, 2011.
[44] Svante Wold, Kim Esbensen, and Paul Geladi. Principal component analysis. Chemometrics and Intelligent Laboratory Systems, 2(1–3):37–52, 1987.
[45] Matthew Black, Athanasios Katsamanis, Chi-Chun Lee, Adam C Lammert, Brian R Baucom, Andrew Christensen, Panayiotis G Georgiou, and Shrikanth Narayanan. Automatic classification of married couples' behavior using audio features. In Proceedings of INTERSPEECH, pages 2030–2033, 2010.
[46] Ron Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In International Joint Conference on Artificial Intelligence, volume 14, pages 1137–1143, 1995.



 
 
 
 