
Detailed Record

Author (Chinese): 黃文俞
Author (English): Huang, Wen-Yu
Title (Chinese): 整合文本多層次表達與嵌入演講屬性之表徵學習於強健候用校長演講自動化評分系統
Title (English): Enhancement of Automatic Assessment System for Pre-service Principals’ Oral Presentation using Speech Attribute-enriched Multi-level Feature
Advisor (Chinese): 李祈均
Advisor (English): Lee, Chi-Chun
Committee Members (Chinese): 曹昱、陳宜欣、陳縕儂
Committee Members (English): Tsao, Yu; Chen, Yi-Shin; Chen, Yun-Nung
Degree: Master's
University: National Tsing Hua University
Department: Department of Electrical Engineering
Student ID: 104061605
Year of Publication (ROC calendar): 106 (2017)
Graduation Academic Year: 105
Language: Chinese
Number of Pages: 46
Keywords (Chinese): 行為訊號處理、教育研究、口頭演講、自然語言處理、演講屬性、主題模型
Keywords (English): behavior signal processing; educational research; oral presentation; natural language processing; self-defined attribute tag; topic model
Abstract (translated from Chinese): To administer professional assessments at scale, the demand for computational frameworks that incorporate domain knowledge continues to grow, and automatic oral assessment systems are an especially important research direction for education. This thesis builds on an automatic impromptu-speech scoring system developed jointly with the National Academy for Educational Research for the pre-service training program of elementary and junior high school principals, and aims to strengthen the audiovisual multimodal system by expanding and refining the lexical (text) modality. To this end, the experiments integrate distributed representations with manually defined word categories to form a "multi-level" text feature that effectively represents the information in the speech transcripts. We further propose the concept of "speech attribute tags" to improve the lexical modality, and examine its feasibility and effectiveness from two machine-learning perspectives, labels and features. Experiment I treats the tags as additional document labels and applies a multi-label learning framework to inject speech attribute information into the lexical modality; Experiment II extends the attribute tags into a document feature representation that embeds the speech attributes. Both methods are intended to integrate the tag information and strengthen the text features. The experimental results show that the multi-level text feature improves the lexical modality from a Spearman correlation of 0.378 to at best 0.493, that embedding the speech attribute information further raises it to 0.574, and that adding the text modality to the multimodal scoring system raises its correlation from 0.469 to 0.621. In summary, this thesis applies the multi-level concept, combining different temporal granularities and multiple lexical facets to form text features, and proposes a computational framework for speech attribute tag information, in order to strengthen the automatic scoring system for pre-service principals' speeches.
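The abstract describes the lexical pipeline only at a high level. The snippet below is a minimal, illustrative sketch of the "multi-level" idea (a distributed document representation concatenated with word-category proportions) and of the Spearman-correlation evaluation metric; it is not the thesis implementation. It assumes gensim and scipy are available, and the names multilevel_features, evaluate, and category_lexicon are hypothetical.

```python
# Illustrative sketch only: combine a paragraph embedding with word-category
# proportions into a "multi-level" lexical feature, then score predictions
# against expert-normalized scores with Spearman rank correlation.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from scipy.stats import spearmanr
import numpy as np

def multilevel_features(tokenized_docs, category_lexicon, vector_size=100):
    """Concatenate a distributed document vector with normalized category counts."""
    tagged = [TaggedDocument(words=doc, tags=[i]) for i, doc in enumerate(tokenized_docs)]
    d2v = Doc2Vec(tagged, vector_size=vector_size, min_count=1, epochs=40)
    feats = []
    for i, doc in enumerate(tokenized_docs):
        emb = d2v.dv[i]                                # document-level distributed representation
        cat = np.array([sum(w in words for w in doc)   # word-level category counts
                        for words in category_lexicon.values()], dtype=float)
        cat = cat / max(len(doc), 1)                   # normalize by transcript length
        feats.append(np.concatenate([emb, cat]))
    return np.vstack(feats)

def evaluate(predicted_scores, expert_scores):
    """Spearman correlation, the evaluation metric reported in the abstract."""
    rho, _ = spearmanr(predicted_scores, expert_scores)
    return rho
```

A regressor trained on these features would then be compared against the normalized expert scores via evaluate(); the category lexicon stands in for the manually defined word categories mentioned above.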
Abstract (English): With the growing need for domain-aware computational models that can perform large-scale assessment like domain experts, the development of automatic oral presentation assessment systems is important for education researchers. In this work, we extend a previous audiovisual framework for assessing pre-service school principals’ three-minute impromptu speeches by adding lexical information as an additional modality. We aim to explore an effective feature set for text and to enhance the performance of the lexical modality with manual tagging information. First, we use a multi-level feature extraction approach, consisting of distributed representations and word categories, to derive features from the transcripts in the 2014 National Academy for Educational Research (NAER) oral presentation database, improving the lexical modality from a Spearman correlation of 0.378 to 0.493. Furthermore, inspired by folksonomy, we propose to enrich the lexical features with self-defined attribute tags of the speech transcripts. We carry out two experiments: Experiment I treats the tags as additional labels and employs multi-label learning, and Experiment II derives features inspired by the tags and topic modeling. After incorporating the two methods, the improved system achieves a Spearman correlation of 0.574. Our experiments demonstrate that self-defined attribute tags can enrich the lexical modality and improve the overall system.
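As a companion sketch of the tag-driven enhancement described in both abstracts, the snippet below derives a per-transcript topic-proportion feature with LDA and appends it to the lexical features. This is only one plausible reading of "features inspired by tags and topic modeling", stated under assumed tooling (gensim's LdaModel); the function name topic_features and the concatenation step are illustrative, not the thesis method.

```python
# Minimal sketch: represent each transcript by its inferred LDA topic proportions,
# usable as an attribute-informed feature alongside the multi-level lexical features.
from gensim.corpora import Dictionary
from gensim.models import LdaModel
import numpy as np

def topic_features(tokenized_docs, num_topics=10):
    """Return an (n_docs x num_topics) matrix of topic proportions."""
    dictionary = Dictionary(tokenized_docs)
    corpus = [dictionary.doc2bow(doc) for doc in tokenized_docs]
    lda = LdaModel(corpus, num_topics=num_topics, id2word=dictionary, passes=10)
    feats = np.zeros((len(corpus), num_topics))
    for i, bow in enumerate(corpus):
        for topic_id, prob in lda.get_document_topics(bow, minimum_probability=0.0):
            feats[i, topic_id] = prob
    return feats

# A tag-enriched representation could then be a simple concatenation, e.g.:
# enriched = np.hstack([multilevel_features(docs, lexicon), topic_features(docs)])
```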
Table of Contents
Authorization Letter ii
Advisor's Recommendation Letter iii
Degree Examination Committee Approval Certificate iv
Acknowledgments v
Chinese Abstract vi
Abstract vii
Table of Contents viii
List of Tables x
List of Figures xi
Chapter 1 Introduction 1
1.1 Preface 1
1.2 Motivation 2
1.3 Thesis Organization 3
Chapter 2 Database 5
2.1 Pre-service Principals' Impromptu Speech Corpus 5
2.2 Score Normalization and Speech Attribute Tagging 6
2.2.1 Rank-based Label Normalization 6
2.2.2 Speech Attribute Tagging 8
Chapter 3 Methodology 9
3.1 Classical Vector Space Model 9
3.2 Distributed Representations 10
3.2.1 Word Vectors 11
3.2.2 Document Vectors 14
3.3 Multi-label Learning 15
3.3.1 Joint Feature Learning 17
3.3.2 Gram Matrix Combination 17
3.4 Topic Models 18
3.4.1 Latent Dirichlet Allocation 18
3.4.2 Latent Topic Vectorization 20
Chapter 4 Experimental Design, Results, and Analysis 21
4.1 Preliminary Experiment: Multi-level Document Features 21
4.1.1 Experimental Concept and Design 21
4.1.2 Results and Discussion 22
4.2 Experiment I: Speech Attribute Tags as Labels 25
4.2.1 Experimental Concept and Design 25
4.2.2 Results and Discussion 26
4.3 Experiment II: Speech Attribute Tags as Features 31
4.3.1 Experimental Concept and Design 31
4.3.2 Results and Discussion 32
4.4 Results of Combining Both Forms of Attribute Tag Information 34
Chapter 5 Conclusion 35
References 37
Appendix 45