[1] D. Bone, M. S. Goodwin, M. P. Black, C. C. Lee, K. Audhkhasi, and S. Narayanan, “Applying machine learning to facilitate autism diagnostics: Pitfalls and promises,” Journal of Autism and Developmental Disorders, vol. 45, no. 5, pp. 1121-1136, 2015.
[2] S.-W. Hsiao, H.-C. Sun, M.-C. Hsieh, M.-H. Tsai, H.-C. Lin, and C.-C. Lee, “A multimodal approach for automatic assessment of school principals' oral presentation during pre-service training program,” in Proceedings of the International Speech Communication Association (Interspeech), 2015.
[3] M. P. Black et al., “Toward automating a human behavioral coding system for married couples’ interactions using speech acoustic features,” Speech Communication, vol. 55, no. 1, pp. 1-21, 2013.
[4] F.-S. Tsai, Y.-L. Hsu, W.-C. Chen, Y.-M. Weng, C.-J. Ng, and C.-C. Lee, “Toward development and evaluation of pain level-rating scale for emergency triage based on vocal characteristics and facial expressions,” in Proceedings of the International Speech Communication Association (Interspeech), 2016.
[5] Z. Zeng, M. Pantic, G. I. Roisman, and T. S. Huang, “A survey of affect recognition methods: Audio, visual, and spontaneous expressions,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 1, pp. 39-58, 2009.
[6] Y. Kim, H. Lee, and E. M. Provost, “Deep learning for robust feature generation in audiovisual emotion recognition,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3687-3691, 2013.
[7] C. E. Osgood, G. J. Suci, and P. H. Tannenbaum, The Measurement of Meaning. Urbana, IL: University of Illinois Press, 1957.
[8] S. Ebrahimi Kahou, V. Michalski, K. Konda, R. Memisevic, and C. Pal, “Recurrent neural networks for emotion recognition in video,” in Proceedings of the 2015 ACM International Conference on Multimodal Interaction, pp. 467-474, 2015.
[9] K. S. Tai, R. Socher, and C. D. Manning, “Improved semantic representations from tree-structured long short-term memory networks,” in Proceedings of the Association for Computational Linguistics (ACL), 2015.
[10] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” in Advances in Neural Information Processing Systems, pp. 3104-3112, 2014.
[11] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” arXiv preprint arXiv:1406.1078, 2014.
[12] O. Vinyals and Q. Le, “A neural conversational model,” in ICML Deep Learning Workshop, 2015.
[13] A. Metallinou, Z. Yang, C. C. Lee, C. Busso, S. Carnicke, and S. Narayanan, “The USC CreativeIT database of multimodal dyadic interactions: From speech and full body motion capture to continuous emotional annotations,” Language Resources and Evaluation, vol. 50, no. 3, pp. 497-521, 2016.
[14] R. Cowie, E. Douglas-Cowie, S. Savvidou, E. McMahon, M. Sawey, and M. Schröder, “'FEELTRACE': An instrument for recording perceived emotion in real time,” in ISCA Tutorial and Research Workshop (ITRW) on Speech and Emotion, 2000.
[15] N. Ambady and R. Rosenthal, “Thin slices of expressive behavior as predictors of interpersonal consequences: A meta-analysis,” Psychological Bulletin, vol. 111, no. 2, pp. 256-274, 1992.
[16] N. Ambady and R. Rosenthal, “Half a minute: Predicting teacher evaluations from thin slices of nonverbal behavior and physical attractiveness,” Journal of Personality and Social Psychology, vol. 64, no. 3, pp. 431-441, 1993.
[17] J. Harrigan and R. Rosenthal, New Handbook of Methods in Nonverbal Behavior Research. Oxford University Press, 2008.
[18] W. C. Lin and C. C. Lee, “A thin-slice perception of emotion? An information theoretic-based framework to identify locally emotion-rich behavior segments for global affect recognition,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5790-5794, 2016.
[19] P. Boersma et al., “Praat, a system for doing phonetics by computer,” Glot International, vol. 5, no. 9/10, pp. 341-345, 2002.
[20] B. McFee et al., “librosa: Audio and music signal analysis in Python,” in Proceedings of the 14th Python in Science Conference, 2015.
[21] M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673-2681, 1997.
[22] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[23] R. J. Williams and D. Zipser, “A learning algorithm for continually running fully recurrent neural networks,” Neural Computation, vol. 1, no. 2, pp. 270-280, 1989.
[24] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” arXiv preprint arXiv:1409.0473, 2014.
[25] T. Tieleman and G. Hinton, Lecture 6.5-RMSProp, COURSERA: Neural Networks for Machine Learning, 2012.
[26] A. Graves, Supervised Sequence Labelling with Recurrent Neural Networks. Springer Berlin Heidelberg, 2012.
[27] A. Metallinou, A. Katsamanis, and S. Narayanan, “Tracking continuous emotional trends of participants during affective dyadic interactions using body language and speech information,” Image and Vision Computing, vol. 31, no. 2, pp. 137-152, 2013.
[28] A. Kleinsmith, N. Bianchi-Berthouze, and A. Steed, “Automatic recognition of non-acted affective postures,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 41, no. 4, pp. 1027-1038, 2011.
[29] N. Sebe, I. Cohen, T. Gevers, and T. S. Huang, “Emotion recognition based on joint visual and audio cues,” in 18th International Conference on Pattern Recognition (ICPR), vol. 1, pp. 1136-1139, 2006.
[30] F. Ringeval et al., “Prediction of asynchronous dimensional emotion ratings from audiovisual and physiological data,” Pattern Recognition Letters, vol. 66, pp. 22-30, 2015.
[31] H. Gunes and M. Pantic, “Automatic, dimensional and continuous emotion recognition,” International Journal of Synthetic Emotions, vol. 1, no. 1, 2010.
[32] N. Malandrakis, A. Potamianos, G. Evangelopoulos, and A. Zlatintsi, “A supervised approach to movie emotion tracking,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2376-2379, 2011.
[33] A. Metallinou, A. Katsamanis, Y. Wang, and S. Narayanan, “Tracking changes in continuous emotion states using body language and prosodic cues,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2288-2291, 2011.
[34] A. Hanjalic and L. Q. Xu, “Affective video content representation and modeling,” IEEE Transactions on Multimedia, vol. 7, no. 1, pp. 143-154, 2005.
[35] F. Eyben, G. L. Salomão, J. Sundberg, K. R. Scherer, and B. W. Schuller, “Emotion in the singing voice: A deeper look at acoustic features in the light of automatic classification,” EURASIP Journal on Audio, Speech, and Music Processing, vol. 2015, pp. 1-9, 2015.
[36] F. Eyben, M. Wöllmer, and B. Schuller, “openSMILE: The Munich versatile and fast open-source audio feature extractor,” in Proceedings of the 18th ACM International Conference on Multimedia, pp. 1459-1462, 2010.
[37] C. H. Wu, Z. J. Chuang, and Y. C. Lin, “Emotion recognition from text using semantic labels and separable mixture models,” ACM Transactions on Asian Language Information Processing (TALIP), vol. 5, no. 2, pp. 165-183, 2006.
[38] F. Rosenblatt, “The perceptron: A probabilistic model for information storage and organization in the brain,” Psychological Review, vol. 65, no. 6, pp. 386-408, 1958.
[39] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, pp. 533-536, 1986.
[40] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504-507, 2006.
[41] Y. Bengio, P. Simard, and P. Frasconi, “Learning long-term dependencies with gradient descent is difficult,” IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 157-166, 1994.
[42] D. Ververidis and C. Kotropoulos, “Emotional speech recognition: Resources, features, and methods,” Speech Communication, vol. 48, no. 9, pp. 1162-1181, 2006.
[43] M. A. Nicolaou, H. Gunes, and M. Pantic, “Continuous prediction of spontaneous affect from multiple cues and modalities in valence-arousal space,” IEEE Transactions on Affective Computing, vol. 2, no. 2, pp. 92-105, 2011.
[44] M. Wöllmer, M. Kaiser, F. Eyben, B. Schuller, and G. Rigoll, “LSTM-modeling of continuous emotions in an audiovisual affect recognition framework,” Image and Vision Computing, vol. 31, no. 2, pp. 153-163, 2013.
[45] M. Wöllmer, F. Weninger, T. Knaup, B. Schuller, C. Sun, K. Sagae, and L. P. Morency, “YouTube movie reviews: Sentiment analysis in an audio-visual context,” IEEE Intelligent Systems, vol. 28, no. 3, pp. 46-53, 2013.