[1] C. Busso, M. Bulut, C.-C. Lee, A. Kazemzadeh, E. Mower, S. Kim, J. N. Chang, S. Lee, and S. S. Narayanan, "IEMOCAP: Interactive emotional dyadic motion capture database," Language Resources and Evaluation, vol. 42, pp. 335–359, Dec. 2008.
[2] S. R. Livingstone and F. A. Russo, "The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English," PLOS ONE, vol. 13, pp. 1–35, May 2018.
[3] K. Dupuis and M. Pichora-Fuller, "Recognition of emotional speech for younger and older talkers: Behavioural findings from the Toronto Emotional Speech Set," Canadian Acoustics - Acoustique Canadienne, vol. 39, pp. 182–183, Sep. 2011.
[4] F. Burkhardt, A. Paeschke, M. Rolfes, W. Sendlmeier, and B. Weiss, "A database of German emotional speech," in 9th European Conference on Speech Communication and Technology, vol. 5, pp. 1517–1520, Sep. 2005.
[5] S. Demircan and H. Kahramanli, "Feature extraction from speech data for emotion recognition," Journal of Advances in Computer Networks, vol. 2, pp. 28–30, Jan. 2014.
[6] T. Iliou and C.-N. Anagnostopoulos, "Statistical evaluation of speech features for emotion recognition," in 2010 Fifth International Conference on Digital Telecommunications, pp. 121–126, 2010.
[7] V. Dissanayake, H. Zhang, M. Billinghurst, and S. Nanayakkara, "Speech Emotion Recognition 'in the Wild' Using an Autoencoder," in Proc. Interspeech 2020, pp. 526–530, Oct. 2020.
[8] C. Yu, Q. Tian, F. Cheng, and S. Zhang, "Speech emotion recognition using support vector machines," in Advanced Research on Computer Science and Information Engineering (G. Shen and X. Huang, eds.), pp. 215–220, 2011.
[9] K. Han, D. Yu, and I. Tashev, "Speech emotion recognition using deep neural network and extreme learning machine," in Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Sep. 2014.
[10] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: Theory and applications," Neurocomputing, vol. 70, no. 1, pp. 489–501, 2006.
[11] H. Zhang, R. Gou, J. Shang, F. Shen, Y. Wu, and G. Dai, "Pre-trained deep convolution neural network model with attention for speech emotion recognition," Frontiers in Physiology, vol. 12, Mar. 2021.
[12] A. T. Beck, R. A. Steer, G. K. Brown, et al., Beck Depression Inventory. New York: Harcourt Brace Jovanovich, 1987.
[13] A. T. Beck, M. Kovacs, and A. Weissman, "Assessment of suicidal intention: The Scale for Suicide Ideation," Journal of Consulting and Clinical Psychology, vol. 47, no. 2, pp. 343–352, 1979.
[14] R. C. Young, J. T. Biggs, V. E. Ziegler, and D. A. Meyer, "A rating scale for mania: Reliability, validity and sensitivity," British Journal of Psychiatry, vol. 133, no. 5, pp. 429–435, 1978.
[15] J. H. Patton, M. S. Stanford, and E. S. Barratt, "Barratt Impulsiveness Scale-11," PsycTESTS Dataset, 1995.
[16] A. Beck, A. Weissman, D. Lester, and L. Trexler, "The measurement of pessimism: The Hopelessness Scale," Journal of Consulting and Clinical Psychology, vol. 42, pp. 861–865, 1974.
[17] B. Schuller, A. Batliner, D. Seppi, S. Steidl, T. Vogt, J. Wagner, L. Devillers, L. Vidrascu, N. Amir, L. Kessous, and V. Aharonson, "The relevance of feature type for the automatic classification of emotional user states: Low level descriptors and functionals," in Eighth Annual Conference of the International Speech Communication Association, pp. 2253–2256, 2007.
[18] M. Abdelwahab and C. Busso, "Evaluation of syllable rate estimation in expressive speech and its contribution to emotion recognition," in 2014 IEEE Spoken Language Technology Workshop (SLT), pp. 472–477, Dec. 2014.
[19] M. C. Sezgin, B. Gunsel, and G. K. Kurt, "Perceptual audio features for emotion detection," EURASIP Journal on Audio, Speech, and Music Processing, vol. 2012, p. 16, May 2012.
[20] A. Tursunov, S. Kwon, and H.-S. Pang, "Discriminating emotions in the valence dimension from speech using timbre features," Applied Sciences, vol. 9, p. 2470, June 2019.
[21] C. Busso and T. Rahman, "Unveiling the acoustic properties that describe the valence dimension," in Proc. Interspeech 2012, pp. 1179–1182, Sep. 2012.
[22] P. Boersma and D. Weenink, "Praat: Doing phonetics by computer [computer program]," Mar. 2023.
[23] B. McFee, C. Raffel, D. Liang, D. P. Ellis, M. McVicar, E. Battenberg, and O. Nieto, "librosa: Audio and music signal analysis in Python," in Proceedings of the 14th Python in Science Conference, vol. 8, 2015.
[24] M. Mauch and S. Dixon, "pYIN: A fundamental frequency estimator using probabilistic threshold distributions," in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 659–663, 2014.
[25] A. de Cheveigné and H. Kawahara, "YIN, a fundamental frequency estimator for speech and music," The Journal of the Acoustical Society of America, vol. 111, pp. 1917–1930, May 2002.
[26] C. Ferrand, Speech Science: An Integrated Approach to Theory and Clinical Practice. Allyn & Bacon Communication Sciences and Disorders Series, Pearson, 2014.
[27] C. Kim and W. Sung, "Vowel pronunciation accuracy checking system based on phoneme segmentation and formants extraction," in Proceedings of the International Conference on Speech Processing, pp. 447–452, Aug. 2001.
[28] C. Reuter, "The role of formant positions and micro-modulations in blending and partial masking of musical instruments," The Journal of the Acoustical Society of America, vol. 126, p. 2237, Oct. 2009.
[29] J. P. Burg, "Maximum entropy spectral analysis," in Proceedings of the 37th Meeting, Society of Exploration Geophysicists, 1967.
[30] H. Akaike, "Information theory and an extension of the maximum likelihood principle," in 2nd International Symposium on Information Theory, vol. 73, pp. 1033–1055, 1973.
[31] J. Ghosh, M. Delampady, and T. Samanta, An Introduction to Bayesian Analysis: Theory and Methods. Springer, 2006.
[32] P. Stoica, "Generalized Yule-Walker equations and testing the orders of multivariate time series," International Journal of Control, vol. 37, no. 5, pp. 1159–1166, 1983.
[33] W. E. P. Jr., C. J. Ying, R. L. Moses, and W. M. Steedly, "Accuracy and computational comparisons of TLS-Prony, Burg, and FFT-based scattering center extraction algorithms," in Automatic Object Recognition III (F. A. Sadjadi, ed.), vol. 1960, pp. 140–151, International Society for Optics and Photonics, SPIE, 1993.
[34] C. Ferrand, "Harmonics-to-noise ratio: An index of vocal aging," Journal of Voice, vol. 16, pp. 480–487, 2002.
[35] P. Boersma, "Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound," in Proceedings of the Institute of Phonetic Sciences, vol. 17, pp. 97–110, Amsterdam, 1993.
[36] K. Pearson, "Note on regression and inheritance in the case of two parents," Proceedings of the Royal Society of London, vol. 58, pp. 240–242, 1895.
[37] Taiwanese Society of Psychiatry (台灣精神醫學會) and American Psychiatric Association, DSM-5: 精神疾病診斷準則手冊 [Handbook of Diagnostic Criteria for Mental Disorders, Traditional Chinese edition]. Ho-Chi Book Publishing (合記圖書出版社), 2014.