1. T. Nose, M. Kanemoto, T. Koriyama, and T. Kobayashi, “HMM-based expressive singing voice synthesis with singing style control and robust pitch modeling,” Comput. Speech Lang., vol. 34, no. 1, pp. 308-322, Nov. 2015.
2. E. Molina, I. Barbancho, A. M. Barbancho, and L. J. Tardón, “Parametric model of spectral envelope to synthesize realistic intensity variations in singing voice,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, 2014, pp. 634-638.
3. M. Blaauw, J. Bonada, and V. Välimäki, “A neural parametric singing synthesizer modeling timbre and expression from natural songs,” Applied Sciences, 2017.
4. M. Umbert, J. Bonada, M. Goto, T. Nakano, and J. Sundberg, “Expression control in singing voice synthesis: features, approaches, evaluation, and challenges,” IEEE Signal Processing Magazine, vol. 32, no. 6, pp. 55-73, Nov. 2015.
5. P.-C. Li, L. Su, Y.-H. Yang, and A. W. Y. Su, “Analysis of expressive musical terms in violin using score-informed and expression-based audio features,” in Proc. International Society for Music Information Retrieval Conference (ISMIR), 2015, pp. 809-815.
6. B. Gingras, P. Y. Asselin, and S. McAdams, “Individuality in harpsichord performance: disentangling performer- and piece-specific influences on interpretive choices,” Frontiers in Psychology, vol. 4, no. 895, 2013.
7. M. Bernays and C. Traube, “Investigating pianists' individuality in the performance of five timbral nuances through patterns of articulation, touch, dynamics, and pedaling,” Frontiers in Psychology, vol. 5, no. 157, 2014.
8. B. Gingras, T. Lagrandeur-Ponce, B. L. Giordano, and S. McAdams, “Perceiving musical individuality: performer identification is dependent on performer expertise and expressiveness, but not on listener expertise,” Perception, vol. 40, no. 10, pp. 1206-1220, 2011.
9. R. Koren and B. Gingras, “Perceiving individuality in harpsichord performance,” Frontiers in Psychology, vol. 5, no. 141, 2014.
10. T. L. Nwe and H. Li, “Exploring vibrato-motivated acoustic features for singer identification,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 2, pp. 519-530, 2007.
11. R. Ramirez, E. Maestre, A. Pertusa, E. Gomez, and X. Serra, “Performance-based interpreter identification in saxophone audio recordings,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 3, pp. 356-364, 2007.
12. M. Schröder, “Emotional speech synthesis: a review,” in Proc. Eurospeech, 2001, pp. 561-564.
13. Y. Wang, D. Stanton, Y. Zhang, R. Skerry-Ryan, E. Battenberg, J. Shor, Y. Xiao, F. Ren, Y. Jia, and R. A. Saurous, “Style tokens: unsupervised style modeling, control and transfer in end-to-end speech synthesis,” in Proc. International Conference on Machine Learning, 2018.
14. R. Skerry-Ryan, E. Battenberg, Y. Xiao, Y. Wang, D. Stanton, J. Shor, R. J. Weiss, R. Clark, and R. A. Saurous, “Towards end-to-end prosody transfer for expressive speech synthesis with Tacotron,” in Proc. International Conference on Machine Learning, 2018.
15. J. Tao, Y. Kang, and A. Li, “Prosody conversion from neutral speech to emotional speech,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 4, pp. 1145-1154, 2006.
16. H. Ming, D. Huang, L. Xie, S. Zhang, M. Dong, and H. Li, “Exemplar-based sparse representation of timbre and prosody for voice conversion,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 5175-5179.
17. L. He and V. Dellwo, “Between-speaker variability in temporal organizations of intensity contours,” The Journal of the Acoustical Society of America, vol. 141, no. 5, pp. 488-494, 2017.
18. V. Dellwo, A. Leemann, and M.-J. Kolly, “Rhythmic variability between speakers: articulatory, prosodic, and linguistic factors,” The Journal of the Acoustical Society of America, vol. 137, no. 3, pp. 1513-1528, 2015.
19. T. Kinnunen and H. Li, “An overview of text-independent speaker recognition: from features to supervectors,” Speech Communication, vol. 52, no. 1, pp. 12-40, 2010.
20. S. Mohammadi and A. Kain, “An overview of voice conversion systems,” Speech Communication, vol. 88, pp. 65-82, 2017.
21. C. E. Cancino-Chacón, M. Grachten, W. Goebl, and G. Widmer, “Computational models of expressive music performance: a comprehensive and critical review,” Frontiers in Digital Humanities, vol. 5, no. 25, 2018.
22. K. Kosta, R. Ramírez, O. F. Bandtlow, and E. Chew, “Mapping between dynamic markings and performed loudness: a machine learning approach,” Journal of Mathematics and Music, vol. 10, no. 2, pp. 149-172, 2016.
23. C. E. Cancino-Chacón, T. Gadermaier, G. Widmer, and M. Grachten, “An evaluation of linear and non-linear models of expressive dynamics in classical piano and symphonic music,” Machine Learning, vol. 106, no. 6, pp. 887-909, 2017.
24. M. Grachten and C. E. Cancino-Chacón, “Temporal dependencies in the expressive timing of classical piano performances,” in The Routledge Companion to Embodied Music Interaction, pp. 360-369, 2017.
25. B. Gingras, P. Y. Asselin, and S. McAdams, “Individuality in harpsichord performance: disentangling performer- and piece-specific influences on interpretive choices,” Frontiers in Psychology, vol. 4, no. 895, 2013.
26. R. Ramirez, E. Maestre, A. Pertusa, E. Gomez, and X. Serra, “Performance-based interpreter identification in saxophone audio recordings,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 3, pp. 356-364, 2007.
27. J. Devaney, “Inter- versus intra-singer similarity and variation in vocal performances,” Journal of New Music Research, vol. 45, no. 3, pp. 252-264, 2016.
28. M. Morise, F. Yokomori, and K. Ozawa, “WORLD: a vocoder-based high-quality speech synthesis system for real-time applications,” IEICE Transactions on Information and Systems, vol. E99-D, no. 7, pp. 1877-1884, 2016.
29. K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, “Speech parameter generation algorithms for HMM-based speech synthesis,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2000, pp. 1315-1318.
30. A. Leemann, M.-J. Kolly, and V. Dellwo, “Speaker-individuality in suprasegmental temporal features: implications for forensic voice comparison,” Forensic Science International, vol. 238, pp. 59-67, 2014.
31. A. G. Adami, R. Mihaescu, D. A. Reynolds, and J. J. Godfrey, “Modeling prosodic dynamics for speaker recognition,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2003, pp. IV-788.
32. T. Kako, Y. Ohishi, H. Kameoka, K. Kashino, and K. Takeda, “Automatic identification for singing style based on sung melodic contour characterized in phase plane,” in Proc. International Society for Music Information Retrieval Conference (ISMIR), 2009, pp. 393-398.
33. S. Schweinberger et al., “Speaker perception,” Wiley Interdisciplinary Reviews: Cognitive Science, vol. 5, no. 1, pp. 15-25, 2014.
34. H. Kuwabara and Y. Sagisaka, “Acoustic characteristics of speaker individuality: control and conversion,” Speech Communication, vol. 16, no. 2, pp. 165-173, 1995.
35. Y. Lavner, I. Gath, and J. Rosenhouse, “The effects of acoustic modifications on the identification of familiar voices speaking isolated vowels,” Speech Communication, vol. 30, no. 1, pp. 9-26, 2000.
36. Z. Inanoglu and S. Young, “Data-driven emotion conversion in spoken English,” Speech Communication, vol. 51, no. 3, pp. 268-283, 2009.
37. F. Villavicencio and J. Bonada, “Applying voice conversion to concatenative singing-voice synthesis,” in Proc. INTERSPEECH, 2010.
38. H. Doi, T. Toda, T. Nakano, M. Goto, and S. Nakamura, “Singing voice conversion method based on many-to-many eigenvoice conversion and training data generation using a singing-to-singing synthesis system,” in Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2012, pp. 1-6.
39. K. Kobayashi et al., “Voice timbre control based on perceived age in singing voice conversion,” IEICE Transactions on Information and Systems, vol. E97-D, no. 6, pp. 1419-1428, 2014.
40. K. Kobayashi, T. Toda, and S. Nakamura, “Intra-gender statistical singing voice conversion with direct waveform modification using log-spectral differential,” Speech Communication, vol. 99, pp. 211-220, 2018.
41. E. Nachmani and L. Wolf, “Unsupervised singing voice conversion,” arXiv preprint arXiv:1904.06590, 2019.
42. T. Toda, A. W. Black, and K. Tokuda, “Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 8, pp. 2222-2235, 2007.
43. Y.-C. Wu, H.-T. Hwang, C.-C. Hsu, Y. Tsao, and H.-M. Wang, “Locally linear embedding for exemplar-based spectral conversion,” in Proc. INTERSPEECH, 2016, pp. 1652-1656.
44. L. Ardaillon, G. Degottex, and A. Roebel, “A multi-layer F0 model for singing voice synthesis using a B-spline representation with intuitive controls,” in Proc. INTERSPEECH, 2015.
45. X. Chen, W. Chu, J. Guo, and N. Xu, “Singing voice conversion with non-parallel data,” in Proc. IEEE Conference on Multimedia Information Processing and Retrieval, 2019, pp. 292-296.
46. H. Silen, E. Helander, J. Nurminen, and M. Gabbouj, “Ways to implement global variance in statistical speech synthesis,” in Proc. INTERSPEECH, 2012, pp. 1436-1439.
47. S. Imai, K. Sumita, and C. Furuichi, “Mel log spectrum approximation (MLSA) filter for speech synthesis,” Electronics and Communications in Japan (Part I: Communications), vol. 66, no. 2, pp. 10-18, 1983.
48. Z. M. Smith, B. Delgutte, and A. J. Oxenham, “Chimaeric sounds reveal dichotomies in auditory perception,” Nature, vol. 416, no. 6876, pp. 87-90, 2002.
49. L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
50. R. A. Fisher, “The statistical utilization of multiple measurements,” Annals of Eugenics, vol. 8, pp. 376-386, 1938.
51. K. Fukunaga, “Introduction to statistical pattern recognition,” Academic Press, 2013.
52. S. W. Huck, W. H. Cormier, and W. G. Bounds, “Reading statistics and research,” New York: Harper & Row, 1974.