|
1. H. Kenmochi, “Vocaloid and hatsune miku phenomenon in japan,” in Interdisciplinary Workshop on Singing Voice, 2010. 2. K. Hua, “Modeling singing f0 with neural network driven transition-sustain models,” ArXiv, vol. abs/1803.04030, 2018. 3. M. Morise, “Harvest: A high-performance fundamental frequency estimator from speech signals,” in INTERSPEECH 2017, pp. 2321–2325, Aug 2017. 4. Y. Hono, K. Hashimoto, K. Oura, Y. Nankaku, and K. Tokuda, “Singing voice synthesis based on generative adversarial networks,” in 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6955–6959, May 2019. 5. P. Lu, J. Wu, J. Luan, X. Tan, and L. Zhou, “Xiaoicesing: A high-quality and integrated singing voice synthesis system,” arXiv preprint arXiv:2006.06261,2020. 6.J. Lee, H.-S. Choi, C.-B. Jeon, J. Koo, and K. Lee, “Adversarially trained end-to-end Korean singing voice synthesis system,” in INTERSPEECH 2019,pp. 2588–2592, Sep 2019. 7. P. Chandna, M. Blaauw, J. Bonada, and E. Gómez, “WGANSing: A multi-voice singing voice synthesizer based on the Wasserstein-GAN,” ArXiv, vol. abs/1903.10729, 2019. 8. Y. Gu, X. Yin, Y. Rao, Y. Wan, B. Tang, Y. Zhang, J. Chen, Y. Wang, and Z. Ma, “Bytesing: A chinese singing voice synthesis system using duration allocated encoder-decoder acoustic models and wavernn vocoders,” arXiv preprint arXiv:2004.11012, 2020. 9. H. Kenmochi and H. Ohshita, “Vocaloid-commercial singing synthesizer based on sample concatenation,” in Eighth Annual Conference of the International Speech Communication Association, 2007. 10. T. Nakano and M. Goto, “Vocalistener: A singing-to-singing synthesis system based on iterative parameter estimation,”Proceedings of Sound and Music Computing (SMC), pp. 343–348, Jan 2009. 11. T. Nakano and M. Goto, “Vocalistener2: A singing synthesis system able to mimic a user’s singing in terms of voice timbre changes as well as pitch and dynamics,” in 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 453–456, May 2011. 12. K. Saino, H. Zen, Y. Nankaku, A. Lee, and K. Tokuda, “HMM-based singing voice synthesis system,” in Proceedings International Conference on Spoken Language Processing (ICSLP), pp. 2274–2277, 2006. 13. H.-Y. Gu and H.-L. Liau, “Mandarin singing voice synthesis using an HMM based scheme,” Journal of Information Science and Engineering - JISE, vol. 27, pp. 347–351, Jun 2008. 14. X. Li and Z. Wang, “A HMM-based mandarin chinese singing voice synthesis system,” IEEE/CAA Journal of Automatica Sinica, vol. 3, pp. 192–202, April 2016. 15. K. Nakamura, K. Oura, Y. Nankaku, and K. Tokuda, “HMM-based singing voice synthesis and its application to Japanese and English,” in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 265–269, May 2014. 16. K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, “Speech parameter generation algorithms for hmm-based speech synthesis,” in 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No. 00CH37100), vol. 3, pp. 1315–1318, IEEE, 2000. 17. M. Nishimura, K. Hashimoto, K. Oura, Y. Nankaku, and K. Tokuda, “Singing voice synthesis based on deep neural networks,” in INTERSPEECH 2016, pp. 2478–2482, Sep 2016. 18. J. Kim, H. Choi, J. Park, M. Hahn, S. Kim, and J. Kim, “Korean singing voice synthesis based on an LSTM recurrent neural network,” in INTERSPEECH 2018, pp. 1551–1555, Sep 2018. 19. M. Morise, F. Yokomori, and K. Ozawa, “World: A vocoder-based high-quality speech synthesis system for real-time applications,” IEICE Transactions, vol. 99-D, pp. 1877–1884, 2016. 20. Y. Wang, R. Skerry-Ryan, D. Stanton, Y. Wu, R. J. Weiss, N. Jaitly, Z. Yang, Y. Xiao, Z. Chen, S. Bengio, et al., “Tacotron: Towards end-to-end speech synthesis,” arXiv preprint arXiv:1703.10135, 2017. 21. J. Shen, R. Pang, R. J. Weiss, M. Schuster, N. Jaitly, Z. Yang, Z. Chen, Y. Zhang, Y. Wang, R. Skerrv-Ryan, et al., “Natural tts synthesis by conditioning wavenet on mel spectrogram predictions,” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4779–4783, IEEE, 2018. 22. J. Uszkoreit, “Transformer: A novel neural network architecture for language understanding,” Google AI Blog, vol. 31, 2017. 23. N. Kalchbrenner, E. Elsen, K. Simonyan, S. Noury, N. Casagrande, E. Lock-hart, F. Stimberg, A. v. d. Oord, S. Dieleman, and K. Kavukcuoglu, “Efficient neural audio synthesis,” arXiv preprint arXiv:1802.08435, 2018. 24. K. Kumar, R. Kumar, T. de Boissiere, L. Gestin, W. Z. Teoh, J. Sotelo, A. de Brébisson, Y. Bengio, and A. C. Courville, “Melgan: Generative adversarial networks for conditional waveform synthesis,” in Advances in Neural Information Processing Systems, pp. 14910–14921, 2019. 25. 李依哲, “基於雙向時間遞歸神經網路之中文歌聲合成,” 2019. 國立清華大學電機工程學系碩士論文, https://hdl.handle.net/11296/yf8qks. 26. M. o. E. Organized by Department of Lifelong Education, The Manual of the Phonetic Symbols of Mandarin Chinese (Digital Version). Ministry of Education, 1 ed., Jan 2017. 27. M. Morise, “CheapTrick, a spectral envelope estimator for high-quality speech synthesis,” Speech Communication, vol. 67, pp. 1–7, 2015. 28. M. Morise, “D4C, a band-aperiodicity estimator for high-quality speech synthesis,” Speech Communication, vol. 84, pp. 57–65, 2016. 29. K. Tokuda, T. Kobayashi, T. Masuko, and S. Imai, “Mel-generalized cepstral analysis,” in Proceedings International Conference on Spoken Language Processing (ICSLP), pp. 1043–1046, 1994. 30. H. Zen, T. Toda, M. Nakamura, and K. Tokuda, “Details of the nitech hmm-based speech synthesis system for the blizzard challenge 2005,” IEICE Transactions, vol. 90-D, pp. 325–333, 01 2007. 31. T. Saitou, M. Goto, M. Unoki, and M. Akagi, “Speech-to-singing synthesis: Converting speaking voices to singing voices by controlling acoustic features unique to singing voices,” in 2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 215–218, IEEE, 2007. 32. L. Ardaillon, G. Degottex, and A. Roebel, “A multi-layer f0 model for singing voice synthesis using a b-spline representation with intuitive controls,” in INTERSPEECH 2015, Sep 2015. 33. R. Maher and J. Beauchamp, “An investigation of vocal vibrato for synthesis,” Applied Acoustics, vol. 30, no. 2-3, pp. 219–245, 1990. 34. D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” International Conference on Learning Representations, Dec 2014. 35. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in neural information processing systems, pp. 5998–6008, 2017. |