[1] S. Takamichi, T. Toda, A. W. Black, and S. Nakamura, “Modulation spectrum-constrained trajectory training algorithm for GMM-based voice conversion,” in Proc. ICASSP, 2015, pp. 4859–4863.
[2] H.-T. Hwang, Y. Tsao, H.-M. Wang, Y.-R. Wang, and S.-H. Chen, “Incorporating global variance in the training phase of GMM-based voice conversion,” in Proc. APSIPA ASC, 2013, pp. 1–6.
[3] L.-H. Chen, Z.-H. Ling, L.-J. Liu, and L.-R. Dai, “Voice conversion using deep neural networks with layer-wise generative training,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 12, pp. 1859–1872, 2014.
[4] C.-C. Hsu, H.-T. Hwang, Y.-C. Wu, Y. Tsao, and H.-M. Wang, “Dictionary update for NMF-based voice conversion using an encoder-decoder network,” in Proc. ISCSLP, Oct. 2016, pp. 1–5.
[5] D. Erro, A. Moreno, and A. Bonafonte, “Voice conversion based on weighted frequency warping,” IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 5, pp. 922–931, Jul. 2010.
[6] E. Godoy, O. Rosec, and T. Chonavel, “Voice conversion using dynamic frequency warping with amplitude scaling, for parallel or nonparallel corpora,” IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 4, pp. 1313–1323, May 2012.
[7] R. Takashima, T. Takiguchi, and Y. Ariki, “Exemplar-based voice conversion in noisy environment,” in Proc. SLT Workshop, 2012, pp. 313–317.
[8] Z. Wu, T. Virtanen, E. S. Chng, and H. Li, “Exemplar-based sparse representation with residual compensation for voice conversion,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 10, pp. 1506–1521, Oct. 2014.
[9] Y.-C. Wu, H.-T. Hwang, C.-C. Hsu, Y. Tsao, and H.-M. Wang, “Locally linear embedding for exemplar-based spectral conversion,” in Proc. INTERSPEECH, 2016, pp. 1652–1656.
[10] Z. Wu, T. Virtanen, T. Kinnunen, E. S. Chng, and H. Li, “Exemplar-based voice conversion using non-negative spectrogram deconvolution,” in Proc. 8th ISCA Speech Synthesis Workshop (SSW8), 2013, pp. 201–206.
[11] R. Aihara, T. Takiguchi, and Y. Ariki, “Many-to-one voice conversion using exemplar-based sparse representation,” in Proc. WASPAA, 2015, pp. 1–5.
[12] T. Toda, Y. Ohtani, and K. Shikano, “One-to-many and many-to-one voice conversion based on eigenvoices,” in Proc. ICASSP, 2007, pp. 1249–1252.
[13] R. Aihara, T. Takiguchi, and Y. Ariki, “Multiple non-negative matrix factorization for many-to-many voice conversion,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 24, no. 7, pp. 1175–1184, 2016.
[14] Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano, “Many-to-many eigenvoice conversion with reference voice,” in Proc. INTERSPEECH, 2009, pp. 1623–1626.
[15] D. Erro, A. Moreno, and A. Bonafonte, “INCA algorithm for training voice conversion systems from nonparallel corpora,” IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 5, pp. 944–953, Jul. 2010.
[16] P. Song, W. Zheng, and L. Zhao, “Non-parallel training for voice conversion based on adaptation method,” in Proc. ICASSP, 2013.
[17] M. Dong, C. Yang, Y. Lu, J. W. Ehnes, D. Huang, H. Ming, R. Tong, S. W. Lee, and H. Li, “Mapping frames with DNN-HMM recognizer for non-parallel voice conversion,” in Proc. APSIPA ASC, 2015.
[18] C.-C. Hsu, H.-T. Hwang, Y.-C. Wu, Y. Tsao, and H.-M. Wang, “Voice conversion from non-parallel corpora using variational auto-encoder,” in Proc. APSIPA ASC, 2016.
[19] C.-C. Hsu, H.-T. Hwang, Y.-C. Wu, Y. Tsao, and H.-M. Wang, “Voice conversion from unaligned corpora using variational autoencoding Wasserstein generative adversarial networks,” in Proc. INTERSPEECH, 2017.
[20] H.-T. Hwang, Y.-C. Wu, Y.-H. Peng, C.-C. Hsu, Y. Tsao, H.-M. Wang, Y.-R. Wang, and S.-H. Chen, “Voice conversion based on locally linear embedding,” accepted to appear in Journal of Information Science and Engineering.
[21] T. Fujii, R. Aihara, R. Takashima, T. Takiguchi, and Y. Ariki, “Voice conversion based on non-negative matrix factorization in noisy environments,” in Proc. 2013 IEEE/SICE International Symposium on System Integration, pp. 495–498.
[22] R. Aihara, T. Takiguchi, and Y. Ariki, “Activity-mapping non-negative matrix factorization for exemplar-based voice conversion,” in Proc. ICASSP, 2015.
[23] L. J. P. van der Maaten, E. O. Postma, and H. J. van den Herik, “Dimensionality reduction: A comparative review,” Journal of Machine Learning Research, vol. 10, pp. 66–71, 2009.
[24] J. B. Tenenbaum, V. de Silva, and J. C. Langford, “A global geometric framework for nonlinear dimensionality reduction,” Science, vol. 290, no. 5500, pp. 2319–2323, 2000.
[25] M. Belkin and P. Niyogi, “Laplacian eigenmaps and spectral techniques for embedding and clustering,” Advances in Neural Information Processing Systems, vol. 14, pp. 585–591, 2001.
[26] S. T. Roweis and L. K. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Science, vol. 290, no. 5500, pp. 2323–2326, 2000.
[27] L. K. Saul and S. T. Roweis, “An introduction to locally linear embedding,” 2001. Available: https://www.cs.nyu.edu/~roweis/lle/papers/lleintro.pdf
[28] K. Tokuda, T. Kobayashi, and S. Imai, “Speech parameter generation from HMM using dynamic features,” in Proc. ICASSP, 1995, pp. 660–663.
[29] T. Toda, A. W. Black, and K. Tokuda, “Voice conversion based on maximum likelihood estimation of spectral parameter trajectory,” IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 8, pp. 2222–2235, Nov. 2007.
[30] H. Silén, E. Helander, J. Nurminen, and M. Gabbouj, “Ways to implement global variance in statistical speech synthesis,” in Proc. INTERSPEECH, 2012, pp. 1436–1439.
[31] H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, “Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds,” Speech Communication, vol. 27, no. 3–4, pp. 187–207, 1999.
[32] H. Kawahara, J. Estill, and O. Fujimura, “Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT,” in Proc. MAVEBA, Sep. 2001.
[33] T. Fukada, K. Tokuda, T. Kobayashi, and S. Imai, “An adaptive algorithm for mel-cepstral analysis of speech,” in Proc. ICASSP, 1992, pp. 137–140.
[34] K. Tokuda, T. Kobayashi, T. Masuko, and S. Imai, “Mel-generalized cepstral analysis — a unified approach to speech spectral estimation,” in Proc. ICSLP, Sep. 1994, pp. 1043–1046.
[35] Y.-C. Wu, H.-T. Hwang, S.-S. Wang, C.-C. Hsu, Y.-H. Lai, Y. Tsao, and H.-M. Wang, “A locally linear embedding based postfiltering approach for speech enhancement,” in Proc. ICASSP, 2017.
[36] Y.-C. Wu, H.-T. Hwang, S.-S. Wang, C.-C. Hsu, Y. Tsao, and H.-M. Wang, “A post-filtering approach based on locally linear embedding difference compensation for speech enhancement,” in Proc. INTERSPEECH, 2017.
[37] Z. Zhang and J. Wang, “MLLE: Modified locally linear embedding using multiple weights,” in Proc. 19th International Conference on Neural Information Processing Systems, Dec. 2006, pp. 1593–1600.
[38] C.-Y. Tseng, Y.-C. Cheng, and C.-H. Chang, “Sinica COSPRO and Toolkit — corpora and platform of Mandarin Chinese fluent speech,” in Proc. Oriental COCOSDA, 2005, pp. 23–28.
[39] T. Toda, L. Chen, D. Saito, F. Villavicencio, M. Wester, Z. Wu, and J. Yamagishi, “The Voice Conversion Challenge 2016,” in Proc. INTERSPEECH, 2016.