|
[1] Malcolm Slaney, Michele Covell, and Bud Lassiter. Automatic audio morphing. pages 1001–1004 vol. 2, 06 1996. ISBN 0-7803-3192-3. doi: 10.1109/ICASSP.1996.543292.
[2] Satoshi Imai, Kazuo Sumita, and Chieko Furuichi. Mel log spectrum approximation (MLSA) filter for speech synthesis. Electronics and Communications in Japan (Part I: Communications), 66(2):10–18, 1983.
[3] Hideki Kawahara. STRAIGHT, exploitation of the other aspect of vocoder: Perceptually isomorphic decomposition of speech sounds. Acoust Sci Technol, 27:349, 01 2006. doi: 10.1250/ast.27.349.
[4] Masanori Morise, Fumiya Yokomori, and Kenji Ozawa. WORLD: A vocoder-based high-quality speech synthesis system for real-time applications. IEICE Transactions on Information and Systems, E99.D:1877–1884, 07 2016. doi: 10.1587/transinf.2015EDP7457.
[5] Akira Tamamori, Tomoki Hayashi, Kazuhiro Kobayashi, Kazuya Takeda, and Tomoki Toda. Speaker-dependent WaveNet vocoder. pages 1118–1122, 08 2017. doi: 10.21437/Interspeech.2017-314.
[6] Tomoki Hayashi, Akira Tamamori, Kazuhiro Kobayashi, Kazuya Takeda, and Tomoki Toda. An investigation of multi-speaker training for WaveNet vocoder. pages 712–718, 12 2017. doi: 10.1109/ASRU.2017.8269007.
[7] D. G. Childers, D. P. Skinner, and R. C. Kemerait. The cepstrum: A guide to processing. Proceedings of the IEEE, 65(10), Oct 1977. ISSN 1558-2256. doi: 10.1109/PROC.1977.10747.
[8] S. S. Stevens, J. Volkmann, and E. B. Newman. A scale for the measurement of the psychological magnitude pitch. The Journal of the Acoustical Society of America, 8(3):185–190, 1937. doi: 10.1121/1.1915893. URL https://doi.org/10.1121/1.1915893.
[9] Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, Quoc Le, Yannis Agiomyrgiannakis, Rob Clark, and Rif A. Saurous. Tacotron: Towards end-to-end speech synthesis. 2017. URL https://arxiv.org/abs/1703.10135.
[10] Jonathan Shen, Ruoming Pang, Ron Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif Saurous, Yannis Agiomyrgiannakis, and Yonghui Wu. Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions. pages 4779–4783, 04 2018. doi: 10.1109/ICASSP.2018.8461368.
[11] Arturo Camacho and John Harris. A sawtooth waveform inspired pitch estimator for speech and music. The Journal of the Acoustical Society of America, 124:1638–52, 10 2008. doi: 10.1121/1.2951592.
[12] John Kominek and Alan Black. The CMU ARCTIC speech databases. SSW5-2004, 01 2004. |