|
[1] S. Davis and P. Mermelstein, “Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences,” IEEE Trans. Acoust., Speech, Signal Process., vol. 28, no. 4, pp. 357–366, Aug. 1980.
[2] X. Huang, A. Acero, and H.-W. Hon (foreword by R. Reddy), Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice Hall PTR, 2001.
[3] Z. Tychtl and J. Psutka, “Speech production based on the mel-frequency cepstral coefficients,” in EuroSpeech, 1999, vol. 99, pp. 2335–2338.
[4] B. P. Milner and X. Shao, “Speech reconstruction from mel-frequency cepstral coefficients using a source-filter model,” in 7th International Conference on Spoken Language Processing (ICSLP-2002), 2002, pp. 2421–2424.
[5] D. Chazan, R. Hoory, G. Cohen, and M. Zibulski, “Speech reconstruction from mel frequency cepstral coefficients and pitch frequency,” in 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100), 2000, vol. 3, pp. 1299–1302.
[6] X. Shao and B. Milner, “Clean speech reconstruction from noisy mel-frequency cepstral coefficients using a sinusoidal model,” in 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’03), 2003, vol. 1, pp. I-704–I-707.
[7] B. Milner, “Pitch prediction from MFCC vectors for speech reconstruction,” in 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004, vol. 1, pp. I-97–I-100.
[8] X. Shao and B. Milner, “Predicting fundamental frequency from mel-frequency cepstral coefficients to enable speech reconstruction,” J. Acoust. Soc. Am., vol. 118, no. 2, pp. 1134–1143, 2005.
[9] B. Milner and X. Shao, “Prediction of fundamental frequency and voicing from mel-frequency cepstral coefficients for unconstrained speech reconstruction,” IEEE Trans. Audio, Speech Lang. Process., vol. 15, no. 1, pp. 24–33, Jan. 2007.
[10] J. O. Smith, Spectral Audio Signal Processing, 2011 edition. http://ccrma.stanford.edu/~jos/sasp/.
[11] E. Larson and R. Maddox, “Real-time time-domain pitch tracking using wavelets,” Proc. Univ. Illinois Urbana Champaign Res. Exp. Undergraduates Progr., 2005.
[12] C. T. Ferrand, “Speech science: An integrated approach to theory and clinical practice,” Ear Hear., vol. 22, no. 6, p. 549, 2001.
[13] D. P. W. Ellis, “PLP and RASTA (and MFCC, and inversion) in Matlab,” 2005.
[14] 王小川, 語音訊號處理 [Speech Signal Processing], revised 2nd ed. 全華圖書, 2008.
[15] S. N. Levine and J. O. Smith III, “A sines+transients+noise audio representation for data compression and time/pitch scale modifications,” in Audio Engineering Society Convention 105, 1998.
[16] R. J. McAulay and T. F. Quatieri, Sinusoidal Coding. Defense Technical Information Center, 1995.
[17] A. V. Oppenheim, R. W. Schafer, J. R. Buck, et al., Discrete-Time Signal Processing, vol. 2. Englewood Cliffs: Prentice-Hall, 1989.
[18] P. Kabal, “An examination and interpretation of ITU-R BS.1387: Perceptual evaluation of audio quality,” TSP Lab Tech. Report, Dept. Electr. Comput. Eng., McGill Univ., pp. 1–89, 2002.
[19] L. Besacier, S. Grassi, A. Dufaux, M. Ansorge, and F. Pellandini, “GSM speech coding and speaker recognition,” in 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100), 2000, vol. 2, pp. II1085–II1088.
[20] ITU-T, “Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs,” ITU-T Recommendation P.862, 2001.
|