[1] P. S. Huang, S. D. Chen, P. Smaragdis, and M. Hasegawa-Johnson, "Singing-voice separation from monaural recordings using robust principal component analysis," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2012, pp. 57–60.
[2] P. Sprechmann, A. Bronstein, and G. Sapiro, "Real-time online singing voice separation from monaural recordings using robust low-rank modeling," in Proc. 13th Int. Soc. Music Inf. Retrieval Conf. (ISMIR), 2012.
[3] A. L. Maas, Q. V. Le, T. M. O'Neil, O. Vinyals, P. Nguyen, and A. Y. Ng, "Recurrent neural networks for noise reduction in robust ASR," in Proc. 13th Annu. Conf. Int. Speech Commun. Assoc. (INTERSPEECH), 2012, pp. 22–25.
[4] B. Chen, C. Chen, and J. Wang, "Smart homecare surveillance system: Behavior identification based on state-transition support vector machines and sound directivity pattern analysis," IEEE Trans. Syst., Man, Cybern.: Syst., vol. 43, no. 6, pp. 1279–1289, Nov. 2013.
[5] N. Yalta, K. Nakadai, and T. Ogata, "Sound source localization using deep learning models," J. Robot. Mechatron., vol. 29, no. 1, 2017.
[6] C. Grobler, C. Kruger, B. Silva, and G. Hancke, "Sound based localization and identification in industrial environments," in Proc. IEEE Ind. Electron. Soc. Conf. (IECON), 2017.
[7] S. Haykin, Adaptive Filter Theory, 3rd ed. Upper Saddle River, NJ: Prentice-Hall, 1996.
[8] M. Miyoshi and Y. Kaneda, "Inverse filtering of room acoustics," IEEE Trans. Acoust., Speech, Signal Process., vol. 36, no. 2, pp. 145–152, 1988.
[9] I. Kodrasi, S. Goetze, and S. Doclo, "Regularization for partial multichannel equalization for speech dereverberation," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 9, pp. 1879–1890, 2013.
[10] J. Benesty, J. Chen, and Y. Huang, Microphone Array Signal Processing, ch. 7. Berlin: Springer, 2008.
[11] I. Kodrasi and S. Doclo, "Late reverberation power spectral density estimation based on an eigenvalue decomposition," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), New Orleans, LA, 2017.
[12] K. Kinoshita, T. Nakatani, and M. Miyoshi, "Suppression of late reverberation effect on speech signal using long-term multiple-step linear prediction," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 4, pp. 534–545, 2009.
[13] T. Nakatani, T. Yoshioka, K. Kinoshita, M. Miyoshi, and B.-H. Juang, "Speech dereverberation based on variance-normalized delayed linear prediction," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 7, 2010.
[14] T. Xiang, J. Lu, and K. Chen, "Multi-channel adaptive dereverberation tracing abrupt position change of target speaker," 2018.
[15] K. Han, Y. Wang, D. L. Wang, et al., "Learning spectral mapping for speech dereverberation and denoising," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 23, no. 6, pp. 982–992, 2015.
[16] K. Han, Y. Wang, and D. L. Wang, "Learning spectral mapping for speech dereverberation," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2014.
[17] B. Wu, K. Li, M. L. Yang, and C.-H. Lee, "A reverberation-time-aware approach to speech dereverberation based on deep neural networks," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 25, no. 1, pp. 102–111, 2017.
[18] B. Wu, M. Yang, K. Li, Z. Huang, M. Siniscalchi, T. Wang, and C.-H. Lee, "A reverberation-time-aware DNN approach leveraging spatial information for microphone array dereverberation," EURASIP J. Adv. Signal Process., 2017.
[19] ITU-T Rec. P.862, "Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs," Int. Telecommun. Union, 2001.
[20] J. Wang, J. Wang, K. He, and C. Hsu, "Environmental sound classification using hybrid SVM/KNN classifier and MPEG-7 audio low-level descriptor," in Proc. IEEE Int. Joint Conf. Neural Netw. (IJCNN), Vancouver, BC, 2006, pp. 1731–1735.
[21] M. V. S. Shashanka and P. Smaragdis, "Secure sound classification: Gaussian mixture models," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Toulouse, 2006, pp. III-III.
[22] Y. Peng, C. Lin, M. Sun, and K. Tsai, "Healthcare audio event classification using hidden Markov models and hierarchical hidden Markov models," in Proc. IEEE Int. Conf. Multimedia Expo (ICME), New York, NY, 2009, pp. 1218–1221.
[23] O. Gencoglu, T. Virtanen, and H. Huttunen, "Recognition of acoustic events using deep neural networks," in Proc. 22nd Eur. Signal Process. Conf. (EUSIPCO), Lisbon, 2014, pp. 506–510.
[24] E. Cakir, T. Heittola, H. Huttunen, and T. Virtanen, "Polyphonic sound event detection using multi label deep neural networks," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Killarney, 2015, pp. 1–7.
[25] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and F. Li, "Large-scale video classification with convolutional neural networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2014, pp. 1725–1732.
[26] K. J. Piczak, "Environmental sound classification with convolutional neural networks," in Proc. IEEE 25th Int. Workshop Mach. Learn. Signal Process. (MLSP), Boston, MA, 2015, pp. 1–6.
[27] J. Salamon and J. P. Bello, "Deep convolutional neural networks and data augmentation for environmental sound classification," IEEE Signal Process. Lett., vol. 24, no. 3, pp. 279–283, Mar. 2017.
[28] S. Adavanne, A. Politis, J. Nikunen, and T. Virtanen, "Sound event localization and detection of overlapping sources using convolutional recurrent neural networks," IEEE J. Sel. Topics Signal Process., vol. 13, no. 1, pp. 34–48, Mar. 2019.
[29] S. Adavanne, P. Pertila, and T. Virtanen, "Sound event detection using spatial features and convolutional recurrent neural network," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2017.
[30] I.-Y. Jeong, S. Lee, Y. Han, and K. Lee, "Audio event detection using multiple-input convolutional neural network," in Proc. Detection Classification Acoust. Scenes Events (DCASE), 2017.
[31] J. Zhou, "Sound event detection in multichannel audio LSTM network," in Proc. Detection Classification Acoust. Scenes Events (DCASE), 2017.
[32] S. Adavanne, A. Politis, and T. Virtanen, "Multichannel sound event detection using 3D convolutional neural networks for learning inter-channel features," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Rio de Janeiro, 2018, pp. 1–7.
[33] C. W. Groetsch, The Theory of Tikhonov Regularization for Fredholm Equations of the First Kind. Boston: Pitman Advanced Pub. Program, 1984.
[34] M. Kleinsteuber and H. Shen, "Blind source separation with compressively sensed linear mixtures," IEEE Signal Process. Lett., vol. 19, no. 2, pp. 107–110, Feb. 2012.
[35] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean square error short-time spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-32, no. 6, pp. 1109–1121, Dec. 1984.
[36] P. Huang, S. D. Chen, P. Smaragdis, and M. Hasegawa-Johnson, "Singing-voice separation from monaural recordings using robust principal component analysis," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Kyoto, 2012, pp. 57–60.
[37] B. Mijovic, M. De Vos, I. Gligorijevic, J. Taelman, and S. Van Huffel, "Source separation from single-channel recordings by combining empirical-mode decomposition and independent component analysis," IEEE Trans. Biomed. Eng., vol. 57, no. 9, pp. 2188–2196, Sept. 2010.
[38] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788–791, Oct. 1999.
[39] K. W. E. Lin, Balamurali B.T., E. Koh, S. Lui, and D. Herremans, "Singing voice separation using a deep convolutional neural network trained by ideal binary mask and cross entropy," arXiv:1812.01278, 2018.
[40] M. Miron, J. Janer, and E. Gómez, "Monaural score-informed source separation for classical music using convolutional neural networks," in Proc. Int. Soc. Music Inf. Retrieval Conf. (ISMIR), 2017.
[41] S. Mobin, B. Cheung, and B. Olshausen, "Generalization challenges for neural architectures in audio source separation," arXiv:1803.08629, 2018.
[42] P. Huang, M. Kim, M. Hasegawa-Johnson, and P. Smaragdis, "Joint optimization of masks and deep recurrent neural networks for monaural source separation," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 23, no. 12, pp. 2136–2147, Dec. 2015.
[43] F. Weninger, J. R. Hershey, J. Le Roux, and B. Schuller, "Discriminatively trained recurrent neural networks for single-channel speech separation," in Proc. IEEE Global Conf. Signal Inf. Process. (GlobalSIP), Atlanta, GA, 2014, pp. 577–581.
[44] F. Weninger, F. Eyben, and B. Schuller, "Single-channel speech separation with memory-enhanced recurrent neural networks," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Florence, 2014, pp. 3709–3713.
[45] Y. Xu, J. Du, L. R. Dai, and C.-H. Lee, "A regression approach to speech enhancement based on deep neural networks," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 23, no. 1, pp. 7–19, Jan. 2015.
[46] B. Wu, M. Yang, K. Li, Z. Huang, M. Siniscalchi, T. Wang, and C.-H. Lee, "A reverberation-time-aware DNN approach leveraging spatial information for microphone array dereverberation," EURASIP J. Adv. Signal Process., 2017.
[47] F. Weninger, F. Eyben, and B. Schuller, "Single-channel speech separation with memory-enhanced recurrent neural networks," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2014, pp. 3709–3713.
[48] D. Wang, "On ideal binary mask as the computational goal of auditory scene analysis," in P. Divenyi (Ed.), Speech Separation by Humans and Machines, ch. 12, pp. 181–197. Norwell, MA: Kluwer Academic, 2005.
[49] Y. Wang, A. Narayanan, and D. Wang, "On training targets for supervised speech separation," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 12, pp. 1849–1858, Dec. 2014.
[50] D. Wang and J. Lim, "The unimportance of phase in speech enhancement," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-30, no. 4, pp. 679–681, Aug. 1982.
[51] K. Paliwal, K. Wójcicki, and B. Shannon, "The importance of phase in speech enhancement," Speech Commun., vol. 53, no. 4, pp. 465–494, 2011.
[52] H. Erdogan, J. R. Hershey, S. Watanabe, and J. Le Roux, "Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2015, pp. 708–712.
[53] A. Souloumiac, "Blind source detection and separation using second order non-stationarity," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Detroit, MI, 1995, pp. 1912–1915, vol. 3.
[54] N. Doukas, P. Naylor, and T. Stathaki, "Voice activity detection using source separation techniques," in Proc. EUROSPEECH, 1997.
[55] Y. Li, K. C. Ho, and M. Popescu, "Efficient source separation algorithms for acoustic fall detection using a Microsoft Kinect," IEEE Trans. Biomed. Eng., vol. 61, no. 3, pp. 745–755, Mar. 2014.
[56] T. Heittola, A. Mesaros, T. Virtanen, and A. Eronen, "Sound event detection in multisource environments using source separation," in Proc. CHiME Workshop, 2011.
[57] Q. Kong, Y. Xu, I. Sobieraj, W. Wang, and M. D. Plumbley, "Sound event detection and time–frequency segmentation from weakly labelled data," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 27, no. 4, pp. 777–787, Apr. 2019.
[58] S. Braun and E. A. P. Habets, "Dereverberation in noisy environments using reference signals and a maximum likelihood estimator," in Proc. Eur. Signal Process. Conf. (EUSIPCO), Marrakech, Morocco, 2013.
[59] A. Kuklasiński, S. Doclo, S. H. Jensen, and J. Jensen, "Maximum likelihood based multi-channel isotropic reverberation reduction for hearing aids," in Proc. Eur. Signal Process. Conf. (EUSIPCO), Lisbon, Portugal, 2014.
[60] D. Gesbert and P. Duhamel, "Robust blind channel identification and equalization based on multi-step predictors," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 1997.
[61] D. Yu and L. Deng, "Deep neural networks," in Automatic Speech Recognition, Signals and Communication Technology. London: Springer, 2015.
[62] J. Allen and D. Berkley, "Image method for efficiently simulating small-room acoustics," J. Acoust. Soc. Amer., vol. 65, no. 4, pp. 943–950, 1979.
[63] A. Mohamed, G. Dahl, and G. Hinton, "Deep belief networks for phone recognition," in Proc. NIPS Workshop Deep Learn. Speech Recognit. Related Applicat., 2009.
[64] A. Mohamed, D. Yu, and L. Deng, "Investigation of full-sequence training of deep belief networks for speech recognition," in Proc. INTERSPEECH, 2010, pp. 2846–2849.
[65] L. Deng, D. Yu, and J. Platt, "Scalable stacking and learning for building deep architectures," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2012, pp. 2133–2136.
[66] O. Abdel-Hamid, A. Mohamed, H. Jiang, L. Deng, G. Penn, and D. Yu, "Convolutional neural networks for speech recognition," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 10, pp. 1533–1545, Oct. 2014.
[67] J. F. Gemmeke et al., "Audio Set: An ontology and human-labeled dataset for audio events," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), New Orleans, LA, 2017, pp. 776–780.
[68] S. Hershey et al., "CNN architectures for large-scale audio classification," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), New Orleans, LA, 2017, pp. 131–135.
[69] S. Liu and W. Deng, "Very deep convolutional neural network based image classification using small training sample size," in Proc. 3rd IAPR Asian Conf. Pattern Recognit. (ACPR), Kuala Lumpur, 2015, pp. 730–734.
[70] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in Computer Vision – ECCV 2014, Lecture Notes in Computer Science, vol. 8689. Cham: Springer, 2014.
[71] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Neural Inf. Process. Syst. (NIPS), 2012.
[72] K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, "Learning phrase representations using RNN encoder-decoder for statistical machine translation," arXiv:1406.1078, 2014.
[73] D. Wang and J. Lim, "The unimportance of phase in speech enhancement," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-30, no. 4, pp. 679–681, Aug. 1982.
[74] K. Paliwal, K. Wójcicki, and B. Shannon, "The importance of phase in speech enhancement," Speech Commun., vol. 53, no. 4, pp. 465–494, 2011.
[75] H. Erdogan, J. R. Hershey, S. Watanabe, and J. Le Roux, "Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2015, pp. 708–712.
[76] J. Le Roux, N. Ono, and S. Sagayama, "Explicit consistency constraints for STFT spectrograms and their application to phase reconstruction," in Proc. ISCA Workshop Statist. Perceptual Audition (SAPA), 2008, pp. 23–28.
[77] E. Vincent, R. Gribonval, and C. Févotte, "Performance measurement in blind audio source separation," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 4, pp. 1462–1469, 2006.
[78] A. Mesaros, T. Heittola, and T. Virtanen, "Metrics for polyphonic sound event detection," Applied Sciences, vol. 6, no. 6, 2016.
[79] A. Mesaros, T. Heittola, and D. Ellis, "Datasets and evaluation," in Computational Analysis of Sound Scenes and Events, T. Virtanen, M. Plumbley, and D. Ellis, Eds. Springer International Publishing, 2018, ch. 6.
[80] J. Lee and H. Kang, "A joint learning algorithm for complex-valued T-F masks in deep learning-based single-channel speech enhancement systems," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 27, no. 6, pp. 1098–1108, June 2019.
[81] J. Allen and D. Berkley, "Image method for efficiently simulating small-room acoustics," J. Acoust. Soc. Amer., vol. 65, no. 4, pp. 943–950, 1979.