[1] P. Comon and C. Jutten, Eds., "Handbook of Blind Source Separation: Independent Component Analysis and Applications," Academic Press, 2010.
[2] T. W. Lee, "Independent component analysis," in Independent Component Analysis. Berlin, Germany: Springer, 1998, pp. 27–66.
[3] A. Hyvärinen and E. Oja, "Independent component analysis: Algorithms and applications," Neural Networks, vol. 13, no. 4, pp. 411–430, 2000.
[4] A. Hyvärinen, J. Karhunen, and E. Oja, "Independent Component Analysis," New York, NY, USA: Wiley, 2001.
[5] A. Cichocki, R. Zdunek, A. H. Phan, and S. I. Amari, "Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation," Hoboken, NJ, USA: Wiley, 2009.
[6] J. R. Hershey, Z. Chen, J. Le Roux, and S. Watanabe, "Deep clustering: Discriminative embeddings for segmentation and separation," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 31–35.
[7] Y. Luo and N. Mesgarani, "Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 8, pp. 1256–1266, Aug. 2019.
[8] Y. Liu and D. Wang, "Divide and Conquer: A Deep CASA Approach to Talker-Independent Monaural Speaker Separation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 12, pp. 2092–2102, Dec. 2019.
[9] Y. Luo, Z. Chen, and T. Yoshioka, "Dual-Path RNN: Efficient Long Sequence Modeling for Time-Domain Single-Channel Speech Separation," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 46–50.
[10] C. Subakan, M. Ravanelli, S. Cornell, M. Bronzi, and J. Zhong, "Attention Is All You Need in Speech Separation," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 21–25.
[11] D. Yu, M. Kolbæk, Z.-H. Tan, and J. Jensen, "Permutation invariant training of deep models for speaker-independent multi-talker speech separation," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 241–245.
[12] M. Kolbæk, D. Yu, Z.-H. Tan, and J. Jensen, "Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 10, pp. 1901–1913, Oct. 2017.
[13] T. von Neumann, K. Kinoshita, M. Delcroix, S. Araki, T. Nakatani, and R. Haeb-Umbach, "All-neural Online Source Separation, Counting, and Diarization for Meeting Analysis," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 91–95.
[14] K. Kinoshita, M. Delcroix, S. Araki, and T. Nakatani, "Tackling Real Noisy Reverberant Meetings with All-Neural Source Separation, Counting, and Diarization System," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 381–385.
[15] Y. Zhang et al., "Continuous Speech Separation with Recurrent Selective Attention Network," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 6017–6021.
[16] T. Yoshioka et al., "Advances in Online Audio-Visual Meeting Transcription," in Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2019, pp. 276–283.
[17] H. Taherian, K. Tan, and D. Wang, "Multi-Channel Talker-Independent Speaker Separation Through Location-Based Training," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 2791–2800, 2022.
[18] P. Wang et al., "Speech Separation Using Speaker Inventory," in Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2019, pp. 230–236.
[19] N. Kanda, Y. Gaur, X. Wang, Z. Meng, Z. Chen, T. Zhou, and T. Yoshioka, "Joint speaker counting, speech recognition, and speaker identification for overlapped speech of any number of speakers," in Proc. Interspeech, 2020, pp. 36–40.
[20] X. Anguera, S. Bozonnet, N. Evans, C. Fredouille, G. Friedland, and O. Vinyals, "Speaker Diarization: A Review of Recent Research," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 2, pp. 356–370, Feb. 2012.
[21] M. Delcroix, K. Zmolikova, T. Ochiai, K. Kinoshita, and T. Nakatani, "Speaker Activity Driven Neural Speech Extraction," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 6099–6103.
[22] B. Laufer-Goldshtein, R. Talmon, and S. Gannot, "Diarization and Separation Based on a Data-Driven Simplex," in Proc. 26th European Signal Processing Conference (EUSIPCO), 2018, pp. 842–846.
[23] B. Laufer-Goldshtein, R. Talmon, and S. Gannot, "Source Counting and Separation Based on Simplex Analysis," IEEE Transactions on Signal Processing, vol. 66, no. 24, pp. 6458–6473, Dec. 2018.
[24] B. Laufer-Goldshtein, R. Talmon, and S. Gannot, "Global and Local Simplex Representations for Multichannel Source Separation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 914–928, 2020.
[25] B. Laufer-Goldshtein, R. Talmon, and S. Gannot, "Audio source separation by activity probability detection with maximum correlation and simplex geometry," EURASIP Journal on Audio, Speech, and Music Processing, vol. 2021, no. 1, pp. 1–16, 2021.
[26] Y. Hsu and M. R. Bai, "Learning-based Robust Speaker Counting and Separation with the Aid of Spatial Coherence," 2023.
[27] J. B. Allen and D. A. Berkley, "Image method for efficiently simulating small-room acoustics," Journal of the Acoustical Society of America, vol. 65, no. 4, pp. 943–950, 1979.
[28] E. Hadad, F. Heese, P. Vary, and S. Gannot, "Multichannel audio database in various acoustic environments," in Proc. 14th International Workshop on Acoustic Signal Enhancement (IWAENC), 2014, pp. 313–317.
[29] J. G. Fiscus, J. Ajot, M. Michel, and J. S. Garofolo, "The rich transcription 2006 spring meeting recognition evaluation," in Proc. International Workshop on Machine Learning for Multimodal Interaction, 2006, pp. 309–322.
[30] C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, "An algorithm for intelligibility prediction of time-frequency weighted noisy speech," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 7, pp. 2125–2136, 2011.
[31] A. W. Rix, J. G. Beerends, M. P. Hollier, and A. P. Hekstra, "Perceptual evaluation of speech quality (PESQ) - a new method for speech quality assessment of telephone networks and codecs," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Salt Lake City, UT, USA, 2001, pp. 749–752.
[32] E. Vincent, R. Gribonval, and C. Févotte, "Performance measurement in blind audio source separation," IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 4, pp. 1462–1469, 2006.
[33] A. C. Morris, V. Maier, and P. Green, "From WER and RIL to MER and WIL: improved evaluation measures for connected speech recognition," in Proc. Eighth International Conference on Spoken Language Processing, 2004.
[34] G. Sell and D. Garcia-Romero, "Speaker diarization with PLDA i-vector scoring and unsupervised calibration," in Proc. IEEE Spoken Language Technology Workshop (SLT), 2014, pp. 413–417.
[35] S. H. Shum, N. Dehak, R. Dehak, and J. R. Glass, "Unsupervised Methods for Speaker Diarization: An Integrated and Iterative Approach," IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 10, pp. 2015–2028, Oct. 2013.
[36] Y. Fujita, N. Kanda, S. Horiguchi, K. Nagamatsu, and S. Watanabe, "End-to-end neural speaker diarization with permutation-free objectives," in Proc. Interspeech, 2019, pp. 4300–4304.
[37] Y. Fujita, N. Kanda, S. Horiguchi, Y. Xue, K. Nagamatsu, and S. Watanabe, "End-to-End Neural Speaker Diarization with Self-Attention," in Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2019, pp. 296–303.
[38] O. Yilmaz and S. Rickard, "Blind separation of speech mixtures via time-frequency masking," IEEE Transactions on Signal Processing, vol. 52, no. 7, pp. 1830–1847, Jul. 2004.
[39] B. C. J. Moore, "An Introduction to the Psychology of Hearing," Brill, 2012.
[40] M. R. Bai and C. Kuo, "Acoustic source localization and deconvolution-based separation," Journal of Computational Acoustics, 2015.
[41] K. Tan and D. Wang, "A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement," in Proc. Interspeech, 2018, pp. 3229–3233.
[42] A. C. Morris, J. Barker, and H. Bourlard, "From missing data to maybe useful data: Soft data modelling for noise robust ASR," in Proc. Workshop on Innovation in Speech Processing, Stratford-upon-Avon, U.K., Apr. 2001, pp. 153–164.
[43] V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, "Librispeech: an ASR corpus based on public domain audio books," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, Australia, 2015, pp. 5206–5210.
[44] O. Cetin and E. Shriberg, "Speaker Overlaps and ASR Errors in Meetings: Effects Before, During, and After the Overlap," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toulouse, France, 2006, pp. I-I.
[45] N. Turpault, R. Serizel, A. P. Shah, and J. Salamon, "Sound event detection in domestic environments with weakly labeled data and soundscape synthesis," in Proc. Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), Oct. 2019.
[46] M. Ravanelli, T. Parcollet, et al., "SpeechBrain: A general-purpose speech toolkit," 2021.