[1] C.-Y. Koh, “Weakly supervised sound event detection model training strategies,” Master’s thesis, National Tsing Hua University, 2021. https://hdl.handle.net/11296/d749hu.
[2] A. Mesaros, T. Heittola, T. Virtanen, and M. D. Plumbley, “Sound event detection: A tutorial,” IEEE Signal Processing Magazine, vol. 38, no. 5, pp. 67–83, 2021.
[3] T. K. Chan and C. S. Chin, “A comprehensive review of polyphonic sound event detection,” IEEE Access, vol. 8, pp. 103339–103373, 2020.
[4] N. Turpault, R. Serizel, A. P. Shah, and J. Salamon, “Sound event detection in domestic environments with weakly labeled data and soundscape synthesis,” in Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2019.
[5] N. K. Kim and H. K. Kim, “Polyphonic sound event detection based on residual convolutional recurrent neural network with semi-supervised loss function,” IEEE Access, vol. 9, pp. 7564–7575, 2021.
[6] H. Endo and H. Nishizaki, “Peer collaborative learning for polyphonic sound event detection,” in ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 826–830, 2022.
[7] L. Yang, J. Hao, Z. Hou, and W. Peng, “Two-stage domain adaptation for sound event detection,” in DCASE, pp. 230–234, 2020.
[8] X. Zheng, Y. Song, L.-R. Dai, I. McLoughlin, and L. Liu, “An effective mutual mean teaching based domain adaptation method for sound event detection,” in Proc. Interspeech 2021, pp. 556–560, 2021.
[9] W. Wei, H. Zhu, E. Benetos, and Y. Wang, “A-CRNN: A domain adaptation model for sound event detection,” in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 276–280, 2020.
[10] D. Stowell, M. Wood, Y. Stylianou, and H. Glotin, “Bird detection in audio: A survey and a challenge,” in 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1–6, 2016.
[11] D. Stowell and M. D. Plumbley, “Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning,” PeerJ, vol. 2, p. e488, 2014.
[12] C.-Y. Koh, J.-Y. Chang, C.-L. Tai, D.-Y. Huang, H.-H. Hsieh, and Y.-W. Liu, “Bird sound classification using convolutional neural networks,” in CLEF (Working Notes), 2019.
[13] E. Cakir, S. Adavanne, G. Parascandolo, K. Drossos, and T. Virtanen, “Convolutional recurrent neural networks for bird audio detection,” in 2017 25th European Signal Processing Conference (EUSIPCO), pp. 1744–1748, IEEE, 2017.
[14] D. Stowell and D. Clayton, “Acoustic event detection for multiple overlapping similar sources,” in 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 1–5, IEEE, 2015.
[15] Y. R. Pandeya, B. Bhattarai, and J. Lee, “Visual object detector for cow sound event detection,” IEEE Access, vol. 8, pp. 162625–162633, 2020.
[16] M. Crous, Polyphonic Bird Sound Event Detection With Convolutional Recurrent Neural Networks. PhD thesis, July 2019.
[17] A. G. A. Parrilla and D. Stowell, “Polyphonic sound event detection for highly dense birdsong scenes,” 2022.
[18] L. M. Chronister, T. A. Rhinehart, A. Place, and J. Kitzes, An Annotated Set of Audio Recordings of Eastern North American Birds Containing Frequency, Time, and Species Information. Wiley Online Library, 2021.
[19] C.-Y. Koh, Y.-S. Chen, Y.-W. Liu, and M. R. Bai, “Sound event detection by consistency training and pseudo-labeling with feature-pyramid convolutional recurrent neural networks,” in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 376–380, 2021.
[20] F.-C. Chen, K.-D. Chen, and Y.-W. Liu, “Domestic sound event detection by shift consistency mean-teacher training and adversarial domain adaptation,” in Proc. Int. Congress on Acoustics, Gyeongju, Korea, 2022.
[21] M. Huzaifah, “Comparison of time-frequency representations for environmental sound classification using convolutional neural networks,” arXiv preprint arXiv:1706.07156, 2017.
[22] H. Fayek, “Speech processing for machine learning: Filter banks, mel-frequency cepstral coefficients (MFCCs) and what’s in-between,” 2016. https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html.
[23] B. Shi, X. Bai, and C. Yao, “An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 11, pp. 2298–2304, 2016.
[24] Y. N. Dauphin, A. Fan, M. Auli, and D. Grangier, “Language modeling with gated convolutional networks,” in International Conference on Machine Learning, pp. 933–941, PMLR, 2017.
[25] M. Schuster and K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673–2681, 1997.
[26] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125, 2017.
[27] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell, “Adversarial discriminative domain adaptation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7167–7176, 2017.
[28] E. Cakir, T. Heittola, H. Huttunen, and T. Virtanen, “Polyphonic sound event detection using multi label deep neural networks,” in 2015 International Joint Conference on Neural Networks (IJCNN), pp. 1–7, 2015.
[29] V. Lostanlen, J. Salamon, A. Farnsworth, S. Kelling, and J. P. Bello, “BirdVox-full-night: A dataset and benchmark for avian flight call detection,” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 266–270, 2018.
[30] S. Kahl, T. Wilhelm-Stein, H. Klinck, D. Kowerko, and M. Eibl, “Recognizing birds from sound: The 2018 BirdCLEF baseline system,” arXiv preprint arXiv:1804.07177, 2018.
[31] F. J. Bravo Sanchez, M. R. Hossain, N. B. English, and S. T. Moore, “Bioacoustic classification of avian calls from raw sound waveforms with an open-source deep learning architecture,” Scientific Reports, vol. 11, no. 1, pp. 1–12, 2021.
[32] J. Salamon, D. MacConnell, M. Cartwright, P. Li, and J. P. Bello, “Scaper: A library for soundscape synthesis and augmentation,” in 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 344–348, 2017.
[33] P. Goyal, P. Dollár, R. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrola, A. Tulloch, Y. Jia, and K. He, “Accurate, large minibatch SGD: Training ImageNet in 1 hour,” arXiv preprint arXiv:1706.02677, 2017.
[34] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” CoRR, vol. abs/1512.03385, 2015.
[35] P. J. Rousseeuw, “Silhouettes: A graphical aid to the interpretation and validation of cluster analysis,” Journal of Computational and Applied Mathematics, vol. 20, pp. 53–65, 1987.
[36] L. Van der Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, no. 11, 2008.
[37] M. Long, Z. Cao, J. Wang, and M. I. Jordan, “Conditional adversarial domain adaptation,” Advances in Neural Information Processing Systems, vol. 31, 2018.
[38] C. K. Catchpole and P. J. B. Slater, Bird Song: Biological Themes and Variations (2nd ed.). Cambridge University Press, 2008.