[1] C. Debes, A. Merentitis, S. Sukhanov, M. Niessen, N. Frangiadakis, and A. Bauer, “Monitoring activities of daily living in smart homes: Understanding human behavior,” IEEE Signal Processing Magazine, vol. 33, no. 2, pp. 81–94, 2016.
[2] Y. Zigel, D. Litvak, and I. Gannot, “A method for automatic fall detection of elderly people using floor vibrations and sound—proof of concept on human mimicking doll falls,” IEEE Transactions on Biomedical Engineering, vol. 56, no. 12, pp. 2858–2867, 2009.
[3] R. Radhakrishnan, A. Divakaran, and P. Smaragdis, “Audio analysis for surveillance applications,” in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 158–161, 2005.
[4] Q. Jin, P. Schulam, S. Rawat, S. Burger, D. Ding, and F. Metze, “Event-based video retrieval using audio,” in Thirteenth Annual Conference of the International Speech Communication Association (INTERSPEECH), 2012.
[5] N. Takahashi, M. Gygli, B. Pfister, and L. Van Gool, “Deep convolutional neural networks and data augmentation for acoustic event recognition,” in INTERSPEECH, pp. 2982–2986, 2016.
[6] T. Hayashi, S. Watanabe, T. Toda, T. Hori, J. Le Roux, and K. Takeda, “Duration-controlled LSTM for polyphonic sound event detection,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 11, pp. 2059–2070, 2017.
[7] E. Cakır, G. Parascandolo, T. Heittola, H. Huttunen, and T. Virtanen, “Convolutional recurrent neural networks for polyphonic sound event detection,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 6, pp. 1291–1303, 2017.
[8] A. Kumar and B. Raj, “Audio event detection using weakly labeled data,” in Proceedings of the 24th ACM International Conference on Multimedia, pp. 1038–1047, 2016.
[9] Y. Wang, J. Li, and F. Metze, “A comparison of five multiple instance learning pooling functions for sound event detection with weak labeling,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 31–35, 2019.
[10] Y. Xu, Q. Kong, W. Wang, and M. D. Plumbley, “Large-scale weakly supervised audio classification using gated convolutional neural network,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 121–125, 2018.
[11] L. Lin, X. Wang, H. Liu, and Y. Qian, “Guided learning for weakly-labeled semi-supervised sound event detection,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 626–630, 2020.
[12] R. Serizel, N. Turpault, H. Eghbal-Zadeh, and A. P. Shah, “Large-scale weakly labeled semi-supervised sound event detection in domestic environments,” in Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE2018), pp. 19–23, November 2018.
[13] A. Tarvainen and H. Valpola, “Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results,” in Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS), pp. 1195–1204, 2017.
[14] V. Verma, A. Lamb, J. Kannala, Y. Bengio, and D. Lopez-Paz, “Interpolation consistency training for semi-supervised learning,” in Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), pp. 3635–3641, July 2019.
[15] W. Wei, H. Zhu, E. Benetos, and Y. Wang, “A-CRNN: A domain adaptation model for sound event detection,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 276–280, 2020.
[16] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell, “Adversarial discriminative domain adaptation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7167–7176, 2017.
[17] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky, “Domain-adversarial training of neural networks,” Journal of Machine Learning Research, vol. 17, no. 59, pp. 1–35, 2016.
[18] H. Park, S. Yun, J. Eum, J. Cho, and K. Hwang, “Weakly labeled sound event detection using tri-training and adversarial learning,” in Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019), pp. 184–188, 2019.
[19] L. Yang, J. Hao, Z. Hou, and W. Peng, “Two-stage domain adaptation for sound event detection,” in Proceedings of the Detection and Classification of Acoustic Scenes and Events 2020 Workshop (DCASE2020), pp. 230–234, 2020.
[20] M. Huzaifah, “Comparison of time-frequency representations for environmental sound classification using convolutional neural networks,” arXiv preprint arXiv:1706.07156, 2017.
[21] H. M. Fayek, “Speech processing for machine learning: Filter banks, mel-frequency cepstral coefficients (MFCCs) and what’s in-between,” 2016.
[22] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” arXiv preprint arXiv:1412.3555, 2014.
[23] Y. N. Dauphin, A. Fan, M. Auli, and D. Grangier, “Language modeling with gated convolutional networks,” in Proceedings of the 34th International Conference on Machine Learning (ICML), pp. 933–941, 2017.
[24] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2117–2125, 2017.
[25] J. Yan, Y. Song, W. Guo, L.-R. Dai, I. McLoughlin, and L. Chen, “A region based attention method for weakly supervised sound event detection and classification,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 755–759, 2019.
[26] Z. Shi, L. Liu, H. Lin, R. Liu, and A. Shi, “Hodgepodge: Sound event detection based on ensemble of semi-supervised learning methods,” arXiv preprint arXiv:1907.07398, 2019.
[27] H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz, “mixup: Beyond empirical risk minimization,” in International Conference on Learning Representations (ICLR), 2018.
[28] R. Serizel, N. Turpault, A. Shah, and J. Salamon, “Sound event detection in synthetic domestic environments,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 86–90, 2020.
[29] A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectifier nonlinearities improve neural network acoustic models,” in Proceedings of the International Conference on Machine Learning (ICML), vol. 30, p. 3, 2013.
[30] E. Cakir, T. Heittola, H. Huttunen, and T. Virtanen, “Polyphonic sound event detection using multi label deep neural networks,” in 2015 International Joint Conference on Neural Networks (IJCNN), pp. 1–7, 2015.
[31] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, 2016.
[32] A. Mesaros, T. Heittola, and T. Virtanen, “Metrics for polyphonic sound event detection,” Applied Sciences, vol. 6, no. 6, p. 162, 2016.
[33] Ç. Bilen, G. Ferroni, F. Tuveri, J. Azcarreta, and S. Krstulović, “A framework for the robust evaluation of sound event detection,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 61–65, 2020.
[34] L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, no. 11, pp. 2579–2605, 2008.
[35] Y. Wu, D. Inkpen, and A. El-Roby, “Dual mixup regularized learning for adversarial domain adaptation,” in European Conference on Computer Vision (ECCV), pp. 540–555, Springer, 2020.
[36] W. Lin, M.-W. Mak, N. Li, D. Su, and D. Yu, “Multi-level deep neural network adaptation for speaker verification using MMD and consistency regularization,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6839–6843, 2020.