[1] Y.-L. Huang, B.-H. Su, Y.-W. P. Hong, and C.-C. Lee, "An attention-based method for guiding attribute-aligned speech representation learning," in Proc. Interspeech, 2022.
[2] J. L. Kröger, O. H.-M. Lutz, and P. Raschke, "Privacy implications of voice and speech analysis – information disclosure by inference," in IFIP International Summer School on Privacy and Identity Management. Springer, 2019, pp. 242-258.
[3] M. Jaiswal and E. M. Provost, "Privacy enhanced multimodal neural representations for emotion recognition," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 05, 2020, pp. 7985-7993.
[4] M. Xia, A. Field, and Y. Tsvetkov, "Demoting racial bias in hate speech detection," in Proceedings of the Eighth International Workshop on Natural Language Processing for Social Media, 2020, pp. 7-14.
[5] B. M. L. Srivastava, A. Bellet, M. Tommasi, and E. Vincent, "Privacy-preserving adversarial representation learning in ASR: Reality or illusion?," in Proc. Interspeech, 2019.
[6] Y.-L. Huang, B.-H. Su, Y.-W. P. Hong, and C.-C. Lee, "An attribute-aligned strategy for learning speech representation," in Proc. Interspeech, 2021, pp. 1179-1183.
[7] J. Kim, M. El-Khamy, and J. Lee, "Transformer with Gaussian weighted self-attention for speech enhancement," in Proc. ICASSP, 2020.
[8] A. Défossez, G. Synnaeve, and Y. Adi, "Real time speech enhancement in the waveform domain," in Proc. Interspeech, 2020, pp. 3291-3295.
[9] T.-A. Hsieh, C. Yu, S.-W. Fu, X. Lu, and Y. Tsao, "Improving perceptual quality by phone-fortified perceptual loss using Wasserstein distance for speech enhancement," in Proc. Interspeech, 2021.
[10] S.-W. Fu et al., "MetricGAN+: An improved version of MetricGAN for speech enhancement," in Proc. Interspeech, 2021.
[11] D. S. Park et al., "SpecAugment: A simple data augmentation method for automatic speech recognition," in Proc. Interspeech, 2019, pp. 2613-2617.
[12] Y. Ganin and V. Lempitsky, "Unsupervised domain adaptation by backpropagation," in International Conference on Machine Learning. PMLR, 2015, pp. 1180-1189.
[13] W. Wang, W. Wang, M. Sun, and C. Wang, "Acoustic scene analysis with multi-head attention networks," in Proc. Interspeech, 2020, pp. 1191-1195.
[14] C. Busso et al., "IEMOCAP: Interactive emotional dyadic motion capture database," Language Resources and Evaluation, vol. 42, no. 4, pp. 335-359, 2008.
[15] A. Mesaros, T. Heittola, and T. Virtanen, "A multi-device dataset for urban acoustic scene classification," in Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE2018), 2018, p. 9.
[16] R. Lotfian and C. Busso, "Building naturalistic emotionally balanced speech corpus by retrieving emotional speech from existing podcast recordings," IEEE Transactions on Affective Computing, vol. 10, no. 4, pp. 471-483, 2019.
[17] C. K. Reddy et al., "The INTERSPEECH 2020 deep noise suppression challenge: Datasets, subjective speech quality and testing framework," 2020.
[18] C. Valentini-Botinhao, "Noisy speech database for training speech enhancement algorithms and TTS models," University of Edinburgh, School of Informatics, Centre for Speech Technology Research (CSTR), 2016.
[19] H. Hu et al., "A two-stage approach to device-robust acoustic scene classification," in Proc. ICASSP. IEEE, 2021, pp. 845-849.
[20] A. Baevski, S. Schneider, and M. Auli, "vq-wav2vec: Self-supervised learning of discrete speech representations," in International Conference on Learning Representations, 2020.
[21] B.-H. Su, C.-M. Chang, Y.-S. Lin, and C.-C. Lee, "Improving speech emotion recognition using graph attentive bi-directional gated recurrent unit network," in Proc. Interspeech, 2020, pp. 506-510.
[22] B. Desplanques, J. Thienpondt, and K. Demuynck, "ECAPA-TDNN: Emphasized channel attention, propagation and aggregation in TDNN based speaker verification," in Proc. Interspeech, 2020, pp. 3830-3834.
[23] H. Zhang et al., "PaddleSpeech: An easy-to-use all-in-one speech toolkit," in Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: System Demonstrations, 2022.
[24] A. Nagrani, J. S. Chung, and A. Zisserman, "VoxCeleb: A large-scale speaker identification dataset," in Proc. Interspeech, 2017, pp. 2616-2620.
[25] J. S. Chung, A. Nagrani, and A. Zisserman, "VoxCeleb2: Deep speaker recognition," in Proc. Interspeech, 2018, pp. 1086-1090.
[26] C. Veaux, J. Yamagishi, and K. MacDonald, "CSTR VCTK corpus: English multi-speaker corpus for CSTR voice cloning toolkit," University of Edinburgh, The Centre for Speech Technology Research (CSTR), 2017.
[27] V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, "Librispeech: An ASR corpus based on public domain audio books," in Proc. ICASSP. IEEE, 2015, pp. 5206-5210.
[28] A. Mesaros et al., "DCASE 2017 challenge setup: Tasks, datasets and baseline system," in Proceedings of the Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE2017), 2017.
[29] G. Dekkers et al., "The SINS database for detection of daily activities in a home environment using an acoustic sensor network," in Proceedings of the Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE2017), 2017, pp. 1-5.