[1] M. Cheung, J. J. Campbell, L. Whitby, R. J. Thomas, J. Braybrook, and J. Petzing, “Current trends in flow cytometry automated data analysis software,” Cytometry Part A, 2021.
[2] J. Flores-Montero, G. Grigore, R. Fluxá, J. Hernández, P. Fernandez, J. Almeida, N. Muñoz, S. Böttcher, L. Sedek, V. van der Velden, et al., “EuroFlow Lymphoid Screening Tube (LST) data base for automated identification of blood lymphocyte subsets,” Journal of Immunological Methods, vol. 475, p. 112662, 2019.
[3] E. Linskens, A. M. Diks, J. Neirinck, M. Perez-Andres, E. De Maertelaere, M. A. Berkowska, T. Kerre, M. Hofmans, A. Orfao, J. J. Van Dongen, et al., “Improved standardization of flow cytometry diagnostic screening of primary immunodeficiency by software-based automated gating,” Frontiers in Immunology, vol. 11, 2020.
[4] M. C. Béné, F. Lacombe, and A. Porwit, “Unsupervised flow cytometry analysis in hematological malignancies: A new paradigm,” International Journal of Laboratory Hematology, vol. 43, pp. 54–64, 2021.
[5] S. A. Monaghan, J.-L. Li, Y.-C. Liu, M.-Y. Ko, M. Boyiadzis, T.-Y. Chang, Y.-F. Wang, C.-C. Lee, S. H. Swerdlow, and B.-S. Ko, “A machine learning approach to the classification of acute leukemias and distinction from nonneoplastic cytopenias using flow cytometry data,” American Journal of Clinical Pathology, 2021.
[6] M. Zhao, N. Mallesh, A. Höllein, R. Schabath, C. Haferlach, T. Haferlach, F. Elsner, H. Lüling, P. Krawitz, and W. Kern, “Hematologist-level classification of mature B-cell neoplasm using deep learning on multiparameter flow cytometry data,” Cytometry Part A, vol. 97, no. 10, pp. 1073–1080, 2020.
[7] R. W. Picard, Affective Computing. MIT Press, 2000.
[8] S. Narayanan and P. G. Georgiou, “Behavioral signal processing: Deriving human behavioral informatics from speech and language,” Proceedings of the IEEE, vol. 101, no. 5, pp. 1203–1233, 2013.
[9] P. Ekman, E. R. Sorenson, and W. V. Friesen, “Pan-cultural elements in facial displays of emotion,” Science, vol. 164, no. 3875, pp. 86–88, 1969.
[10] A. Bagher Zadeh, P. P. Liang, S. Poria, E. Cambria, and L.-P. Morency, “Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph,” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), (Melbourne, Australia), pp. 2236–2246, Association for Computational Linguistics, July 2018.
[11] C. Busso, M. Bulut, C.-C. Lee, A. Kazemzadeh, E. Mower, S. Kim, J. N. Chang, S. Lee, and S. S. Narayanan, “IEMOCAP: Interactive emotional dyadic motion capture database,” Language Resources and Evaluation, vol. 42, no. 4, p. 335, 2008.
[12] S.-L. Yeh, Y.-S. Lin, and C.-C. Lee, “A dialogical emotion decoder for speech emotion recognition in spoken dialog,” in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6479–6483, 2020.
[13] J.-L. Li and C.-C. Lee, “Attention learning with retrievable acoustic embedding of personality for emotion recognition,” in 2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 171–177, 2019.
[14] S. Poria, D. Hazarika, N. Majumder, G. Naik, E. Cambria, and R. Mihalcea, “MELD: A multimodal multi-party dataset for emotion recognition in conversations,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 527–536, 2019.
[15] M. Reiter, M. Diem, A. Schumich, M. Maurer-Granofszky, L. Karawajew, J. G. Rossi, R. Ratei, S. Groeneveld-Krentz, E. O. Sajaroff, S. Suhendra, et al., “Automated flow cytometric MRD assessment in childhood acute B-lymphoblastic leukemia using supervised machine learning,” Cytometry Part A, 2019.
[16] I. Della Starza, S. Chiaretti, M. S. De Propris, L. Elia, M. Cavalli, L. A. De Novi, R. Soscia, M. Messina, A. Vitale, A. R. Guarini, et al., “Minimal residual disease in acute lymphoblastic leukemia: Technical and clinical advances,” Frontiers in Oncology, vol. 9, p. 726, 2019.
[17] B.-S. Ko, Y.-F. Wang, J.-L. Li, C.-C. Li, P.-F. Weng, S.-C. Hsu, H.-A. Hou, H.-H. Huang, M. Yao, C.-T. Lin, et al., “Clinically validated machine learning algorithm for detecting residual diseases with multicolor flow cytometry analysis in acute myeloid leukemia and myelodysplastic syndrome,” EBioMedicine, vol. 37, pp. 91–100, 2018.
[18] J. Li, Y. Wang, B. Ko, C. Li, J. Tang, and C. Lee, “Learning a cytometric deep phenotype embedding for automatic hematological malignancies classification,” in 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 1733–1736, July 2019.
[19] P. Rota, S. Groeneveld-Krentz, and M. Reiter, “On automated flow cytometric analysis for MRD estimation of acute lymphoblastic leukaemia: A comparison among different approaches,” in 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 438–441, Nov. 2015.
[20] G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” arXiv preprint arXiv:1503.02531, 2015.
[21] S. Wu, J. Li, C. Liu, Z. Yu, and H.-S. Wong, “Mutual learning of complementary networks via residual correction for improving semi-supervised classification,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
[22] D. Hui, N. Didwaniya, M. Vidal, S. H. Shin, G. Chisholm, J. Roquemore, and E. Bruera, “Quality of end-of-life care in patients with hematologic malignancies: A retrospective cohort study,” Cancer, vol. 120, no. 10, pp. 1572–1578, 2014.
[23] E. Campo, S. H. Swerdlow, N. L. Harris, S. Pileri, H. Stein, and E. S. Jaffe, “The 2008 WHO classification of lymphoid neoplasms and beyond: Evolving concepts and practical applications,” Blood, The Journal of the American Society of Hematology, vol. 117, no. 19, pp. 5019–5032, 2011.
[24] H. Rafei, H. M. Kantarjian, and E. J. Jabbour, “Recent advances in the treatment of acute lymphoblastic leukemia,” Leukemia & Lymphoma, vol. 60, no. 11, pp. 2606–2621, 2019.
[25] H. Kantarjian, T. Kadia, C. DiNardo, N. Daver, G. Borthakur, E. Jabbour, G. Garcia-Manero, M. Konopleva, and F. Ravandi, “Acute myeloid leukemia: Current progress and future directions,” Blood Cancer Journal, vol. 11, no. 2, pp. 1–25, 2021.
[26] J. J. Jimenez, R. S. Chale, A. C. Abad, and A. V. Schally, “Acute promyelocytic leukemia (APL): A review of the literature,” Oncotarget, vol. 11, no. 11, p. 992, 2020.
[27] C. Duetz, C. Bachas, T. M. Westers, and A. A. van de Loosdrecht, “Computational analysis of flow cytometry data in hematological malignancies: Future clinical practice?,” Current Opinion in Oncology, vol. 32, no. 2, pp. 162–169, 2020.
[28] S. Toghi Eshghi, A. Au-Yeung, C. Takahashi, C. R. Bolen, M. N. Nyachienga, S. P. Lear, C. Green, W. R. Mathews, and W. E. O’Gorman, “Quantitative comparison of conventional and t-SNE-guided gating analyses,” Frontiers in Immunology, vol. 10, p. 1194, 2019.
[29] J. P. Vial, N. Lechevalier, F. Lacombe, P.-Y. Dumas, A. Bidet, T. Leguay, F. Vergez, A. Pigneux, and M. C. Béné, “Unsupervised flow cytometry analysis allows for an accurate identification of minimal residual disease assessment in acute myeloid leukemia,” Cancers, vol. 13, no. 4, p. 629, 2021.
[30] D. P. Ng and L. M. Zuromski, “Augmented human intelligence and automated diagnosis in flow cytometry for hematologic malignancies,” American Journal of Clinical Pathology, vol. 155, no. 4, pp. 597–605, 2021.
[31] A. C. Belkina, C. O. Ciccolella, R. Anno, R. Halpert, J. Spidlen, and J. E. Snyder-Cappione, “Automated optimized parameters for t-distributed stochastic neighbor embedding improve visualization and analysis of large datasets,” Nature Communications, vol. 10, no. 1, pp. 1–12, 2019.
[32] P. Rota, S. Groeneveld-Krentz, and M. Reiter, “On automated flow cytometric analysis for MRD estimation of acute lymphoblastic leukaemia: A comparison among different approaches,” in 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 438–441, IEEE, 2015.
[33] R. Licandro, P. Rota, M. Reiter, and M. Kampel, “Flow cytometry based automatic MRD assessment in acute lymphoblastic leukaemia: Longitudinal evaluation of time-specific cell population models,” in 2016 14th International Workshop on Content-Based Multimedia Indexing (CBMI), pp. 1–6, IEEE, 2016.
[34] R. Licandro, P. Rota, M. Reiter, and M. Kampel, “Flow cytometry based automatic MRD assessment in acute lymphoblastic leukaemia: Longitudinal evaluation of time-specific cell population models,” in 2016 14th International Workshop on Content-Based Multimedia Indexing (CBMI), pp. 1–6, 2016.
[35] L. Weijler, M. Diem, M. Reiter, and M. Maurer-Granofszky, “Detecting rare cell populations in flow cytometry data using UMAP,” in 2020 25th International Conference on Pattern Recognition (ICPR), pp. 4903–4909, IEEE, 2021.
[36] B. Rajwa, P. K. Wallace, E. A. Griffiths, and M. Dundar, “Automated assessment of disease progression in acute myeloid leukemia by probabilistic analysis of flow cytometry data,” IEEE Transactions on Biomedical Engineering, vol. 64, no. 5, pp. 1089–1098, 2016.
[37] Y.-F. Wang, B.-S. Ko, C.-C. Li, J.-L. Li, P.-F. Weng, H.-H. Huang, H.-A. Hou, H.-F. Tien, C.-C. Lee, and J.-L. Tang, “An artificial intelligence approach for B lymphoblastic leukemia minimal residual disease detection and clinical prognosis prediction using flow cytometry data,” Blood, vol. 130, no. Supplement 1, pp. 3980–3980, 2017.
[38] J.-L. Li, T.-Y. Chang, Y.-F. Wang, B.-S. Ko, J.-L. Tang, and C.-C. Lee, “A knowledge-reserved distillation with complementary transfer for automated FC-based classification across hematological malignancies,” in 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pp. 5482–5485, 2020.
[39] A. Miech, I. Laptev, and J. Sivic, “Learnable pooling with context gating for video classification,” arXiv preprint arXiv:1706.06905, 2017.
[40] R. Arandjelovic, P. Gronat, A. Torii, T. Pajdla, and J. Sivic, “NetVLAD: CNN architecture for weakly supervised place recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5297–5307, 2016.
[41] B. D. Hedley, M. Keeney, J. Popma, and I. Chin-Yee, “Novel lymphocyte screening tube using dried monoclonal antibody reagents,” Cytometry Part B: Clinical Cytometry, vol. 88, no. 6, pp. 361–370, 2015.
[42] A. Rajab, O. Axler, J. Leung, M. Wozniak, and A. Porwit, “Ten-color 15-antibody flow cytometry panel for immunophenotyping of lymphocyte population,” International Journal of Laboratory Hematology, vol. 39, pp. 76–85, 2017.
[43] C. E. Pedreira, E. S. Costa, Q. Lecrevisse, J. J. van Dongen, A. Orfao, EuroFlow Consortium, et al., “Overview of clinical flow cytometry data analysis: Recent advances and future challenges,” Trends in Biotechnology, vol. 31, no. 7, pp. 415–425, 2013.
[44] P. Qiu, “Computational prediction of manually gated rare cells in flow cytometry data,” Cytometry Part A, vol. 87, no. 7, pp. 594–602, 2015.
[45] P. Tang, X. Wang, B. Shi, X. Bai, W. Liu, and Z. Tu, “Deep FisherNet for object classification,” arXiv preprint arXiv:1608.00182, 2016.
[46] Y. Zhang, R. Jin, and Z.-H. Zhou, “Understanding bag-of-words model: A statistical framework,” International Journal of Machine Learning and Cybernetics, vol. 1, no. 1-4, pp. 43–52, 2010.
[47] Y. Cao, T. A. Geddes, J. Y. H. Yang, and P. Yang, “Ensemble deep learning in bioinformatics,” Nature Machine Intelligence, vol. 2, no. 9, pp. 500–508, 2020.
[48] M. Biehl, K. Bunte, and P. Schneider, “Analysis of flow cytometry data by matrix relevance learning vector quantization,” PLoS One, vol. 8, no. 3, p. e59401, 2013.
[49] J. Sánchez, F. Perronnin, T. Mensink, and J. Verbeek, “Image classification with the Fisher vector: Theory and practice,” International Journal of Computer Vision, vol. 105, no. 3, pp. 222–245, 2013.
[50] H. Jégou, M. Douze, C. Schmid, and P. Pérez, “Aggregating local descriptors into a compact image representation,” in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 3304–3311, IEEE, 2010.
[51] X. Xie, M. Liu, Y. Zhang, B. Wang, C. Zhu, C. Wang, Q. Li, Y. Huo, J. Guo, C. Xu, et al., “Single-cell transcriptomic landscape of human blood cells,” National Science Review, vol. 8, no. 3, p. nwaa180, 2021.
[52] S. J. Loughran, S. Haas, A. C. Wilkinson, A. M. Klein, and M. Brand, “Lineage commitment of hematopoietic stem cells and progenitors: Insights from recent single cell and lineage tracing technologies,” Experimental Hematology, vol. 88, pp. 1–6, 2020.
[53] C. Galigalidou, L. Zaragoza-Infante, A. Iatrou, A. Chatzidimitriou, K. Stamatopoulos, and A. Agathangelidis, “Understanding monoclonal B cell lymphocytosis: An interplay of genetic and microenvironmental factors,” Frontiers in Oncology, vol. 11, pp. 769612–769612, 2021.
[54] H. Kaseb, M. A. Tariq, and G. Gupta, “Lymphoblastic lymphoma,” StatPearls [Internet], 2021.
[55] J. Wang, M. Xue, R. Culhane, E. Diao, J. Ding, and V. Tarokh, “Speech emotion recognition with dual-sequence LSTM architecture,” in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6474–6478, 2020.
[56] S. Latif, R. Rana, S. Khalifa, R. Jurdak, and J. Epps, “Direct modelling of speech emotion from raw speech,” in Proc. Interspeech 2019, pp. 3920–3924, 2019.
[57] J.-L. Li, T.-Y. Huang, C.-M. Chang, and C.-C. Lee, “A waveform-feature dual branch acoustic embedding network for emotion recognition,” Frontiers in Computer Science, vol. 2, p. 13, 2020.
[58] A. S. Cowen and D. Keltner, “Self-report captures 27 distinct categories of emotion bridged by continuous gradients,” Proceedings of the National Academy of Sciences, vol. 114, no. 38, pp. E7900–E7909, 2017.
[59] J. M. Naik, “Speaker verification: A tutorial,” IEEE Communications Magazine, vol. 28, no. 1, pp. 42–48, 1990.
[60] J. S. Chung, J. Huh, S. Mun, M. Lee, H.-S. Heo, S. Choe, C. Ham, S. Jung, B.-J. Lee, and I. Han, “In defence of metric learning for speaker recognition,” in Proc. Interspeech 2020, pp. 2977–2981, 2020.
[61] S. Poria, E. Cambria, D. Hazarika, N. Majumder, A. Zadeh, and L.-P. Morency, “Context-dependent sentiment analysis in user-generated videos,” in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 873–883, 2017.
[62] Z. Pan, Z. Luo, J. Yang, and H. Li, “Multi-modal attention for speech emotion recognition,” in Proc. Interspeech 2020, pp. 364–368, 2020.
[63] J.-L. Li and C.-C. Lee, “Attentive to individual: A multimodal emotion recognition network with personalized attention profile,” in Proc. Interspeech 2019, pp. 211–215, 2019.
[64] S. Bhosale, R. Chakraborty, and S. K. Kopparapu, “Deep encoded linguistic and acoustic cues for attention based end to end speech emotion recognition,” in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7189–7193, 2020.
[65] R. Cai, K. Guo, B. Xu, X. Yang, and Z. Zhang, “Meta multi-task learning for speech emotion recognition,” in Proc. Interspeech 2020, pp. 3336–3340, 2020.
[66] Y. Ahn, S. J. Lee, and J. W. Shin, “Cross-corpus speech emotion recognition based on few-shot learning and domain adaptation,” IEEE Signal Processing Letters, vol. 28, pp. 1190–1194, 2021.
[67] X. Xu, J. Deng, N. Cummins, Z. Zhang, L. Zhao, and B. W. Schuller, “Autonomous emotion learning in speech: A view of zero-shot speech emotion recognition,” in Proc. Interspeech 2019, pp. 949–953, 2019.
[68] M. A. Jalal, R. K. Moore, and T. Hain, “Spatio-temporal context modelling for speech emotion classification,” in 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 853–859, 2019.
[69] P. Barros, N. Churamani, E. Lakomkin, H. Siqueira, A. Sutherland, and S. Wermter, “The OMG-Emotion behavior dataset,” in 2018 International Joint Conference on Neural Networks (IJCNN), pp. 1–7, 2018.
[70] H. Li, M. Tu, J. Huang, S. Narayanan, and P. Georgiou, “Speaker-invariant affective representation learning via adversarial training,” in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7144–7148, 2020.
[71] J.-L. Li and C.-C. Lee, “Encoding individual acoustic features using dyad-augmented deep variational representations for dialog-level emotion recognition,” in Proc. Interspeech 2018, pp. 3102–3106, 2018.
[72] S.-L. Yeh, Y.-S. Lin, and C.-C. Lee, “An interaction-aware attention network for speech emotion recognition in spoken dialogs,” in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6685–6689, 2019.
[73] M. Jaiswal, C.-P. Bara, Y. Luo, M. Burzo, R. Mihalcea, and E. M. Provost, “MuSE: A multimodal dataset of stressed emotion,” in Proceedings of the 12th Language Resources and Evaluation Conference, (Marseille, France), pp. 1499–1510, European Language Resources Association, May 2020.
[74] K. Sridhar and C. Busso, “Modeling uncertainty in predicting emotional attributes from spontaneous speech,” in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8384–8388, 2020.
[75] X. Ma, Z. Wu, J. Jia, M. Xu, H. Meng, and L. Cai, “Speech emotion recognition with emotion-pair based framework considering emotion distribution information in dimensional emotion space,” in Proc. Interspeech 2017, pp. 1238–1242, 2017.
[76] Z. Huang, W. Xue, Q. Mao, and Y. Zhan, “Unsupervised domain adaptation for speech emotion recognition using PCANet,” Multimedia Tools and Applications, vol. 76, pp. 6785–6799, Mar. 2017.
[77] A. Marczewski, A. Veloso, and N. Ziviani, “Learning transferable features for speech emotion recognition,” in Proceedings of the Thematic Workshops of ACM Multimedia 2017, Thematic Workshops ’17, (New York, NY, USA), pp. 529–536, Association for Computing Machinery, 2017.
[78] P. Song, “Transfer linear subspace learning for cross-corpus speech emotion recognition,” IEEE Transactions on Affective Computing, vol. 10, no. 2, pp. 265–275, 2019.
[79] J. Kim, G. Englebienne, K. P. Truong, and V. Evers, “Towards speech emotion recognition ‘in the wild’ using aggregated corpora and deep multi-task learning,” in Proc. Interspeech 2017, pp. 1113–1117, 2017.
[80] S. Latif, R. Rana, S. Younis, J. Qadir, and J. Epps, “Transfer learning for improving speech emotion classification accuracy,” in Proc. Interspeech 2018, pp. 257–261, 2018.
[81] M. Abdelwahab and C. Busso, “Domain adversarial for acoustic emotion recognition,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 12, pp. 2423–2435, 2018.
[82] Y. Gao, J. Liu, L. Wang, and J. Dang, “Domain-adversarial autoencoder with attention based feature level fusion for speech emotion recognition,” in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6314–6318, 2021.
[83] H. Zhou and K. Chen, “Transferable positive/negative speech emotion recognition via class-wise adversarial domain adaptation,” in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3732–3736, 2019.
[84] Q. Mao, G. Xu, W. Xue, J. Gou, and Y. Zhan, “Learning emotion-discriminative and domain-invariant features for domain adaptation in speech emotion recognition,” Speech Communication, vol. 93, pp. 1–10, 2017.
[85] A. Schmitt, S. Ultes, and W. Minker, “A parameterized and annotated spoken dialog corpus of the CMU Let's Go bus information system,” in Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), pp. 3369–3373, 2012.
[86] W. M. Kouw and M. Loog, “An introduction to domain adaptation and transfer learning,” arXiv preprint arXiv:1812.11806, 2018.
[87] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.
[88] A. Naman and L. Mancini, “Fixed-MAML for few-shot classification in multilingual speech emotion recognition,” arXiv preprint arXiv:2101.01356, 2021.
[89] Y. Liu, L. He, and J. Liu, “Large margin softmax loss for speaker verification,” in Proc. Interspeech 2019, pp. 2873–2877, 2019.
[90] X. Xiang, S. Wang, H. Huang, Y. Qian, and K. Yu, “Margin matters: Towards more discriminative deep neural network embeddings for speaker recognition,” in 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 1652–1656, 2019.
[91] J. Wang, K.-C. Wang, M. T. Law, F. Rudzicz, and M. Brudno, “Centroid-based deep metric learning for speaker recognition,” in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3652–3656, 2019.
[92] A. Baevski, S. Schneider, and M. Auli, “vq-wav2vec: Self-supervised learning of discrete speech representations,” in International Conference on Learning Representations, 2019.
[93] V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, “Librispeech: An ASR corpus based on public domain audio books,” in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5206–5210, 2015.
[94] M. Ott, S. Edunov, A. Baevski, A. Fan, S. Gross, N. Ng, D. Grangier, and M. Auli, “fairseq: A fast, extensible toolkit for sequence modeling,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), (Minneapolis, Minnesota), pp. 48–53, Association for Computational Linguistics, June 2019.
[95] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems, vol. 30, pp. 5998–6008, 2017.
[96] J. Deng, J. Guo, N. Xue, and S. Zafeiriou, “ArcFace: Additive angular margin loss for deep face recognition,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4685–4694, 2019.
[97] H. Wang, Y. Wang, Z. Zhou, X. Ji, D. Gong, J. Zhou, Z. Li, and W. Liu, “CosFace: Large margin cosine loss for deep face recognition,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5265–5274, 2018.
[98] B. Liu, Y. Cao, Y. Lin, Q. Li, Z. Zhang, M. Long, and H. Hu, “Negative margin matters: Understanding margin in few-shot classification,” in European Conference on Computer Vision, pp. 438–455, Springer, 2020.
[99] C. Zhang, K. Koishida, and J. H. L. Hansen, “Text-independent speaker verification based on triplet convolutional neural network embeddings,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 9, pp. 1633–1644, 2018.
[100] Z. Ren, Z. Chen, and S. Xu, “Triplet based embedding distance and similarity learning for text-independent speaker verification,” in 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 558–562, 2019.
[101] F. Eyben, M. Wöllmer, and B. Schuller, “openSMILE: The Munich versatile and fast open-source audio feature extractor,” in Proceedings of the 18th ACM International Conference on Multimedia, pp. 1459–1462, 2010.
[102] J. Huang, J. Tao, B. Liu, and Z. Lian, “Learning utterance-level representations with label smoothing for speech emotion recognition,” in Proc. Interspeech 2020, pp. 4079–4083, 2020.
[103] Y.-L. Huang, B.-H. Su, Y.-W. P. Hong, and C.-C. Lee, “An attribute-aligned strategy for learning speech representation,” in Proc. Interspeech 2021, pp. 1179–1183, 2021.
[104] S. T. Rajamani, K. T. Rajamani, A. Mallol-Ragolta, S. Liu, and B. Schuller, “A novel attention-based gated recurrent unit and its efficacy in speech emotion recognition,” in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6294–6298, 2021.
[105] W. Xie, A. Nagrani, J. S. Chung, and A. Zisserman, “Utterance-level aggregation for speaker recognition in the wild,” in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5791–5795, 2019.
[106] A. Nagrani, J. S. Chung, W. Xie, and A. Zisserman, “VoxCeleb: Large-scale speaker verification in the wild,” Computer Speech & Language, vol. 60, p. 101027, 2020.
[107] T. G. Dietterich, “Approximate statistical tests for comparing supervised classification learning algorithms,” Neural Computation, vol. 10, no. 7, pp. 1895–1923, 1998.
[108] P. Kuppens, J. Stouten, and B. Mesquita, “Individual differences in emotion components and dynamics: Introduction to the special issue,” Cognition and Emotion, vol. 23, no. 7, pp. 1249–1258, 2009.
[109] H. Cao, R. Verma, and A. Nenkova, “Speaker-sensitive emotion recognition via ranking: Studies on acted and spontaneous speech,” Computer Speech & Language, vol. 29, no. 1, pp. 186–202, 2015.
[110] C. Welch, V. Perez-Rosas, J. Kummerfeld, and R. Mihalcea, “Look who’s talking: Inferring speaker attributes from personal longitudinal dialog,” in Proceedings of the 20th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing), 2019.
[111] K. R. Scherer, “Vocal communication of emotion: A review of research paradigms,” Speech Communication, vol. 40, no. 1-2, pp. 227–256, 2003.
[112] S. Rukavina, S. Gruss, H. Hoffmann, J.-W. Tan, S. Walter, and H. C. Traue, “Affective computing and the impact of gender and age,” PLoS One, vol. 11, no. 3, 2016.
[113] H. Sagha, J. Deng, and B. Schuller, “The effect of personality trait, age, and gender on the performance of automatic speech valence recognition,” in 2017 Seventh International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 86–91, IEEE, 2017.
[114] L. Zhang, L. Wang, J. Dang, L. Guo, and Q. Yu, “Gender-aware CNN-BLSTM for speech emotion recognition,” in International Conference on Artificial Neural Networks, pp. 782–790, Springer, 2018.
[115] J.-L. Li and C.-C. Lee, “Attention learning with retrievable acoustic embedding of personality for emotion recognition,” in 2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 171–177, IEEE, 2019.
[116] M. Sidorov, A. Schmitt, E. Semenkin, and W. Minker, “Could speaker, gender or age awareness be beneficial in speech-based emotion recognition?,” in Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), pp. 61–68, 2016.
[117] M. Sidorov, S. Ultes, and A. Schmitt, “Comparison of gender- and speaker-adaptive emotion recognition,” in LREC, pp. 3476–3480, 2014.
[118] M. Bancroft, R. Lotfian, J. Hansen, and C. Busso, “Exploring the intersection between speaker verification and emotion recognition,” in 2019 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW), pp. 337–342, IEEE, 2019.
[119] J. Williams and S. King, “Disentangling style factors from speaker representations,” in Proc. Interspeech 2019, pp. 3945–3949, 2019.
[120] R. Pappagari, T. Wang, J. Villalba, N. Chen, and N. Dehak, “x-vectors meet emotions: A study on dependencies between emotion and speaker recognition,” in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7169–7173, IEEE, 2020.
[121] H. Li, M. Tu, J. Huang, S. Narayanan, and P. Georgiou, “Speaker-invariant affective representation learning via adversarial training,” in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7144–7148, IEEE, 2020.
[122] A. Zadeh, P. P. Liang, N. Mazumder, S. Poria, E. Cambria, and L.-P. Morency, “Memory fusion network for multi-view sequential learning,” in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[123] A. Zadeh, P. P. Liang, S. Poria, P. Vij, E. Cambria, and L.-P. Morency, “Multi-attention recurrent network for human communication comprehension,” in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[124] G. Degottex, J. Kane, T. Drugman, T. Raitio, and S. Scherer, “COVAREP: A collaborative voice analysis repository for speech technologies,” in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 960–964, IEEE, 2014.
[125] T. Baltrušaitis, P. Robinson, and L.-P. Morency, “OpenFace: An open source facial behavior analysis toolkit,” in 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1–10, IEEE, 2016.
[126] J. Pennington, R. Socher, and C. D. Manning, “GloVe: Global vectors for word representation,” in Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
[127] J. Yuan and M. Liberman, “Speaker identification on the SCOTUS corpus,” Journal of the Acoustical Society of America, vol. 123, no. 5, p. 3878, 2008.
[128] J. S. Chung, A. Nagrani, and A. Zisserman, “VoxCeleb2: Deep speaker recognition,” in Proc. Interspeech 2018, pp. 1086–1090, 2018.
[129] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” arXiv preprint arXiv:1609.02907, 2016.