|
1. B. Patel and B. Meshram, "Content Based Video Retrieval Systems," International Journal of UbiComp, vol. 3, May 2012. 2. A. Ansari and M. H. Mohammed, "Content based video retrieval systems-methods, techniques, trends and challenges," International Journal of Computer Applications, vol. 112, 2015. 3. M. Soleymani, J. J. M. Kierkels, G. Chanel and T. Pun, "A bayesian framework for video affective representation," in 2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops, 2009. 4. A. Zlatintsi, P. Koutras, G. Evangelopoulos, N. Malandrakis, N. Efthymiou, K. Pastra, A. Potamianos and P. Maragos, "COGNIMUSE: A multimodal video database annotated with saliency, events, semantics and emotion with application to summarization," EURASIP Journal on Image and Video Processing, vol. 2017, p. 1–24, 2017. 5. A. S. Adly, M. S. Abdelwahab, I. Hegazy and T. Elarif, "Issues and Challenges for Content-Based Video Search Engines A Survey," in 2020 21st International Arab Conference on Information Technology (ACIT), 2020. 6. E. Katz, Ephraim Katz's The Film Encyclopedia, Thomas Y. Crowell, 1979. 7. J. E. Cutting, "Event segmentation and seven types of narrative discontinuity in popular movies," Acta psychologica, vol. 149, p. 69–77, 2014. 8. L. Chen and M. T. Ozsu, "Rule-based scene extraction from video," in Proceedings. International Conference on Image Processing, 2002. 9. V. T. Chasanis, A. C. Likas and N. P. Galatsanos, "Scene detection in videos using shot clustering and sequence alignment," IEEE transactions on multimedia, vol. 11, p. 89–100, 2008. 10. R. Panda, S. K. Kuanar and A. S. Chowdhury, "Nyström approximated temporally constrained multisimilarity spectral clustering approach for movie scene detection," IEEE transactions on cybernetics, vol. 48, p. 836–847, 2017. 11. D. Rotman, D. Porat, G. Ashour and U. Barzelay, "Optimally grouped deep features using normalized cost for video scene detection," in Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, 2018. 12. P. Sidiropoulos, V. Mezaris, I. Kompatsiaris, H. Meinedo, M. Bugalho and I. Trancoso, "Temporal video segmentation to scenes using high-level audiovisual features," IEEE Transactions on Circuits and Systems for Video Technology, vol. 21, p. 1163–1177, 2011. 13. L. Baraldi, C. Grana and R. Cucchiara, "A deep siamese network for scene detection in broadcast videos," in Proceedings of the 23rd ACM international conference on Multimedia, 2015. 14. N. Nitanda, M. Haseyama and H. Kitajima, "Audio signal segmentation and classification for scene-cut detection," in 2005 IEEE International Symposium on Circuits and Systems, 2005. 15. Y. Zhu and D. Zhou, "Scene change detection based on audio and video content analysis," in Proceedings Fifth International Conference on Computational Intelligence and Multimedia Applications. ICCIMA 2003, 2003. 16. H. Sundaram and S.-F. Chang, "Video scene segmentation using video and audio features," in 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No. 00TH8532), 2000. 17. S. Rho and E. Hwang, "Video scene determination using audiovisual data analysis," in 24th International Conference on Distributed Computing Systems Workshops, 2004. Proceedings., 2004. 18. A. Rao, L. Xu, Y. Xiong, G. Xu, Q. Huang, B. Zhou and D. Lin, "A local-to-global approach to multi-modal movie scene segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020. 19. Z. Pan, Z. Luo, J. Yang and H. Li, "Multi-Modal Attention for Speech Emotion Recognition," in Proc. Interspeech 2020, 2020. 20. Q. Huang, Y. Xiong, A. Rao, J. Wang and D. Lin, "MovieNet: A Holistic Dataset for Movie Understanding," in Proceedings of the European Conference on Computer Vision (ECCV), 2020. 21. B. Zhou, A. Lapedriza, A. Khosla, A. Oliva and A. Torralba, "Places: A 10 million image database for scene recognition," IEEE transactions on pattern analysis and machine intelligence, vol. 40, p. 1452–1464, 2017. 22. Q. Huang, Y. Xiong and D. Lin, "Unifying identification and context learning for person recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018. 23. N. Zhang, M. Paluri, Y. Taigman, R. Fergus and L. Bourdev, "Beyond frontal faces: Improving person recognition using multiple cues," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015. 24. C. Gu, C. Sun, D. A. Ross, C. Vondrick, C. Pantofaru, Y. Li, S. Vijayanarasimhan, G. Toderici, S. Ricco, R. Sukthankar and others, "Ava: A video dataset of spatio-temporally localized atomic visual actions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018. 25. S. Hershey, S. Chaudhuri, D. P. W. Ellis, J. F. Gemmeke, A. Jansen, R. C. Moore, M. Plakal, D. Platt, R. A. Saurous, B. Seybold and others, "CNN architectures for large-scale audio classification," in 2017 ieee international conference on acoustics, speech and signal processing (icassp), 2017. 26. Q. Kong, Y. Cao, T. Iqbal, Y. Wang, W. Wang and M. D. Plumbley, "Panns: Large-scale pretrained audio neural networks for audio pattern recognition," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, p. 2880–2894, 2020. 27. J. F. Gemmeke, D. P. W. Ellis, D. Freedman, A. Jansen, W. Lawrence, R. C. Moore, M. Plakal and M. Ritter, "Audio set: An ontology and human-labeled dataset for audio events," in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017. 28. J. Cramer, H.-H. Wu, J. Salamon and J. P. Bello, "Look, listen, and learn more: Design choices for deep audio embeddings," in ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019. 29. R. Arandjelovic and A. Zisserman, "Look, listen and learn," in Proceedings of the IEEE International Conference on Computer Vision, 2017. 30. J. Pons, O. Nieto, M. Prockup, E. M. Schmidt, A. F. Ehmann and X. Serra, "End-to-end learning for music audio tagging at scale," in 19th International Society for Music Information Retrieval Conference (ISMIR2018), 2018. 31. J. Pons and X. Serra, "musicnn: pre-trained convolutional neural networks for music audio tagging," in Late-breaking/demo session in 20th International Society for Music Information Retrieval Conference (LBD-ISMIR2019), 2019. 32. E. Law, K. West, M. I. Mandel, M. Bay and J. S. Downie, "Evaluation of algorithms using games: The case of music tagging.," in ISMIR, 2009. 33. C. Li, X. Ma, B. Jiang, X. Li, X. Zhang, X. Liu, Y. Cao, A. Kannan and Z. Zhu, "Deep speaker: an end-to-end neural speaker embedding system," arXiv preprint arXiv:1705.02304, 2017. 34. V. Panayotov, G. Chen, D. Povey and S. Khudanpur, "Librispeech: An ASR corpus based on public domain audio books," in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015. 35. N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, Q. V. Le, G. E. Hinton and J. Dean, "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer," in 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, 2017. 36. D. Eigen, M. Ranzato and I. Sutskever, "Learning Factored Representations in a Deep Mixture of Experts," in 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Workshop Track Proceedings, 2014.
|