|
[1] R. Bernardi, R. Cakici, D. Elliott, A. Erdem, E. Erdem, N. Ikizler-Cinbis, F. Keller, A. Muscat and B. Plank, “Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures.” Journal of Artificial Intelligence Research (JAIR), pp. 409-442, 2016.
[2] H. Fang, S. Gupta, F. Iandola, R. K. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. Platt, C. L. Zitnick and G. Zweig, “From captions to visual concepts and back.” IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 1473-1482, 2015.
[3] G. Kulkarni, V. Premraj, S. Dhar, S. Li, Y. Choi, A. C. Berg and T. L. Berg, “Baby talk: Understanding and generating simple image descriptions.” IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 1601-1608, 2011.
[4] S. Li, G. Kulkarni, T. L. Berg, A. C. Berg and Y. Choi, “Composing simple image descriptions using web-scale n-grams.” The SIGNLL Conf. on Computational Natural Language Learning (CoNLL), pp. 220-228, 2011.
[5] I. Sutskever, J. Martens and G. E. Hinton, “Generating text with recurrent neural networks.” International Conf. on Machine Learning (ICML), pp. 1017-1024, 2011.
[6] J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang and A. L. Yuille, “Explain Images with Multimodal Recurrent Neural Networks.” Advances in Neural Information Processing Systems Deep Learning Workshop (NIPS workshop), 2014.
[7] I. Sutskever, O. Vinyals and Q. V. Le, “Sequence to sequence learning with neural networks.” Advances in Neural Information Processing Systems (NIPS), pp. 3104-3112, 2014.
[8] R. Kiros, R. Salakhutdinov and R. S. Zemel, “Unifying visual-semantic embeddings with multimodal neural language models.” Advances in Neural Information Processing Systems Deep Learning Workshop (NIPS workshop), 2014.
[9] A. Karpathy and L. Fei-Fei, “Deep visual-semantic alignments for generating image descriptions.” IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 3128-3137, 2015.
[10] J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko and T. Darrell, “Long-term recurrent convolutional networks for visual recognition and description.” IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 2625-2634, 2015.
[11] A. Karpathy, A. Joulin and L. Fei-Fei, “Deep Fragment Embeddings for Bidirectional Image Sentence Mapping.” Advances in Neural Information Processing Systems (NIPS), pp. 1889-1897, 2014.
[12] O. Vinyals, A. Toshev, S. Bengio and D. Erhan, “Show and Tell: A Neural Image Caption Generator.” IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 3157-3164, 2015.
[13] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition.” Computing Research Repository (CoRR), abs/1409.1556, 2014. |