|
[1] C. Barnes, E. Shechtman, A. Finkelstein, and D. B. Goldman. Patchmatch: a randomized correspondence algorithm for structural image editing. SIGGRAPH, ACM Transactions on Graphics, 28(3):24:1–24:11, 2009. [2] P. W. Battaglia, R. Pascanu, M. Lai, D. J. Rezende, and K. Kavukcuoglu. Interaction networks for learning about objects, relations and physics. In Neural Information Processing Systems (NIPS), pages 4502–4510, 2016. [3] A. Buades, B. Coll, and J. Morel. A non-local algorithm for image denoising. In Computer Vision and Pattern Recognition (CVPR), pages 60–65, 2005. [4] H. C. Burger, C. J. Schuler, and S. Harmeling. Image denoising: Can plain neural networks compete with bm3d? In Computer Vision and Pattern Recognition (CVPR), pages 2392–2399, 2012. [5] H. C. Burger, C. J. Schuler, and S. Harmeling. Image denoising with multi-layer perceptrons, part 2: training trade-offs and analysis of their mechanisms. CoRR, abs/1211.1552, 2012. [6] P. Carbonetto, N. de Freitas, and K. Barnard. A statistical model for general contextual object recognition. In European Conference on Computer Vision (ECCV), pages 350–362, 2004. [7] S. Chandra, N. Usunier, and I. Kokkinos. Dense and low-rank gaussian crfs using deep embeddings. In International Conference on Computer Vision (ICCV), pages 5113–5122, 2017. [8] L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. Semantic image segmentation with deep convolutional nets and fully connected crfs. CoRR, abs/1412.7062, 2014. [9] N. Chen, Q. Zhou, and V. K. Prasanna. Understanding web images by object relation network. In World Wide Web Conference (WWW), pages 291–300, 2012. [10] M. J. Choi, A. Torralba, and A. S. Willsky. A tree-based context model for object recognition. TPAMI, 34(2):240–252, 2012. [11] K. Dabov, A. Foi, V. Katkovnik, and K. O. Egiazarian. Image denoising by sparse 3-d transform-domain collaborative filtering. Transactions on Image Processing (TIP), 16(8):2080–2095, 2007. [12] J. Dai, Y. Li, K. He, and J. Sun. R-FCN: object detection via region-based fully convolutional networks. In Neural Information Processing Systems (NIPS), pages 379–387, 2016. [13] J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang, H. Hu, and Y. Wei. Deformable convolutional networks. In International Conference on Computer Vision (ICCV), pages 764–773, 2017. [14] S. K. Divvala, D. Hoiem, J. Hays, A. A. Efros, and M. Hebert. An empirical study of context in object detection. In Computer Vision and Pattern Recognition (CVPR), pages 1271–1278, 2009. [15] A. A. Efros and T. K. Leung. Texture synthesis by non-parametric sampling. In International Conference on Computer Vision (ICCV), pages 1033–1038, 1999. [16] Facebook Research. Caffe2: A new lightweight, modular, and scalable deep learning framework. https://caffe2.ai, 2017. [17] P. F. Felzenszwalb, R. B. Girshick, D. A. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. TPAMI, 32(9):1627– 1645, 2010. [18] C. Fu, W. Liu, A. Ranga, A. Tyagi, and A. C. Berg. DSSD : Deconvolutional single shot detector. CoRR, abs/1701.06659, 2017. [19] C. Galleguillos, A. Rabinovich, and S. J. Belongie. Object categorization using co-occurrence, location and appearance. In Computer Vision and Pattern Recognition (CVPR), 2008. [20] J. Gehring, M. Auli, D. Grangier, and Y. Dauphin. A convolutional encoder model for neural machine translation. In Annual Meeting of the Association for Computational Linguistics (ACL), pages 123–135, 2017. [21] J. Gehring, M. Auli, D. Grangier, D. Yarats, and Y. N. Dauphin. Convolutional sequence to sequence learning. In International Conference on Machine Learning (ICML), pages 1243–1252, 2017. [22] R. Girshick, I. Radosavovic, G. Gkioxari, P. Dollár, and K. He. Detectron. https://github.com/facebookresearch/detectron, 2018. [23] R. B. Girshick. Fast R-CNN. In International Conference on Computer Vision ICCV, pages 1440–1448, 2015. [24] R. B. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Computer Vision and Pattern Recognition (CVPR), pages 580–587, 2014. [25] D. Glasner, S. Bagon, and M. Irani. Super-resolution from a single image. In International Conference on Computer Vision (ICCV), pages 349–356, 2009. [26] K. He, G. Gkioxari, P. Dollár, and R. B. Girshick. Mask R-CNN. In International Conference on Computer Vision (ICCV), pages 2980–2988, 2017. [27] K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. In European Conference on Computer Vision (ECCV), pages 346–361, 2014. [28] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016. [29] Y. Hoshen. VAIN: attentional multi-agent predictive modeling. In Neural Information Processing Systems (NIPS), pages 2698–2708, 2017. [30] P. Krähenbühl and V. Koltun. Efficient inference in fully connected crfs with gaussian edge potentials. In Neural Information Processing Systems (NIPS), pages 109–117, 2011. [31] R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L. Li, D. A. Shamma, M. S. Bernstein, and L. Fei-Fei. Visual genome: Connecting language and vision using crowdsourced dense image annotations. International Journal of Computer Vision (IJCV), 123(1):32–73, 2017. [32] S. Kumar and M. Hebert. A hierarchical field framework for unified context based classification. In International Conference on Computer Vision (ICCV), pages 1284–1291, 2005. [33] J. D. Lafferty, A. McCallum, and F. C. N. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In International Conference on Machine Learning (ICML), pages 282–289, 2001. [34] S. Lefkimmiatis. Non-local color image denoising with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), pages 5882–5891, 2017. [35] Z. Li, C. Peng, G. Yu, X. Zhang, Y. Deng, and J. Sun. Light-head R-CNN: in defense of two-stage object detector. CoRR, abs/1711.07264, 2017. [36] T. Lin, P. Dollár, R. B. Girshick, K. He, B. Hariharan, and S. J. Belongie. Feature pyramid networks for object detection. In Computer Vision and Pattern Recognition (CVPR), pages 936–944, 2017. [37] T. Lin, M. Maire, S. J. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: common objects in context. In European Conference on Computer Vision (ECCV), pages 740–755, 2014. [38] S. Liu, S. D. Mello, J. Gu, G. Zhong, M. Yang, and J. Kautz. Learning affinity via spatial propagation networks. In Neural Information Processing Systems (NIPS), pages 1519–1529, 2017. [39] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. E. Reed, C. Fu, and A. C. Berg. SSD: single shot multibox detector. In European Conference on Computer Vision (ECCV), pages 21–37, 2016. [40] T. Malisiewicz and A. A. Efros. Beyond categories: The visual memex model for reasoning about object relationships. In Neural Information Processing Systems (NIPS), pages 1222–1230, 2009. [41] R. Mottaghi, X. Chen, X. Liu, N. Cho, S. Lee, S. Fidler, R. Urtasun, and A. L. Yuille. The role of context for object detection and semantic segmentation in the wild. In Computer Vision and Pattern Recognition (CVPR), pages 891–898, 2014. [42] V. Nair and G. E. Hinton. Rectified linear units improve restricted boltzmann machines. In International Conference on Machine Learning (ICML), pages 807–814, 2010. [43] A. Newell and J. Deng. Pixels to graphs by associative embedding. In Neural Information Processing Systems (NIPS), pages 2168–2177, 2017. [44] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer. Automatic differentiation in pytorch. In Neural Information Processing Systems Workshop (NIPS-W), 2017. [45] J. Redmon, S. K. Divvala, R. B. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. In Computer Vision and Pattern Recognition (CVPR), pages 779–788, 2016. [46] J. Redmon and A. Farhadi. YOLO9000: better, faster, stronger. In Computer Vision and Pattern Recognition (CVPR), pages 6517–6525, 2017. [47] J. Redmon and A. Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. [48] S. Ren, K. He, R. B. Girshick, and J. Sun. Faster R-CNN: towards real-time object detection with region proposal networks. In Neural Information Processing Systems (NIPS), pages 91–99, 2015. [49] A. Santoro, D. Raposo, D. G. T. Barrett, M. Malinowski, R. Pascanu, P. Battaglia, and T. Lillicrap. A simple neural network module for relational reasoning. In Neural Information Processing Systems (NIPS), pages 4974–4983, 2017. [50] S. M. Smith and J. M. Brady. SUSAN - A new approach to low level image processing. International Journal of Computer Vision (IJCV), 23(1):45–78, 1997. [51] C. Tomasi and R. Manduchi. Bilateral filtering for gray and color images. In International Conference on Computer Vision (ICCV), pages 839–846, 1998. [52] A. Torralba, K. P. Murphy, W. T. Freeman, and M. A. Rubin. Context-based vision system for place and object recognition. In International Conference on Computer Vision (ICCV), pages 273–280, 2003. [53] S.-Y. R. Tseng. Detectron.pytorch. https://github.com/roytseng-tw/Detectron.pytorch, 2018. [54] Z. Tu. Auto-context and its application to high-level vision tasks. In Computer Vision and Pattern (CVPR), 2008. [55] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. Attention is all you need. In Neural Information Processing Systems (NIPS), pages 6000–6010, 2017. [56] X. Wang, R. Girshick, A. Gupta, and K. He. Non-local neural networks. In Computer Vision and Pattern Recognition (CVPR), 2018. [57] N. Watters, D. Zoran, T. Weber, P. Battaglia, R. Pascanu, and A. Tacchetti. Visual interaction networks: Learning a physics simulator from video. In Neural Information Processing Systems (NIPS), pages 4542–4550, 2017. [58] D. Xu, Y. Zhu, C. B. Choy, and L. Fei-Fei. Scene graph generation by iterative message passing. In Computer Vision and Pattern Recognition (CVPR), pages 3097–3106, 2017. [59] S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, and P. H. S. Torr. Conditional random fields as recurrent neural networks. In International Conference on Computer Vision (ICCV), pages 1529–1537, 2015. |