|
[1] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International journal of computer vision, vol. 60, no. 2, pp. 91–110, 2004. [2] A. Oliva and A. Torralba, “Modeling the shape of the scene: A holistic representation of the spatial envelope,” International journal of computer vision, vol. 42, no. 3, pp. 145–175, 2001. [3] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” 2005. [4] D. Eigen, C. Puhrsch, and R. Fergus, “Depth map prediction from a single image using a multi-scale deep network,” in Advances in neural information processing systems, 2014, pp. 2366–2374. [5] D. Eigen and R. Fergus, “Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture,” in Proceedings of the IEEE international conference on computer vision, 2015, pp. 2650– 2658. [6] I. Laina, C. Rupprecht, V. Belagiannis, F. Tombari, and N. Navab, “Deeper depth prediction with fully convolutional residual networks,” in 2016 Fourth international conference on 3D vision (3DV). IEEE, 2016, pp. 239–248. [7] F. Liu, C. Shen, G. Lin, and I. Reid, “Learning depth from single monocular images using deep convolutional neural fields,” IEEE transactions on pattern analysis and machine intelligence, vol. 38, no. 10, pp. 2024–2039, 2016. [8] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention. Springer, 2015, pp. 234–241. [9] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, “Indoor segmentation and support inference from rgbd images,” in European Conference on Computer Vision. Springer, 2012, pp. 746–760. [10] A. Saxena, M. Sun, and A. Y. Ng, “Make3d: Learning 3d scene structure from a single still image,” IEEE transactions on pattern analysis and machine intelligence, vol. 31, no. 5, pp. 824–840, 2008. [11] L. He, G. Wang, and Z. Hu, “Learning depth from single images with deep neural network embedding focal length,” IEEE Transactions on Image Processing, vol. 27, no. 9, pp. 4676–4689, 2018. [12] H. Zhang, H. Han, J. Cui, S. Shan, and X. Chen, “Rgb-d face recognition via deep complementary and common feature learning,” in 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018). IEEE, 2018, pp. 8–15. [13] Y. Cao, C. Shen, and H. T. Shen, “Exploiting depth from single monocular images for object detection and semantic segmentation,” IEEE Transactions on Image Processing, vol. 26, no. 2, pp. 836–846, 2016. [14] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105. [15] K. Simonyan and A. Zisserman, “Very deep convolutional networks for largescale image recognition,” arXiv preprint arXiv:1409.1556, 2014. [16] A. Dosovitskiy, J. Tobias Springenberg, and T. Brox, “Learning to generate chairs with convolutional neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1538–1546. [17] Q. Gao, J. Liu, Z. Ju, Y. Li, T. Zhang, and L. Zhang, “Static hand gesture recognition with parallel cnns for space human-robot interaction,” in International Conference on Intelligent Robotics and Applications. Springer, 2017, pp. 462–473. [18] J. Cui, H. Zhang, H. Han, S. Shan, and X. Chen, “Improving 2d face recognition via discriminative face depth estimation,” in 2018 International Conference on Biometrics (ICB). IEEE, 2018, pp. 140–147. [19] P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox, “Rgb-d mapping: Using depth cameras for dense 3d modeling of indoor environments,” in Experimental robotics. Springer, 2014, pp. 477–491. [20] A. Zeng, S. Song, M. Nießner, M. Fisher, J. Xiao, and T. Funkhouser, “3dmatch: Learning local geometric descriptors from rgb-d reconstructions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1802–1811. [21] S. Gupta, R. Girshick, P. Arbeláez, and J. Malik, “Learning rich features from rgb-d images for object detection and segmentation,” in European conference on computer vision. Springer, 2014, pp. 345–360. [22] S. Gupta, P. Arbeláez, R. Girshick, and J. Malik, “Indoor scene understanding with rgb-d images: Bottom-up segmentation, object detection and semantic segmentation,” International Journal of Computer Vision, vol. 112, no. 2, pp. 133–149, 2015. [23] X. Ren, L. Bo, and D. Fox, “Rgb-(d) scene labeling: Features and algorithms,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2012, pp. 2759–2766. [24] A. Eitel, J. T. Springenberg, L. Spinello, M. Riedmiller, and W. Burgard, “Multimodal deep learning for robust rgb-d object recognition,” in 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2015, pp. 681–687. [25] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv: 1502.03167, 2015. [26] D. Hoiem, A. A. Efros, and M. Hebert, “Automatic photo pop-up,” in ACM transactions on graphics (TOG), vol. 24, no. 3. ACM, 2005, pp. 577–584. [27] A. Saxena, S. H. Chung, and A. Y. Ng, “3-d depth reconstruction from a single still image,” International journal of computer vision, vol. 76, no. 1, pp. 53–69, 2008. [28] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 580–587. [29] R. Girshick, “Fast r-cnn,” in International Conference on Computer Vision (ICCV), 2015. [30] G. Wang, H.-T. Tsui, Z. Hu, and F. Wu, “Camera calibration and 3d reconstruction from a single view based on scene constraints,” Image and Vision Computing, vol. 23, no. 3, pp. 311–323, 2005. [31] A. Saxena, S. H. Chung, and A. Y. Ng, “Learning depth from single monocular images,” in Advances in neural information processing systems, 2006, pp. 1161–1168. [32] B. Liu, S. Gould, and D. Koller, “Single image depth estimation from predicted semantic labels,” in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, 2010, pp. 1253–1260. [33] K. Karsch, C. Liu, and S. Kang, “Depth extraction from video using nonparametric sampling-supplemental material,” in European conference on Computer Vision. Citeseer, 2012. [34] C. Liu, J. Yuen, and A. Torralba, “Sift flow: Dense correspondence across scenes and its applications,” IEEE transactions on pattern analysis and machine intelligence, vol. 33, no. 5, pp. 978–994, 2010. [35] M. Liu, M. Salzmann, and X. He, “Discrete-continuous depth estimation from a single image,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 716–723. [36] B. Li, C. Shen, Y. Dai, A. Van Den Hengel, and M. He, “Depth and surface normal estimation from monocular images using regression on deep features and hierarchical crfs,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 1119–1127. [37] P. Wang, X. Shen, Z. Lin, S. Cohen, B. Price, and A. L. Yuille, “Towards unified depth and semantic prediction from a single image,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 2800–2809. [38] A. Roy and S. Todorovic, “Monocular depth estimation using neural regression forest,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 5506–5514. [39] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778. [40] J. Li, R. Klein, and A. Yao, “A two-streamed network for estimating finescaled depth maps from single rgb images,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 3372–3380. [41] D. Xu, W. Ouyang, X. Wang, and N. Sebe, “Pad-net: Multi-tasks guided prediction-and-distillation network for simultaneous depth estimation and scene parsing,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 675–684. [42] L. He, M. Yu, and G. Wang, “Spindle-net: Cnns for monocular depth inference with dilation kernel method,” in 2018 24th International Conference on Pattern Recognition (ICPR). IEEE, 2018, pp. 2504–2509. [43] Z. Hao, Y. Li, S. You, and F. Lu, “Detail preserving depth estimation from a single image using attention guided networks,” in 2018 International Conference on 3D Vision (3DV). IEEE, 2018, pp. 304–313. [44] F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” arXiv preprint arXiv:1511.07122, 2015. [45] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein et al., “Imagenet large scale visual recognition challenge,” International journal of computer vision, vol. 115, no. 3, pp. 211–252, 2015. [46] A. B. Owen, “A robust hybrid of lasso and ridge regression,” Contemporary Mathematics, vol. 443, no. 7, pp. 59–72, 2007. [47] L. Zwald and S. Lambert-Lacroix, “The berhu penalty and the grouped effect,” arXiv preprint arXiv:1207.6868, 2012. [48] S. S. Girija, “Tensorflow: Large-scale machine learning on heterogeneous distributed systems,” Software available from tensorflow. org, 2016. [49] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014. [50] F. Liu, C. Shen, and G. Lin, “Deep convolutional neural fields for depth estimation from a single image,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 5162–5170. |