[1] F.-E. Wang, Y.-H. Yeh, M. Sun, W.-C. Chiu, and Y.-H. Tsai, "Bifuse: Monocular 360 depth estimation via bi-projection fusion," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[2] F.-E. Wang, Y.-H. Yeh, Y.-H. Tsai, W.-C. Chiu, and M. Sun, "Bifuse++: Self-supervised and efficient bi-projection fusion for 360° depth estimation," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022.
[3] F.-E. Wang, Y.-H. Yeh, M. Sun, W.-C. Chiu, and Y.-H. Tsai, "Led2-net: Monocular 360° layout estimation via differentiable depth rendering," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[4] I. Laina, C. Rupprecht, V. Belagiannis, F. Tombari, and N. Navab, "Deeper depth prediction with fully convolutional residual networks," in International Conference on 3D Vision (3DV), 2016.
[5] F.-E. Wang, H.-N. Hu, H.-T. Cheng, J.-T. Lin, S.-T. Yang, M.-L. Shih, H.-K. Chu, and M. Sun, "Self-supervised learning of depth and camera motion from 360° videos," in Asian Conference on Computer Vision (ACCV), 2018.
[6] H.-T. Cheng, C.-H. Chao, J.-D. Dong, H.-K. Wen, T.-L. Liu, and M. Sun, "Cube padding for weakly-supervised saliency prediction in 360° videos," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[7] S.-T. Yang, F.-E. Wang, C.-H. Peng, P. Wonka, M. Sun, and H.-K. Chu, "Dula-net: A dual-projection network for estimating room layouts from a single rgb panorama," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[8] A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Niessner, M. Savva, S. Song, A. Zeng, and Y. Zhang, "Matterport3D: Learning from RGB-D data in indoor environments," in International Conference on 3D Vision (3DV), 2017.
[9] N. Zioulis, A. Karakottas, D. Zarpalas, and P. Daras, "Omnidepth: Dense depth estimation for indoors spherical panoramas," in European Conference on Computer Vision (ECCV), 2018.
[10] I. Armeni, S. Sax, A. R. Zamir, and S. Savarese, "Joint 2d-3d-semantic data for indoor scene understanding," arXiv preprint arXiv:1702.01105, 2017.
[11] A. Saxena, M. Sun, and A. Y. Ng, "Make3d: Learning 3d scene structure from a single still image," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2008.
[12] D. Eigen, C. Puhrsch, and R. Fergus, "Depth map prediction from a single image using a multi-scale deep network," in Advances in Neural Information Processing Systems (NeurIPS), 2014.
[13] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[14] J.-H. Lee, M. Heo, K.-R. Kim, and C.-S. Kim, "Single-image depth estimation based on fourier domain analysis," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[15] Y. Cao, Z. Wu, and C. Shen, "Estimating depth from monocular images as classification using deep fully convolutional residual networks," IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2018.
[16] P. Wang, X. Shen, Z. Lin, S. Cohen, B. Price, and A. L. Yuille, "Towards unified depth and semantic prediction from a single image," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[17] F. Liu, C. Shen, and G. Lin, "Deep convolutional neural fields for depth estimation from a single image," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[18] D. Xu, E. Ricci, W. Ouyang, X. Wang, and N. Sebe, "Multi-scale continuous crfs as sequential deep networks for monocular depth estimation," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[19] D. Xu, W. Wang, H. Tang, H. Liu, N. Sebe, and E. Ricci, "Structured attention guided convolutional neural fields for monocular depth estimation," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[20] H. Fu, M. Gong, C. Wang, K. Batmanghelich, and D. Tao, "Deep ordinal regression network for monocular depth estimation," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[21] C. Godard, O. Mac Aodha, and G. J. Brostow, "Unsupervised monocular depth estimation with left-right consistency," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[22] T. Zhou, M. Brown, N. Snavely, and D. G. Lowe, "Unsupervised learning of depth and ego-motion from video," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[23] Z. Yin and J. Shi, "Geonet: Unsupervised learning of dense depth, optical flow and camera pose," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[24] H. Zhan, R. Garg, C. Saroj Weerasekera, K. Li, H. Agarwal, and I. Reid, "Unsupervised learning of monocular depth estimation and visual odometry with deep feature reconstruction," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[25] Z. Yang, P. Wang, W. Xu, L. Zhao, and R. Nevatia, "Unsupervised learning of geometry from videos with edge-aware depth-normal consistency," in AAAI Conference on Artificial Intelligence (AAAI), 2018.
[26] C. Wang, J. Miguel Buenaposada, R. Zhu, and S. Lucey, "Learning depth from monocular videos using direct methods," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[27] H.-Y. Lai, Y.-H. Tsai, and W.-C. Chiu, "Bridging stereo matching and optical flow via spatiotemporal correspondence," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[28] N.-H. Wang, B. Solarte, Y.-H. Tsai, W.-C. Chiu, and M. Sun, "360sd-net: 360° stereo depth estimation with learnable cost volume," in IEEE International Conference on Robotics and Automation (ICRA), 2020.
[29] C. Zou, A. Colburn, Q. Shan, and D. Hoiem, "Layoutnet: Reconstructing the 3d room layout from a single rgb image," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[30] Y. Zhang, S. Song, P. Tan, and J. Xiao, "Panocontext: A whole-room 3d context model for panoramic scene understanding," in European Conference on Computer Vision (ECCV), 2014.
[31] T. S. Cohen, M. Geiger, J. Köhler, and M. Welling, "Spherical CNNs," in International Conference on Learning Representations (ICLR), 2018.
[32] C. Esteves, C. Allen-Blanchette, A. Makadia, and K. Daniilidis, "Learning so(3) equivariant representations with spherical cnns," in European Conference on Computer Vision (ECCV), 2018.
[33] Y. Su and K. Grauman, "Kernel transformer networks for compact spherical convolution," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[34] Y.-C. Su and K. Grauman, "Learning spherical convolution for fast features from 360° imagery," in Advances in Neural Information Processing Systems (NeurIPS), 2017.
[35] M. Eder, P. Moulon, and L. Guan, "Pano popups: Indoor 3d reconstruction with a plane-aware network," in International Conference on 3D Vision (3DV), 2019.
[36] B. Ummenhofer, H. Zhou, J. Uhrig, N. Mayer, E. Ilg, A. Dosovitskiy, and T. Brox, "Demon: Depth and motion network for learning monocular stereo," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[37] J. Cheng, Y.-H. Tsai, S. Wang, and M.-H. Yang, "Segflow: Joint learning for video object segmentation and optical flow," in IEEE International Conference on Computer Vision (ICCV), 2017.
[38] Z. Zhang, Z. Cui, C. Xu, Z. Jie, X. Li, and J. Yang, "Joint task-recursive learning for semantic segmentation and depth estimation," in European Conference on Computer Vision (ECCV), 2018.
[39] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., "Pytorch: An imperative style, high-performance deep learning library," in Advances in Neural Information Processing Systems (NeurIPS), 2019.
[40] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in International Conference on Learning Representations (ICLR), 2014.
[41] S. Song, F. Yu, A. Zeng, A. X. Chang, M. Savva, and T. Funkhouser, "Semantic scene completion from a single depth image," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[42] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," in International Conference on Machine Learning (ICML), 2015.
[43] H. Jiang, Z. Sheng, S. Zhu, Z. Dong, and R. Huang, "Unifuse: Unidirectional fusion for 360° panorama depth estimation," IEEE Robotics and Automation Letters (RA-L), 2021.
[44] C. Sun, M. Sun, and H.-T. Chen, "Hohonet: 360 indoor holistic understanding with latent horizontal features," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[45] R. Ranftl, K. Lasinger, D. Hafner, K. Schindler, and V. Koltun, "Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020.
[46] S. F. Bhat, I. Alhashim, and P. Wonka, "Adabins: Depth estimation using adaptive bins," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[47] J. Xie, R. Girshick, and A. Farhadi, "Deep3d: Fully automatic 2d-to-3d video conversion with deep convolutional neural networks," in European Conference on Computer Vision (ECCV), 2016.
[48] R. Garg, V. K. Bg, G. Carneiro, and I. Reid, "Unsupervised cnn for single view depth estimation: Geometry to the rescue," in European Conference on Computer Vision (ECCV), 2016.
[49] S. Vijayanarasimhan, S. Ricco, C. Schmid, R. Sukthankar, and K. Fragkiadaki, "Sfm-net: Learning of structure and motion from video," arXiv preprint arXiv:1704.07804, 2017.
[50] A. Byravan and D. Fox, "Se3-nets: Learning rigid body motion using deep neural networks," in IEEE International Conference on Robotics and Automation (ICRA), 2017.
[51] C. Godard, O. Mac Aodha, M. Firman, and G. J. Brostow, "Digging into self-supervised monocular depth estimation," in IEEE International Conference on Computer Vision (ICCV), 2019.
[52] A. Johnston and G. Carneiro, "Self-supervised monocular trained depth estimation using self-attention and discrete disparity volume," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[53] V. Guizilini, R. Ambrus, S. Pillai, A. Raventos, and A. Gaidon, "3d packing for self-supervised monocular depth estimation," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[54] J.-W. Bian, H. Zhan, N. Wang, Z. Li, L. Zhang, C. Shen, M.-M. Cheng, and I. Reid, "Unsupervised scale-consistent depth learning from video," International Journal of Computer Vision (IJCV), 2021.
[55] V. Guizilini, R. Hou, J. Li, R. Ambrus, and A. Gaidon, "Semantically-guided representation learning for self-supervised monocular depth," in International Conference on Learning Representations (ICLR), 2020.
[56] P.-Y. Chen, A. H. Liu, Y.-C. Liu, and Y.-C. F. Wang, "Towards scene understanding: Unsupervised monocular depth estimation with semantic-aware representation," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[57] S. Zhu, G. Brazil, and X. Liu, "The edge of depth: Explicit constraints between segmentation and depth," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[58] M. Klingner, J.-A. Termöhlen, J. Mikolajczyk, and T. Fingscheidt, "Self-supervised monocular depth estimation: Solving the dynamic object problem by semantic guidance," in European Conference on Computer Vision (ECCV), 2020.
[59] L. Hoyer, D. Dai, Y. Chen, A. Koring, S. Saha, and L. Van Gool, "Three ways to improve semantic segmentation with self-supervised depth estimation," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[60] T.-H. Wang, H.-J. Huang, J.-T. Lin, C.-W. Hu, K.-H. Zeng, and M. Sun, "Omnidirectional cnn for visual place recognition and navigation," in IEEE International Conference on Robotics and Automation (ICRA), 2018.
[61] N. Zioulis, A. Karakottas, D. Zarpalas, F. Alvarez, and P. Daras, "Spherical view synthesis for self-supervised 360 depth estimation," in International Conference on 3D Vision (3DV), 2019.
[62] R. Liu, J. Lehman, P. Molino, F. P. Such, E. Frank, A. Sergeev, and J. Yosinski, "An intriguing failing of convolutional neural networks and the coordconv solution," arXiv preprint, 2018.
[63] L. Jin, Y. Xu, J. Zheng, J. Zhang, R. Tang, S. Xu, J. Yu, and S. Gao, "Geometric structure based and regularized depth estimation from 360 indoor imagery," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[64] W. Zeng, S. Karaoglu, and T. Gevers, "Joint 3d layout and depth prediction from a single indoor panorama image," in European Conference on Computer Vision (ECCV), 2020.
[65] C. Sun, C.-W. Hsiao, M. Sun, and H.-T. Chen, "Horizonnet: Learning room layout with 1d representation and pano stretch data augmentation," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[66] G. Pintore, M. Agus, E. Almansa, J. Schneider, and E. Gobbetti, "Slicenet: deep dense depth estimation from a single indoor panorama using a slice-based representation," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[67] W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang, "Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[68] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A Large-Scale Hierarchical Image Database," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
[69] Y. Zhang, S. Khamis, C. Rhemann, J. Valentin, A. Kowdle, V. Tankovich, M. Schoenberg, S. Izadi, T. Funkhouser, and S. Fanello, "Activestereonet: End-to-end self-supervised learning for active stereo systems," in European Conference on Computer Vision (ECCV), 2018.
[70] J. Xiao, K. A. Ehinger, A. Oliva, and A. Torralba, "Recognizing scene viewpoint using panoramic place representation," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[71] C. Zou, J.-W. Su, C.-H. Peng, A. Colburn, Q. Shan, P. Wonka, H.-K. Chu, and D. Hoiem, "Manhattan room layout reconstruction from a single 360 image: A comparative study of state-of-the-art methods," International Journal of Computer Vision (IJCV), 2021.
[72] F.-E. Wang, Y.-H. Yeh, M. Sun, W.-C. Chiu, and Y.-H. Tsai, "Layoutmp3d: Layout annotation of matterport3d," arXiv preprint arXiv:2003.13516, 2020.
[73] G. Pintore, M. Agus, and E. Gobbetti, "Atlantanet: Inferring the 3d indoor layout from a single 360 image beyond the manhattan world assumption," in European Conference on Computer Vision (ECCV), 2020.