[1] C. Sun, C. Hsiao, M. Sun, and H. Chen, “HorizonNet: Learning room layout with 1D representation and pano stretch data augmentation,” in IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pp. 1047–1056, Computer Vision Foundation / IEEE, 2019.
[2] C. Sun, C. Hsiao, N. Wang, M. Sun, and H. Chen, “Indoor panorama planar 3D reconstruction via divide and conquer,” in IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021, pp. 11338–11347, Computer Vision Foundation / IEEE, 2021.
[3] C. Sun, M. Sun, and H. Chen, “HoHoNet: 360 indoor holistic understanding with latent horizontal features,” in IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021, pp. 2573–2582, Computer Vision Foundation / IEEE, 2021.
[4] J. M. Coughlan and A. L. Yuille, “Manhattan world: Compass direction from a single image by Bayesian inference,” in Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2, pp. 941–947, IEEE, 1999.
[5] E. Delage, H. Lee, and A. Y. Ng, “A dynamic Bayesian network model for autonomous 3D reconstruction from a single indoor image,” in 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 2418–2428, IEEE, 2006.
[6] D. C. Lee, M. Hebert, and T. Kanade, “Geometric reasoning for single image structure recovery,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, pp. 2136–2143, IEEE, 2009.
[7] V. Hedau, D. Hoiem, and D. Forsyth, “Recovering the spatial layout of cluttered rooms,” in 2009 IEEE 12th International Conference on Computer Vision, pp. 1849–1856, IEEE, 2009.
[8] D. Hoiem, A. A. Efros, and M. Hebert, “Recovering surface layout from an image,” International Journal of Computer Vision, vol. 75, no. 1, pp. 151–172, 2007.
[9] A. Gupta, M. Hebert, T. Kanade, and D. M. Blei, “Estimating spatial layout of rooms using volumetric reasoning about objects and surfaces,” in Advances in Neural Information Processing Systems, pp. 1288–1296, 2010.
[10] A. G. Schwing and R. Urtasun, “Efficient exact inference for 3D indoor scene understanding,” in European Conference on Computer Vision, pp. 299–313, Springer, 2012.
[11] R. Urtasun, M. Pollefeys, T. Hazan, and A. Schwing, “Efficient structured prediction for 3D indoor scene understanding,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2815–2822, IEEE, 2012.
[12] S. Ramalingam, J. K. Pillai, A. Jain, and Y. Taguchi, “Manhattan junction catalogue for spatial reasoning of indoor scenes,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3065–3072, 2013.
[13] L. Del Pero, J. Bowdish, B. Kermgard, E. Hartley, and K. Barnard, “Understanding Bayesian rooms using composite 3D object models,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 153–160, 2013.
[14] Y. Zhao and S.-C. Zhu, “Scene parsing by integrating function, geometry and appearance models,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3119–3126, 2013.
[15] Y. Zhang, S. Song, P. Tan, and J. Xiao, “PanoContext: A whole-room 3D context model for panoramic scene understanding,” in European Conference on Computer Vision, pp. 668–686, Springer, 2014.
[16] J. Xu, B. Stenger, T. Kerola, and T. Tung, “Pano2CAD: Room layout from a single panorama image,” in 2017 IEEE Winter Conference on Applications of Computer Vision, WACV 2017, pp. 354–362, IEEE, 2017.
[17] H. Yang and H. Zhang, “Efficient 3D room shape recovery from a single panorama,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5422–5430, 2016.
[18] Y. Yang, S. Jin, R. Liu, S. B. Kang, and J. Yu, “Automatic 3D indoor scene modeling from single panorama,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3926–3934, 2018.
[19] G. Pintore, V. Garro, F. Ganovelli, E. Gobbetti, and M. Agus, “Omnidirectional image capture on mobile devices for fast automatic generation of 2.5D indoor maps,” in 2016 IEEE Winter Conference on Applications of Computer Vision, WACV 2016, pp. 1–9, IEEE, 2016.
[20] R. Cabral and Y. Furukawa, “Piecewise planar and compact floorplan reconstruction from images,” in 2014 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, pp. 628–635, IEEE, 2014.
[21] A. Mallya and S. Lazebnik, “Learning informative edge maps for indoor scene layout prediction,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 936–944, 2015.
[22] Y. Ren, S. Li, C. Chen, and C.-C. J. Kuo, “A coarse-to-fine indoor layout estimation method,” in Asian Conference on Computer Vision, pp. 36–51, Springer, 2016.
[23] H. Zhao, M. Lu, A. Yao, Y. Guo, Y. Chen, and L. Zhang, “Physics inspired optimization on semantic transfer features: An alternative method for room layout estimation,” arXiv preprint arXiv:1707.00383, 2017.
[24] S. Dasgupta, K. Fang, K. Chen, and S. Savarese, “DeLay: Robust spatial layout estimation for cluttered indoor scenes,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 616–624, 2016.
[25] H. Izadinia, Q. Shan, and S. M. Seitz, “IM2CAD,” in IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, 2017.
[26] C.-Y. Lee, V. Badrinarayanan, T. Malisiewicz, and A. Rabinovich, “RoomNet: End-to-end room layout estimation,” in 2017 IEEE International Conference on Computer Vision, ICCV 2017, pp. 4875–4884, IEEE, 2017.
[27] C. Zou, A. Colburn, Q. Shan, and D. Hoiem, “LayoutNet: Reconstructing the 3D room layout from a single RGB image,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2051–2059, 2018.
[28] I. Armeni, A. Sax, A. R. Zamir, and S. Savarese, “Joint 2D-3D-semantic data for indoor scene understanding,” ArXiv e-prints, Feb. 2017.
[29] C. Fernandez-Labrador, A. Perez-Yus, G. Lopez-Nicolas, and J. J. Guerrero, “Layouts from panoramic images with geometry and deep learning,” arXiv preprint arXiv:1806.08294, 2018.
[30] S.-T. Yang, F.-E. Wang, C.-H. Peng, P. Wonka, M. Sun, and H.-K. Chu, “DuLa-Net: A dual-projection network for estimating room layouts from a single RGB panorama,” arXiv preprint arXiv:1811.11977, 2018.
[31] C. Fernandez-Labrador, J. M. Fácil, A. Perez-Yus, C. Demonceaux, J. Civera, and J. J. Guerrero, “Corners for layout: End-to-end layout recovery from 360 images,” arXiv preprint arXiv:1903.08094, 2019.
[32] C. Fernandez-Labrador, J. M. Facil, A. Perez-Yus, C. Demonceaux, and J. J. Guerrero, “PanoRoom: From the sphere to the 3D layout,” arXiv preprint arXiv:1808.09879, 2018.
[33] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778, IEEE Computer Society, 2016.
[34] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[35] M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673–2681, 1997.
[36] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” CoRR, vol. abs/1412.6980, 2014.
[37] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241, Springer, 2015.
[38] C. Liu, K. Kim, J. Gu, Y. Furukawa, and J. Kautz, “PlaneRCNN: 3D plane detection and reconstruction from a single image,” in IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pp. 4450–4459, 2019.
[39] C. Liu, J. Yang, D. Ceylan, E. Yumer, and Y. Furukawa, “PlaneNet: Piece-wise planar reconstruction from a single RGB image,” in 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 2579–2588, 2018.
[40] A. Newell, Z. Huang, and J. Deng, “Associative embedding: End-to-end learning for joint detection and grouping,” in Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pp. 2277–2287, 2017.
[41] A. Dai, A. X. Chang, M. Savva, M. Halber, T. A. Funkhouser, and M. Nießner, “ScanNet: Richly-annotated 3D reconstructions of indoor scenes,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2432–2443, 2017.
[42] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, “Indoor segmentation and support inference from RGBD images,” in Computer Vision - ECCV 2012, 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part V, pp. 746–760, 2012.
[43] A. X. Chang, A. Dai, T. A. Funkhouser, M. Halber, M. Nießner, M. Savva, S. Song, A. Zeng, and Y. Zhang, “Matterport3D: Learning from RGB-D data in indoor environments,” in 2017 International Conference on 3D Vision, 3DV 2017, Qingdao, China, October 10-12, 2017, pp. 667–676, 2017.
[44] I. Armeni, S. Sax, A. R. Zamir, and S. Savarese, “Joint 2D-3D-semantic data for indoor scene understanding,” CoRR, vol. abs/1702.01105, 2017.
[45] N. Wang, B. Solarte, Y. Tsai, W. Chiu, and M. Sun, “360SD-Net: 360° stereo depth estimation with learnable cost volume,” in 2020 IEEE International Conference on Robotics and Automation, ICRA 2020, Paris, France, May 31 - August 31, 2020, pp. 582–588, IEEE, 2020.
[46] J. M. Coughlan and A. L. Yuille, “The Manhattan world assumption: Regularities in scene statistics which enable Bayesian inference,” in Advances in Neural Information Processing Systems 13, Papers from Neural Information Processing Systems (NIPS) 2000, Denver, CO, USA, pp. 845–851, MIT Press, 2000.
[47] G. Schindler and F. Dellaert, “Atlanta world: An expectation maximization framework for simultaneous low-level edge grouping and camera calibration in complex man-made environments,” in 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2004, 27 June - 2 July 2004, Washington, DC, USA, pp. 203–209, IEEE Computer Society, 2004.
[48] Z. Yu, J. Zheng, D. Lian, Z. Zhou, and S. Gao, “Single-image piece-wise planar 3D reconstruction via associative embedding,” in IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pp. 1029–1037, 2019.
[49] F. Yang and Z. Zhou, “Recovering 3D planes from a single image via convolutional neural networks,” in Computer Vision - ECCV 2018, 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part X, vol. 11214 of Lecture Notes in Computer Science, pp. 87–103, Springer, 2018.
[50] K. He, G. Gkioxari, P. Dollár, and R. B. Girshick, “Mask R-CNN,” in IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2980–2988, 2017.
[51] B. D. Brabandere, D. Neven, and L. V. Gool, “Semantic instance segmentation with a discriminative loss function,” CoRR, vol. abs/1708.02551, 2017.
[52] A. Fathi, Z. Wojna, V. Rathod, P. Wang, H. O. Song, S. Guadarrama, and K. P. Murphy, “Semantic instance segmentation via deep metric learning,” CoRR, vol. abs/1703.10277, 2017.
[53] S. Kong and C. C. Fowlkes, “Recurrent pixel embedding for instance grouping,” in 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 9018–9028, 2018.
[54] Z. Jiang, B. Liu, S. Schulter, Z. Wang, and M. Chandraker, “Peek-a-boo: Occlusion reasoning in indoor scenes with plane representations,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13-19, 2020, pp. 110–118, IEEE, 2020.
[55] Y. Qian and Y. Furukawa, “Learning pairwise inter-plane relations for piecewise planar reconstruction,” in Computer Vision - ECCV 2020, European Conference, 2020.
[56] H. Yang and H. Zhang, “Efficient 3D room shape recovery from a single panorama,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 5422–5430, 2016.
[57] S. Song, A. Zeng, A. X. Chang, M. Savva, S. Savarese, and T. A. Funkhouser, “Im2Pano3D: Extrapolating 360° structure and semantics beyond the field of view,” in 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 3847–3856, 2018.
[58] J. Xu, B. Stenger, T. Kerola, and T. Tung, “Pano2CAD: Room layout from a single panorama image,” in 2017 IEEE Winter Conference on Applications of Computer Vision, WACV 2017, Santa Rosa, CA, USA, March 24-31, 2017, pp. 354–362, 2017.
[59] M. Eder, P. Moulon, and L. Guan, “Pano popups: Indoor 3D reconstruction with a plane-aware network,” in 2019 International Conference on 3D Vision, 3DV 2019, Québec City, QC, Canada, September 16-19, 2019, pp. 76–84, 2019.
[60] R. Liu, J. Lehman, P. Molino, F. P. Such, E. Frank, A. Sergeev, and J. Yosinski, “An intriguing failing of convolutional neural networks and the CoordConv solution,” in Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada, pp. 9628–9639, 2018.
[61] J. Zheng, J. Zhang, J. Li, R. Tang, S. Gao, and Z. Zhou, “Structured3D: A large photo-realistic dataset for structured 3D modeling,” CoRR, vol. abs/1908.00222, 2019.
[62] C. Fernandez-Labrador, A. Pérez-Yus, G. López-Nicolás, and J. J. Guerrero, “Layouts from panoramic images with geometry and deep learning,” IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 3153–3160, 2018.
[63] Y. Yang, S. Jin, R. Liu, S. B. Kang, and J. Yu, “Automatic 3D indoor scene modeling from single panorama,” in 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 3926–3934, 2018.
[64] Y. Zhang, S. Song, P. Tan, and J. Xiao, “PanoContext: A whole-room 3D context model for panoramic scene understanding,” in Computer Vision - ECCV 2014, 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part VI, pp. 668–686, 2014.
[65] C. Zou, A. Colburn, Q. Shan, and D. Hoiem, “LayoutNet: Reconstructing the 3D room layout from a single RGB image,” in 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 2051–2059, 2018.
[66] L. Jin, Y. Xu, J. Zheng, J. Zhang, R. Tang, S. Xu, J. Yu, and S. Gao, “Geometric structure based and regularized depth estimation from 360 indoor imagery,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13-19, 2020, pp. 886–895, IEEE, 2020.
[67] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and F. Li, “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2009, 20-25 June 2009, Miami, Florida, USA, pp. 248–255, 2009.
[68] T. Wang, H. Huang, J. Lin, C. Hu, K. Zeng, and M. Sun, “Omnidirectional CNN for visual place recognition and navigation,” in 2018 IEEE International Conference on Robotics and Automation, ICRA 2018, Brisbane, Australia, May 21-25, 2018, pp. 2341–2348, 2018.
[69] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.
[70] P. Arbelaez, M. Maire, C. C. Fowlkes, and J. Malik, “Contour detection and hierarchical image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 5, pp. 898–916, 2011.
[71] M. Eder, M. Shvets, J. Lim, and J. Frahm, “Tangent images for mitigating spherical distortion,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13-19, 2020, pp. 12423–12431, IEEE, 2020.
[72] Y. K. Lee, J. Jeong, J. S. Yun, W. Cho, and K. Yoon, “SpherePHD: Applying CNNs on a spherical polyhedron representation of 360° images,” in IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pp. 9181–9189, 2019.
[73] C. Zhang, S. Liwicki, W. Smith, and R. Cipolla, “Orientation-aware semantic segmentation on icosahedron spheres,” in 2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea (South), October 27 - November 2, 2019, pp. 3532–3540, IEEE, 2019.
[74] F. Wang, Y. Yeh, M. Sun, W. Chiu, and Y. Tsai, “BiFuse: Monocular 360 depth estimation via bi-projection fusion,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13-19, 2020, pp. 459–468, IEEE, 2020.
[75] W. Zeng, S. Karaoglu, and T. Gevers, “Joint 3D layout and depth prediction from a single indoor panorama image,” in Computer Vision - ECCV 2020, 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part XVI, vol. 12361 of Lecture Notes in Computer Science, pp. 666–682, Springer, 2020.
[76] S. Yang, F. Wang, C. Peng, P. Wonka, M. Sun, and H. Chu, “DuLa-Net: A dual-projection network for estimating room layouts from a single RGB panorama,” in IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pp. 3363–3372, 2019.
[77] C. Zou, J. Su, C. Peng, A. Colburn, Q. Shan, P. Wonka, H. Chu, and D. Hoiem, “3D Manhattan room layout reconstruction from a single 360 image,” CoRR, vol. abs/1910.04099, 2019.
[78] D. S. Chaplot, R. Salakhutdinov, A. Gupta, and S. Gupta, “Neural topological SLAM for visual navigation,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13-19, 2020, pp. 12872–12881, IEEE, 2020.
[79] J. Zheng, J. Zhang, J. Li, R. Tang, S. Gao, and Z. Zhou, “Structured3D: A large photo-realistic dataset for structured 3D modeling,” in Proceedings of the European Conference on Computer Vision (ECCV), 2020.
[80] F. Wang, Y. Yeh, M. Sun, W. Chiu, and Y. Tsai, “LayoutMP3D: Layout annotation of Matterport3D,” CoRR, vol. abs/2003.13516, 2020.
[81] T. S. Cohen, M. Geiger, J. Köhler, and M. Welling, “Spherical CNNs,” in 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, 2018.
[82] B. Coors, A. P. Condurache, and A. Geiger, “SphereNet: Learning spherical representations for detection and classification in omnidirectional images,” in Computer Vision - ECCV 2018, 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part IX, pp. 525–541, 2018.
[83] Y. Su and K. Grauman, “Learning spherical convolution for fast features from 360° imagery,” in Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pp. 529–539, 2017.
[84] Y. Su and K. Grauman, “Kernel transformer networks for compact spherical convolution,” in IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pp. 9442–9451, 2019.
[85] K. Tateno, N. Navab, and F. Tombari, “Distortion-aware convolutional filters for dense prediction in panoramic images,” in Computer Vision - ECCV 2018, 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part XVI, vol. 11220 of Lecture Notes in Computer Science, pp. 732–750, Springer, 2018.
[86] H. Cheng, C. Chao, J. Dong, H. Wen, T. Liu, and M. Sun, “Cube padding for weakly-supervised saliency prediction in 360° videos,” in 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 1420–1429, 2018.
[87] N. Zioulis, A. Karakottas, D. Zarpalas, and P. Daras, “OmniDepth: Dense depth estimation for indoors spherical panoramas,” in Computer Vision - ECCV 2018, 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part VI, pp. 453–471, 2018.
[88] T. Cohen, M. Weiler, B. Kicanaoglu, and M. Welling, “Gauge equivariant convolutional networks and the icosahedral CNN,” in Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, vol. 97 of Proceedings of Machine Learning Research, pp. 1321–1330, PMLR, 2019.
[89] C. M. Jiang, J. Huang, K. Kashinath, Prabhat, P. Marcus, and M. Nießner, “Spherical CNNs on unstructured grids,” in 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, 2019.
[90] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[91] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pp. 5998–6008, 2017.
[92] I. Laina, C. Rupprecht, V. Belagiannis, F. Tombari, and N. Navab, “Deeper depth prediction with fully convolutional residual networks,” in Fourth International Conference on 3D Vision, 3DV 2016, Stanford, CA, USA, October 25-28, 2016, pp. 239–248, 2016.
[93] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015, 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III, vol. 9351 of Lecture Notes in Computer Science, pp. 234–241, Springer, 2015.
[94] G. Pintore, M. Agus, and E. Gobbetti, “AtlantaNet: Inferring the 3D indoor layout from a single 360 image beyond the Manhattan world assumption,” in Proceedings of the European Conference on Computer Vision (ECCV), 2020.
[95] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” in 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021, OpenReview.net, 2021.
[96] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin Transformer: Hierarchical vision transformer using shifted windows,” in 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pp. 9992–10002, IEEE, 2021.
[97] Z. Jiang, Z. Xiang, J. Xu, and M. Zhao, “LGT-Net: Indoor panoramic room layout estimation with geometry-aware transformer network,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022, pp. 1644–1653, IEEE, 2022.
[98] W. Shen, Y. Dong, Z. Chen, Z. Zhao, Y. Gao, and Z. Liu, “PanoViT: Vision transformer for room layout estimation from a single panoramic image,” CoRR, vol. abs/2212.12156, 2022.
[99] Z. Shen, C. Lin, K. Liao, L. Nie, Z. Zheng, and Y. Zhao, “PanoFormer: Panorama transformer for indoor 360° depth estimation,” in Computer Vision - ECCV 2022, 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part I, vol. 13661 of Lecture Notes in Computer Science, pp. 195–211, Springer, 2022.
[100] J. Zhang, K. Yang, H. Shi, S. Reiß, K. Peng, C. Ma, H. Fu, K. Wang, and R. Stiefelhagen, “Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation,” CoRR, vol. abs/2207.11860, 2022.
[101] B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso, and A. Torralba, “Scene parsing through ADE20K dataset,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 5122–5130, IEEE Computer Society, 2017.
[102] I. Yun, H. Lee, and C. Rhee, “Improving 360 monocular depth estimation via non-local dense prediction transformer and joint supervised and self-supervised learning,” in Thirty-Sixth AAAI Conference on Artificial Intelligence, AAAI 2022, Thirty-Fourth Conference on Innovative Applications of Artificial Intelligence, IAAI 2022, The Twelfth Symposium on Educational Advances in Artificial Intelligence, EAAI 2022, Virtual Event, February 22 - March 1, 2022, pp. 3224–3233, AAAI Press, 2022.
[103] M. S. Junayed, A. Sadeghzadeh, M. B. Islam, L. Wong, and T. Aydin, “HiMODE: A hybrid monocular omnidirectional depth estimation model,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2022, New Orleans, LA, USA, June 19-20, 2022, pp. 5208–5217, IEEE, 2022.
[104] M. Li, S. Wang, W. Yuan, W. Shen, Z. Sheng, and Z. Dong, “S2Net: Accurate panorama depth estimation on spherical surface,” IEEE Robotics Autom. Lett., vol. 8, no. 2, pp. 1053–1060, 2023.
[105] M. Rey-Area, M. Yuan, and C. Richardt, “360MonoDepth: High-resolution 360° monocular depth estimation,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022, pp. 3752–3762, IEEE, 2022.