
Detailed Record

Author (Chinese): 王福恩
Author (English): Wang, Fu-En
Title (Chinese): 全景室內深度與格局偵測
Title (English): 360 Perception for Indoor Depth and Layout Estimation
Advisor (Chinese): 孫民
Advisor (English): Sun, Min
Committee Members (Chinese): 林嘉文、李祈均、陳奕廷、邱維辰
Committee Members (English): Lin, Chia-Wen; Lee, Chi-Chun; Chen, Yi-Ting; Chiu, Wei-Chen
Degree: Doctorate
Institution: National Tsing Hua University
Department: Department of Electrical Engineering
Student ID: 108061805
Year of Publication (ROC calendar): 112 (2023)
Graduation Academic Year: 111
Language: English
Number of Pages: 75
Keywords (Chinese): 全景影像、深度學習、深度預測、格局預測
Keywords (English): 360, deep-learning, depth-estimation, layout-estimation
Abstract (Chinese): In recent years, as consumer-level 360° cameras have become increasingly common, deep-learning algorithms for panoramic imagery have received growing attention in computer vision. Because a 360° camera captures the entire surroundings at once, indoor autonomous systems have also begun to adopt such cameras for indoor localization and navigation. However, scene understanding for panoramic images still lacks mature algorithms that can be applied efficiently. This dissertation therefore focuses on two components critical to indoor autonomous systems, (1) indoor depth estimation and (2) indoor layout estimation, and proposes novel and efficient algorithms to improve the feasibility of future applications built on such systems. For depth estimation, we combine information from different projections to alleviate the blurriness that existing methods tend to produce in predicted depth maps, and we propose two new network architectures, BiFuse and BiFuse++, that substantially improve the accuracy of panoramic depth estimation. For layout estimation, we combine BiFuse++ with LED2-Net so that multi-projection information and a 1D representation are exploited jointly, yielding accurate indoor layout predictions.
Abstract (English): In recent years, as consumer-level 360° cameras have become popular and affordable, algorithms that apply deep learning to panoramas have become an important topic in computer vision. Moreover, since a 360° camera captures all of the information surrounding it, indoor autonomous systems have started to adopt these sensors for indoor localization and navigation. However, efficient approaches to these tasks have not been studied well in the computer vision community. Hence, in this dissertation, we focus on two tasks essential to indoor autonomous systems: (1) indoor depth estimation and (2) indoor layout estimation. For indoor depth estimation, we utilize the information from different projections of a panorama and propose two novel frameworks, BiFuse and BiFuse++, which significantly reduce the blurriness of the predicted depth maps observed in previous works. For indoor layout estimation, we combine BiFuse++ with LED2-Net to simultaneously use the information from different projections and a 1D representation to precisely estimate layouts from panoramas.
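As a rough illustration of the "information from different projections" mentioned above, the following minimal NumPy sketch (not code from this thesis; the face naming and orientation conventions are assumptions made here for illustration) shows the kind of equirectangular-to-cubemap resampling that bi-projection methods rely on: each pixel of a cube face is cast as a 3D ray, converted to spherical coordinates, and looked up in the equirectangular panorama.

import numpy as np

def cube_face_to_equirect_grid(face_size, face="front"):
    """Return (u, v) equirectangular coordinates in [0, 1] for every pixel of one cube face."""
    # Pixel centers of the face, mapped to [-1, 1].
    coords = (np.arange(face_size) + 0.5) / face_size * 2.0 - 1.0
    x, y = np.meshgrid(coords, coords)  # x: right, y: down on the face

    # 3D viewing rays for the chosen face (camera at the cube center).
    # The face orientations below are an assumed convention for this sketch.
    if face == "front":
        dirs = np.stack([x, y, np.ones_like(x)], axis=-1)
    elif face == "right":
        dirs = np.stack([np.ones_like(x), y, -x], axis=-1)
    elif face == "back":
        dirs = np.stack([-x, y, -np.ones_like(x)], axis=-1)
    elif face == "left":
        dirs = np.stack([-np.ones_like(x), y, x], axis=-1)
    elif face == "up":
        dirs = np.stack([x, -np.ones_like(x), y], axis=-1)
    else:  # "down"
        dirs = np.stack([x, np.ones_like(x), -y], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    # Spherical coordinates: longitude in [-pi, pi], latitude in [-pi/2, pi/2].
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])
    lat = np.arcsin(dirs[..., 1])

    # Normalized equirectangular coordinates.
    u = (lon / np.pi + 1.0) / 2.0
    v = (lat / (np.pi / 2) + 1.0) / 2.0
    return u, v

def sample_face(equi_img, face_size, face="front"):
    """Nearest-neighbour resampling of one cube face from an equirectangular image."""
    h, w = equi_img.shape[:2]
    u, v = cube_face_to_equirect_grid(face_size, face)
    cols = np.clip((u * w).astype(int), 0, w - 1)
    rows = np.clip((v * h).astype(int), 0, h - 1)
    return equi_img[rows, cols]

if __name__ == "__main__":
    equi = np.random.rand(256, 512, 3)   # stand-in for a 360 RGB panorama
    front = sample_face(equi, 128, "front")
    print(front.shape)                   # (128, 128, 3)

A differentiable version of this resampling is what allows a cubemap branch and an equirectangular branch to exchange information inside a single network, which is the general idea behind bi-projection fusion.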
Acknowledgements-3
Abstract (Chinese)-5
Abstract (English)-7
Declaration-9
Contents-11
List of Figures-13
List of Tables-19
Ch1 (BiFuse: Monocular 360 Depth Estimation via Bi-projection Fusion)-1
1.1 Introduction-1
1.2 Related Work-4
1.3 Our Approach-6
1.3.1 Preliminary-6
1.3.2 Proposed Spherical Padding-8
1.3.3 Proposed BiFuse Network-11
1.3.4 Implementation Details-14
1.4 Experimental Results-15
1.4.1 Evaluation Metrics and Datasets-15
1.4.2 Overall Performance-17
1.4.3 More Results and Ablation Study-19
1.5 Conclusions-22
Ch2 (BiFuse++: Self-supervised and Efficient Bi-projection Fusion for 360 Depth Estimation)-27
2.1 Introduction-27
2.2 Related Works-32
2.3 Approach-36
2.3.1 Spherical Projection-37
2.3.2 Our BiFuse++ Framework-39
2.4 Experimental Results-44
2.4.1 Evaluation Metrics and Datasets-45
2.4.2 Implementation Details-47
2.4.3 Results of Supervised Scenario-48
2.4.4 Computational Comparison-51
2.4.5 Results of Self-Supervised Scenario-53
2.5 Conclusion-57
Ch3 (BiFuse++ and LED2-Net)-59
3.1 Introduction-59
3.2 Experiments-61
3.2.1 Experimental Results-62
Ch4 (Conclusions)-63
References-65
[1] F.-E. Wang, Y.-H. Yeh, M. Sun, W.-C. Chiu, and Y.-H. Tsai, “Bifuse: Monocular 360 depth estimation via bi-projection fusion,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[2] F.-E. Wang, Y.-H. Yeh, Y.-H. Tsai, W.-C. Chiu, and M. Sun, “Bifuse++: Self-supervised and efficient bi-projection fusion for 360◦ depth estimation,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022.
[3] F.-E. Wang, Y.-H. Yeh, M. Sun, W.-C. Chiu, and Y.-H. Tsai, “Led2-net: Monocular 360deg layout estimation via differentiable depth rendering,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[4] I. Laina, C. Rupprecht, V. Belagiannis, F. Tombari, and N. Navab, “Deeper depth prediction with fully convolutional residual networks,” in International Conference on 3D Vision (3DV), 2016.
[5] F.-E. Wang, H.-N. Hu, H.-T. Cheng, J.-T. Lin, S.-T. Yang, M.-L. Shih, H.-K. Chu, and M. Sun, “Self-supervised learning of depth and camera motion from 360◦ videos,” in Asian Conference on Computer Vision (ACCV), 2018.
[6] H.-T. Cheng, C.-H. Chao, J.-D. Dong, H.-K. Wen, T.-L. Liu, and M. Sun, “Cube padding for weakly-supervised saliency prediction in 360◦ videos,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[7] S.-T. Yang, F.-E. Wang, C.-H. Peng, P. Wonka, M. Sun, and H.-K. Chu, “Dula-net: A dual-projection network for estimating room layouts from a single rgb panorama,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[8] A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Niessner, M. Savva, S. Song, A. Zeng, and Y. Zhang, “Matterport3D: Learning from RGB-D data in indoor environments,” International Conference on 3D Vision (3DV), 2017.
[9] N. Zioulis, A. Karakottas, D. Zarpalas, and P. Daras, “Omnidepth: Dense depth estimation for indoors spherical panoramas,” in European Conference on Computer Vision (ECCV), 2018.
[10] I. Armeni, S. Sax, A. R. Zamir, and S. Savarese, “Joint 2d-3d-semantic data for indoor scene understanding,” arXiv preprint arXiv:1702.01105, 2017.
[11] A. Saxena, M. Sun, and A. Y. Ng, “Make3d: Learning 3d scene structure from a single still image,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2008.
[12] D. Eigen, C. Puhrsch, and R. Fergus, “Depth map prediction from a single image using a multi-scale deep network,” in Advances in Neural Information Processing Systems (NeurIPS), 2014.
[13] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[14] J.-H. Lee, M. Heo, K.-R. Kim, and C.-S. Kim, “Single-image depth estimation based on fourier domain analysis,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[15] Y. Cao, Z. Wu, and C. Shen, “Estimating depth from monocular images as classification using deep fully convolutional residual networks,” IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2018.
[16] P. Wang, X. Shen, Z. Lin, S. Cohen, B. Price, and A. L. Yuille, “Towards unified depth and semantic prediction from a single image,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[17] F. Liu, C. Shen, and G. Lin, “Deep convolutional neural fields for depth estimation from a single image,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[18] D. Xu, E. Ricci, W. Ouyang, X. Wang, and N. Sebe, “Multi-scale continuous crfs as sequential deep networks for monocular depth estimation,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[19] D. Xu, W. Wang, H. Tang, H. Liu, N. Sebe, and E. Ricci, “Structured attention guided convolutional neural fields for monocular depth estimation,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[20] H. Fu, M. Gong, C. Wang, K. Batmanghelich, and D. Tao, “Deep ordinal regression network for monocular depth estimation,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[21] C. Godard, O. Mac Aodha, and G. J. Brostow, “Unsupervised monocular depth estimation with left-right consistency,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[22] T. Zhou, M. Brown, N. Snavely, and D. G. Lowe, “Unsupervised learning of depth and ego-motion from video,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[23] Z. Yin and J. Shi, “Geonet: Unsupervised learning of dense depth, optical flow and camera pose,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[24] H. Zhan, R. Garg, C. Saroj Weerasekera, K. Li, H. Agarwal, and I. Reid, “Unsupervised learning of monocular depth estimation and visual odometry with deep feature reconstruction,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[25] Z. Yang, P. Wang, W. Xu, L. Zhao, and R. Nevatia, “Unsupervised learning of geometry from videos with edge-aware depth-normal consistency,” in AAAI Conference on Artificial Intelligence (AAAI), 2018.
[26] C. Wang, J. Miguel Buenaposada, R. Zhu, and S. Lucey, “Learning depth from monocular videos using direct methods,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[27] H.-Y. Lai, Y.-H. Tsai, and W.-C. Chiu, “Bridging stereo matching and optical flow via spatiotemporal correspondence,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[28] N.-H. Wang, B. Solarte, Y.-H. Tsai, W.-C. Chiu, and M. Sun, “360sd-net: 360° stereo depth estimation with learnable cost volume,” in IEEE International Conference on Robotics and Automation (ICRA), 2020.
[29] C. Zou, A. Colburn, Q. Shan, and D. Hoiem, “Layoutnet: Reconstructing the 3d room layout from a single rgb image,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[30] Y. Zhang, S. Song, P. Tan, and J. Xiao, “Panocontext: A whole-room 3d context model for panoramic scene understanding,” in European Conference on Computer Vision (ECCV), 2014.
[31] T. S. Cohen, M. Geiger, J. Köhler, and M. Welling, “Spherical CNNs,” in International Conference on Learning Representations (ICLR), 2018.
[32] C. Esteves, C. Allen-Blanchette, A. Makadia, and K. Daniilidis, “Learning so(3) equivariant representations with spherical cnns,” in European Conference on Computer Vision (ECCV), 2018.
[33] Y. Su and K. Grauman, “Kernel transformer networks for compact spherical convolution,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[34] Y.-C. Su and K. Grauman, “Learning spherical convolution for fast features from 360◦ imagery,” in Advances in Neural Information Processing Systems (NeurIPS), 2017.
[35] M. Eder, P. Moulon, and L. Guan, “Pano popups: Indoor 3d reconstruction with a plane-aware network,” in International Conference on 3D Vision (3DV), 2019.
[36] B. Ummenhofer, H. Zhou, J. Uhrig, N. Mayer, E. Ilg, A. Dosovitskiy, and T. Brox, “Demon: Depth and motion network for learning monocular stereo,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[37] J. Cheng, Y.-H. Tsai, S. Wang, and M.-H. Yang, “Segflow: Joint learning for video object segmentation and optical flow,” in IEEE International Conference on Computer Vision (ICCV), 2017.
[38] Z. Zhang, Z. Cui, C. Xu, Z. Jie, X. Li, and J. Yang, “Joint task-recursive learning for semantic segmentation and depth estimation,” in European Conference on Computer Vision (ECCV), 2018.
[39] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., “Pytorch: An imperative style, high-performance deep learning library,” in Advances in Neural Information Processing Systems (NeurIPS), 2019.
[40] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in International Conference on Learning Representations (ICLR), 2014.
[41] S. Song, F. Yu, A. Zeng, A. X. Chang, M. Savva, and T. Funkhouser, “Semantic scene completion from a single depth image,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[42] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International Conference on Machine Learning (ICML), 2015.
[43] H. Jiang, Z. Sheng, S. Zhu, Z. Dong, and R. Huang, “Unifuse: Unidirectional fusion for 360° panorama depth estimation,” IEEE Robotics and Automation Letters (RA-L), 2021.
[44] C. Sun, M. Sun, and H.-T. Chen, “Hohonet: 360 indoor holistic understanding with latent horizontal features,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[45] R. Ranftl, K. Lasinger, D. Hafner, K. Schindler, and V. Koltun, “Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020.
[46] S. F. Bhat, I. Alhashim, and P. Wonka, “Adabins: Depth estimation using adaptive bins,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[47] J. Xie, R. Girshick, and A. Farhadi, “Deep3d: Fully automatic 2d-to-3d video conversion with deep convolutional neural networks,” in European Conference on Computer Vision (ECCV), 2016.
[48] R. Garg, V. K. Bg, G. Carneiro, and I. Reid, “Unsupervised cnn for single view depth estimation: Geometry to the rescue,” in European Conference on Computer Vision (ECCV), 2016.
[49] S. Vijayanarasimhan, S. Ricco, C. Schmid, R. Sukthankar, and K. Fragkiadaki, “Sfm-net: Learning of structure and motion from video,” arXiv preprint arXiv:1704.07804, 2017.
[50] A. Byravan and D. Fox, “Se3-nets: Learning rigid body motion using deep neural networks,” in IEEE International Conference on Robotics and Automation (ICRA), 2017.
[51] C. Godard, O. Mac Aodha, M. Firman, and G. J. Brostow, “Digging into self-supervised monocular depth estimation,” in IEEE International Conference on Computer Vision (ICCV), 2019.
[52] A. Johnston and G. Carneiro, “Self-supervised monocular trained depth estimation using self-attention and discrete disparity volume,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[53] V. Guizilini, R. Ambrus, S. Pillai, A. Raventos, and A. Gaidon, “3d packing for self-supervised monocular depth estimation,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[54] J.-W. Bian, H. Zhan, N. Wang, Z. Li, L. Zhang, C. Shen, M.-M. Cheng, and I. Reid, “Unsupervised scale-consistent depth learning from video,” International Journal of Computer Vision (IJCV), 2021.
[55] V. Guizilini, R. Hou, J. Li, R. Ambrus, and A. Gaidon, “Semantically-guided representation learning for self-supervised monocular depth,” in International Conference on Learning Representations (ICLR), 2020.
[56] P.-Y. Chen, A. H. Liu, Y.-C. Liu, and Y.-C. F. Wang, “Towards scene understanding: Unsupervised monocular depth estimation with semantic-aware representation,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[57] S. Zhu, G. Brazil, and X. Liu, “The edge of depth: Explicit constraints between segmentation and depth,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[58] M. Klingner, J.-A. Termöhlen, J. Mikolajczyk, and T. Fingscheidt, “Self-supervised monocular depth estimation: Solving the dynamic object problem by semantic guidance,” in European Conference on Computer Vision (ECCV), 2020.
[59] L. Hoyer, D. Dai, Y. Chen, A. Koring, S. Saha, and L. Van Gool, “Three ways to improve semantic segmentation with self-supervised depth estimation,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[60] T.-H. Wang, H.-J. Huang, J.-T. Lin, C.-W. Hu, K.-H. Zeng, and M. Sun, “Omnidirectional cnn for visual place recognition and navigation,” in IEEE International Conference on Robotics and Automation (ICRA), 2018.
[61] N. Zioulis, A. Karakottas, D. Zarpalas, F. Alvarez, and P. Daras, “Spherical view synthesis for self-supervised 360 depth estimation,” in International Conference on 3D Vision (3DV), 2019.
[62] R. Liu, J. Lehman, P. Molino, F. P. Such, E. Frank, A. Sergeev, and J. Yosinski, “An intriguing failing of convolutional neural networks and the coordconv solution,” arXiv, 2018.
[63] L. Jin, Y. Xu, J. Zheng, J. Zhang, R. Tang, S. Xu, J. Yu, and S. Gao, “Geometric structure based and regularized depth estimation from 360 indoor imagery,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[64] W. Zeng, S. Karaoglu, and T. Gevers, “Joint 3d layout and depth prediction from a single indoor panorama image,” in European Conference on Computer Vision (ECCV), 2020.
[65] C. Sun, C.-W. Hsiao, M. Sun, and H.-T. Chen, “Horizonnet: Learning room layout with 1d representation and pano stretch data augmentation,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[66] G. Pintore, M. Agus, E. Almansa, J. Schneider, and E. Gobbetti, “Slicenet: deep dense depth estimation from a single indoor panorama using a slice-based representation,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[67] W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang, “Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[68] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A Large-Scale Hierarchical Image Database,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
[69] Y. Zhang, S. Khamis, C. Rhemann, J. Valentin, A. Kowdle, V. Tankovich, M. Schoenberg, S. Izadi, T. Funkhouser, and S. Fanello, “Activestereonet: End-to-end self-supervised learning for active stereo systems,” in European Conference on Computer Vision (ECCV), 2018.
[70] J. Xiao, K. A. Ehinger, A. Oliva, and A. Torralba, “Recognizing scene viewpoint using panoramic place representation,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[71] C. Zou, J.-W. Su, C.-H. Peng, A. Colburn, Q. Shan, P. Wonka, H.-K. Chu, and D. Hoiem, “Manhattan room layout reconstruction from a single 360 image: A comparative study of state-of-the-art methods,” International Journal of Computer Vision (IJCV), 2021.
[72] F.-E. Wang, Y.-H. Yeh, M. Sun, W.-C. Chiu, and Y.-H. Tsai, “Layoutmp3d: Layout annotation of matterport3d,” arXiv:2003.13516, 2020.
[73] G. Pintore, M. Agus, and E. Gobbetti, “Atlantanet: Inferring the 3d indoor layout from a single 360 image beyond the manhattan world assumption,” in European Conference on Computer Vision (ECCV), 2020.