
Detailed Record

Author (Chinese): 王尊玄
Author (English): Wang, Tsun-Hsuan
Thesis Title (Chinese): 用光達資料增進單目和雙目深度預測
Thesis Title (English): Improve monocular and stereo depth estimation with LiDAR data
Advisor (Chinese): 孫民
Advisor (English): Sun, Min
Committee Members (Chinese): 邱維辰、王傑智
Committee Members (English): Chiu, Wei-Chen; Wang, Chieh-Chi
Degree: Master's
University: National Tsing Hua University (國立清華大學)
Department: Department of Electrical Engineering (電機工程學系)
Student ID: 106061521
Publication Year (ROC): 109 (2020)
Graduation Academic Year: 108
Language: English
Number of Pages: 43
Keywords (Chinese): 深度預測、單目、雙目、光達
Keywords (English): depth estimation; monocular; stereo; LiDAR
Usage statistics:
  • Recommendations: 0
  • Views: 914
  • Rating: *****
  • Downloads: 0
  • Bookmarks: 0
Abstract (Chinese): With the development of depth estimation from RGB imagery, LiDAR sensors have increasingly been used to provide additional sparse but accurate geometric information. In this thesis, we focus on leveraging LiDAR data in monocular and stereo depth estimation. First, we propose a novel plug-and-play module that exploits sparse depth maps of arbitrary patterns to improve depth estimation. The method requires no additional training and can be applied to various tasks, such as estimating a dense depth map from an RGB image together with sparse LiDAR data. Applied to a range of state-of-the-art models, it yields consistent performance gains on both indoor (NYU-v2) and outdoor (KITTI) datasets; we also simulate different types of LiDAR to demonstrate its general applicability in practice. Furthermore, the fusion of LiDAR sensors and stereo cameras for improved depth estimation is motivated by the complementary nature of active and passive depth sensing. Instead of directly fusing the depth estimates produced separately by LiDAR and stereo vision, we incorporate LiDAR data into the stereo matching network through two techniques: Input Fusion and Conditional Cost Volume Normalization. The proposed framework is generic and integrates closely with stereo matching networks. Experiments on the KITTI Stereo and Depth Completion datasets verify that our method is effective and robust, and it achieves favorable performance compared with other fusion strategies. In addition, we show that a hierarchical extension of Conditional Cost Volume Normalization adds only marginal overhead to the stereo matching network in computation time and model size.
Abstract (English): With the advance of depth estimation based on RGB imagery, LiDAR sensors have become a popular additional source of sparse but accurate geometric information. In this thesis, we focus on leveraging LiDAR measurements in monocular and stereo depth estimation. First, we propose a novel plug-and-play (PnP) module that improves depth prediction by taking arbitrary patterns of sparse depths as input. Our approach achieves consistent improvements over various state-of-the-art methods on indoor (NYU-v2) and outdoor (KITTI) datasets. We also synthesize various types of LiDAR in our experiments to verify the general applicability of the PnP module in practice. Furthermore, the complementary characteristics of active and passive depth sensing motivate the fusion of a LiDAR sensor and a stereo camera for improved depth perception. Instead of directly fusing the depths estimated separately from the LiDAR and stereo modalities, we enhance the stereo matching network with two techniques that incorporate the LiDAR information: Input Fusion and Conditional Cost Volume Normalization (CCVNorm). The proposed framework is generic and integrates closely with stereo matching neural networks. We experimentally verify the efficacy and robustness of our method on the KITTI Stereo and Depth Completion datasets, obtaining favorable performance against various fusion strategies. Moreover, we demonstrate that a hierarchical extension of CCVNorm adds only slight overhead to the stereo matching network in terms of computation time and model size.
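The abstract names two LiDAR-conditioning techniques, Input Fusion and Conditional Cost Volume Normalization (CCVNorm), without spelling out their mechanics. As a rough illustration of the general idea of conditioning a stereo cost volume on sparse LiDAR measurements, the following is a minimal sketch assuming a PSMNet-style 4D cost volume and FiLM-like per-pixel modulation; the class name, tensor shapes, and the per-disparity lookup tables are assumptions for illustration only, not the thesis implementation.

```python
# Hypothetical sketch: FiLM-style conditional normalization of a stereo cost
# volume, conditioned on sparse LiDAR disparities. Names, shapes, and the
# categorical (per-disparity-bin) lookup are illustrative assumptions.
import torch
import torch.nn as nn


class ConditionalCostVolumeNorm(nn.Module):
    def __init__(self, num_channels: int, max_disp: int):
        super().__init__()
        self.max_disp = max_disp
        # Normalize the 4D cost-volume features without a fixed affine term.
        self.norm = nn.BatchNorm3d(num_channels, affine=False)
        # One (gamma, beta) pair per discretized LiDAR disparity value.
        self.gamma_table = nn.Embedding(max_disp, num_channels)
        self.beta_table = nn.Embedding(max_disp, num_channels)
        # Fallback modulation for pixels with no LiDAR measurement.
        self.gamma_default = nn.Parameter(torch.ones(num_channels))
        self.beta_default = nn.Parameter(torch.zeros(num_channels))

    def forward(self, cost_volume, lidar_disp, lidar_mask):
        # cost_volume: (B, C, D, H, W) stereo matching cost features
        # lidar_disp:  (B, H, W) integer (long) disparity from projected LiDAR
        # lidar_mask:  (B, H, W) bool, True where a LiDAR point exists
        x = self.norm(cost_volume)
        idx = lidar_disp.clamp(0, self.max_disp - 1)
        gamma = self.gamma_table(idx)                   # (B, H, W, C)
        beta = self.beta_table(idx)                     # (B, H, W, C)
        valid = lidar_mask.unsqueeze(-1)                # (B, H, W, 1)
        gamma = torch.where(valid, gamma, self.gamma_default.expand_as(gamma))
        beta = torch.where(valid, beta, self.beta_default.expand_as(beta))
        # Broadcast the per-pixel modulation over channels and disparity bins.
        gamma = gamma.permute(0, 3, 1, 2).unsqueeze(2)  # (B, C, 1, H, W)
        beta = beta.permute(0, 3, 1, 2).unsqueeze(2)
        return gamma * x + beta
```

A hierarchical variant in the spirit of the abstract could share one compact modulation across the volume and refine it only at valid LiDAR pixels, which is one way to keep the added computation time and parameter count small.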
Table of Contents:
Declaration v
Abstract (Chinese) vii
Abstract (English) ix
1 Introduction 1
2 Related Work 5
2.1 Monocular Depth Estimation 5
2.2 Stereo Matching 6
2.3 RGB Imagery and LiDAR Fusion 6
2.4 Conditional Batch Normalization 8
3 Method 9
3.1 Preliminary for Depth Estimation 10
3.2 Sparse Data Propagation 11
3.3 Theoretical Discussion 12
3.4 Property and Usage 13
3.5 Preliminary for Stereo Matching Network 14
3.6 Input Fusion 15
3.7 Conditional Cost Volume Normalization (CCVNorm) 15
3.8 Hierarchical Extension 17
4 Experiments 19
4.1 Dataset and Metrics 19
4.2 Implementation Details 20
4.3 Overall Performance 22
4.4 Ablation Study on Plug-and-Play Depth 26
4.5 Ablation Study on Stereo and LiDAR Fusion 28
4.6 Robustness to LiDAR Density 30
4.7 Qualitative Analysis 32
5 Conclusion 37
References 39
(Full text not authorized for public release)