Author (Chinese): 李佩容
Author (English): Li, Pei-Rong
Title (Chinese): 解決非理想效應並應用於實際光達系統之深度圖超解析
Title (English): Depth Map Super-Resolution against Non-Ideal Effects for Practical LiDAR System
Advisor (Chinese): 黃朝宗
Advisor (English): Huang, Chao-Tsung
Committee members (Chinese): 邱瀞德, 王家慶
Degree: Master's
University: National Tsing Hua University
Department: Department of Electrical Engineering
Student ID: 108061559
Year of publication (ROC calendar): 110 (2021)
Graduation academic year: 110
Language: English
Number of pages: 63
Keywords (Chinese): 深度圖超解析, 非理想效應, 光達
Keywords (English): Depth map super-resolution, Non-ideal effect, LiDAR
For augmented reality and virtual reality applications, an important goal is to build 3D models with accurate depth values and high spatial resolution. Scanning with a 3D chaos LiDAR system yields point cloud data with high accuracy in the depth direction but only low resolution in the horizontal and vertical directions. We therefore adopt deep learning-based depth map super-resolution, guided by color images, to infer additional depth information. When this technique is combined with actual equipment, non-ideal effects arise, such as restrictions on LiDAR depth data collection, cross-modal differences between sensors, and noise interference from the device. Our goal is thus to design depth map super-resolution that addresses these non-ideal effects for a practical LiDAR system.

First, the imaging characteristics of LiDAR depth maps differ from those of the depth maps in common benchmark datasets, and collecting a large amount of LiDAR data is very time-consuming; these factors lead to a lack of sufficient training data. We therefore adopt transfer learning and create a synthetic LiDAR-simulation dataset of 508 pairs of depth maps and corresponding color images, named ROOMv1, to obtain adequate prior knowledge. In addition, we design a combined block transfer strategy that selectively transfers convolutional neural network blocks: those carrying ROOMv1 pre-trained knowledge with higher similarity, and those carrying NYUv2 pre-trained knowledge with better authenticity.

Second, because the LiDAR and digital camera sensors are cross-modal, the acquired depth maps and color images are inconsistent, which leads to projection misalignment. We therefore propose a point cloud rastering algorithm that converts depths from the 3D point cloud to the image coordinate system, so that the 2D depth map aligns well with the color image without producing missing depth values. Nearest sampling also preserves the high-accuracy advantage of the LiDAR system.

Finally, extreme-value ranging errors caused by low-frequency noise interference at the laser receiver appear, after depth data processing, as vacancies at the corresponding locations in the depth map. Our adaptive weighted hole-filling algorithm eliminates these depth vacancies and improves the quality of the model training results.

Compared with a model trained without pre-training, the 4x super-resolution model pre-trained on the ROOMv1 dataset and fine-tuned on the real LiDAR dataset reduces the average RMSE at targets in the inference results by 0.205 cm. After processing with the point cloud rastering algorithm, the ROOMv1 pre-trained model reduces the average RMSE by 22.29% compared with the original model. Fine-tuning from this better pre-trained model further reduces the average RMSE by 0.178 cm at targets and by up to 0.493 cm at object edges; the adaptive weighted hole-filling algorithm then reduces the RMSE by an average of 0.220 cm. In summary, the depth super-resolution designed against non-ideal effects ultimately provides depth maps super-resolved by a factor of four for the practical 3D chaos LiDAR system, with an average overall RMSE of 4.098 cm and an average texture RMSE of 2.980 cm.
For augmented and virtual reality applications, an important goal is to build 3D models with accurate depth values and high spatial resolution. Scanning with a 3D chaos LiDAR system yields point clouds with high accuracy in the Z-direction but low resolution in the X- and Y-directions. We therefore adopt deep learning-based depth map super-resolution, guided by RGB images, to infer additional depth points. When this technique is combined with an actual device, non-ideal effects arise, such as restrictions on LiDAR data collection, cross-modality between sensors, and noise interference from the device. Our objective is to design depth map super-resolution that addresses these non-ideal effects for a practical LiDAR system.

To begin with, the imaging characteristics of LiDAR depth maps are distinct from those of common benchmarks, and collecting a large amount of such data in practice is time-consuming; both factors lead to a lack of sufficient training data. We therefore adopt transfer learning and create a synthetic indoor LiDAR-simulation dataset of 508 pairs, ROOMv1, to provide adequate prior knowledge. In addition, we design a combined block transfer strategy that selectively transfers CNN blocks: those carrying pre-trained knowledge from ROOMv1, which has higher physical similarity, and those from NYUv2, which offers better authenticity.
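As a rough illustration only, the following Python sketch shows one way such block-wise transfer could be wired up. The toy network, its split into head/body/tail blocks, and the choice of which blocks come from the NYUv2-style source and which from the ROOMv1-style source are placeholders, not the exact configuration of the combined block transfer strategy described in the thesis.

import torch
import torch.nn as nn

class DepthSRNet(nn.Module):
    """Toy guided depth SR network split into named CNN blocks (placeholder architecture)."""
    def __init__(self):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(4, 64, 3, padding=1), nn.ReLU())   # RGB + low-res depth in
        self.body = nn.Sequential(*[nn.Conv2d(64, 64, 3, padding=1) for _ in range(4)])
        self.tail = nn.Conv2d(64, 1, 3, padding=1)                              # high-res depth out
    def forward(self, x):
        return self.tail(self.body(self.head(x)))

def transfer_blocks(target, source, block_names):
    """Copy only the parameters whose names belong to the selected blocks."""
    picked = {k: v for k, v in source.state_dict().items()
              if any(k.startswith(name + ".") for name in block_names)}
    target.load_state_dict(picked, strict=False)   # leave the remaining blocks untouched

model = DepthSRNet()
nyuv2_model = DepthSRNet()    # stands in for a model pre-trained on NYUv2
roomv1_model = DepthSRNet()   # stands in for a model pre-trained on ROOMv1
transfer_blocks(model, nyuv2_model, ["head"])          # hypothetical split between the two sources
transfer_blocks(model, roomv1_model, ["body", "tail"])
# The combined model would then be fine-tuned on the real LiDAR RGB-D pairs.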

Secondly, because the LiDAR and the digital camera are cross-modal sensors, the depth maps and guidance images are inconsistent, which causes projection misalignment. We therefore propose a point cloud rastering algorithm that converts depths from the 3D point cloud to the image coordinate system, aligning the RGB-D images well without generating missing points. By using nearest sampling, it also retains the high accuracy of the LiDAR measurements.
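For intuition, the sketch below rasterizes a point cloud into a camera-aligned depth map under an assumed pinhole model: each point is projected with the camera intrinsics K and extrinsics (R, t), and when several points land on one pixel the nearest measured depth is kept without interpolation. The parameter names are placeholders, and unlike the thesis's rastering algorithm this simple version does not guarantee the absence of missing points.

import numpy as np

def rasterize(points_xyz, K, R, t, height, width):
    """Project 3D points into the image plane, keeping the nearest raw depth per pixel."""
    cam = points_xyz @ R.T + t                       # LiDAR -> camera coordinates
    z = cam[:, 2]
    front = z > 0
    cam, z = cam[front], z[front]
    u = np.round(K[0, 0] * cam[:, 0] / z + K[0, 2]).astype(int)
    v = np.round(K[1, 1] * cam[:, 1] / z + K[1, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], z[inside]

    depth = np.full((height, width), np.inf)
    np.minimum.at(depth, (v, u), z)                  # z-buffer: keep the closest point per pixel
    depth[np.isinf(depth)] = 0.0                     # 0 marks pixels with no projected point
    return depth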

At last, noise interference at the laser receiver causes extreme-value ranging errors, and after the depth data are processed these errors appear as vacancies at the corresponding locations in the depth maps. The designed adaptive weighted hole-filling algorithm removes these noise-like vacancies and improves the quality of the results.
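A minimal sketch of weighted hole filling is given below, assuming vacancies are marked as 0 and each is replaced by a distance-weighted average of the valid depths in a local window. The window size and the inverse-distance weights are assumptions for illustration; the adaptive weighting used in the thesis may differ.

import numpy as np

def fill_holes(depth, window=5):
    """Fill zero-valued pixels with a distance-weighted average of valid neighbors."""
    filled = depth.copy()
    half = window // 2
    for y, x in np.argwhere(depth == 0):
        y0, y1 = max(0, y - half), min(depth.shape[0], y + half + 1)
        x0, x1 = max(0, x - half), min(depth.shape[1], x + half + 1)
        patch = depth[y0:y1, x0:x1]
        valid = patch > 0
        if not valid.any():
            continue                                  # leave the hole if no valid neighbor exists
        yy, xx = np.mgrid[y0:y1, x0:x1]
        weights = 1.0 / (np.hypot(yy - y, xx - x) + 1e-6)   # closer neighbors weigh more
        filled[y, x] = np.sum(weights[valid] * patch[valid]) / np.sum(weights[valid])
    return filled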

Compared to the model trained from scratch, the SRx4 model pre-trained on ROOMv1 and fine-tuned on the real LiDAR data reduces the average RMSE at targets by 0.205 cm. With point cloud rastering, the average RMSE of the ROOMv1 pre-trained model improves on the original one by 22.29%. Consequently, for the model transferred and fine-tuned from this improved pre-trained model, the RMSE is further reduced by 0.178 cm at targets and by 0.493 cm at edges. Optimizing the data with adaptive weighted hole-filling then reduces the RMSE by an average of 0.220 cm. In conclusion, with the proposed depth super-resolution against non-ideal effects, the final SRx4 result for the practical 3D chaos LiDAR system achieves an average RMSE of 4.098 cm overall and 2.980 cm in textured regions.
Abstract (in Chinese) . . . . . . . . . . . . . . . . . . . i
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . iii
Acknowledgements . . . . . . . . . . . . . . . . . . . . . v
Contents . . . . . . . . . . . . . . . . . . . . . . . . . vii
List of Figures . . . . . . . . . . . . . . . . . . . . . . ix
List of Tables . . . . . . . . . . . . . . . . . . . . . . . xi

Chapter 1 Introduction . . . . . . . . . . . . . . . . . . . . . 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.1 Deep Learning-Based Depth Map Super-Resolution . . . . . . 3
1.2.2 Depth Map Super-Resolution for Practical Systems . . . . . 4
1.2.3 Transfer Learning . . . . . . . . . . . . . . . . . . . . 4
1.3 Thesis Organization . . . . . . . . . . . . . . . . . . . . . 6

Chapter 2 Non-Ideal Effects in Practical LiDAR Application . . . . 7
2.1 Restrictions on LiDAR Data Collection . . . . . . . . . . . 7
2.2 Cross-Modality between LiDAR and CMOS Sensor . . . . . . . . 9
2.3 Noise Interference from Laser Beam Receiver . . . . . . . . 10

Chapter 3 Proposed Methods . . . . . . . . . . . . . . . . . . 13
3.1 Depth Map Super-Resolution System Overview . . . . . . . . . 13
3.1.1 System and Working Flow . . . . . . . . . . . . . . . . . 13
3.1.2 Depth Map Super-Resolution Neural Network . . . . . . . . 15
3.2 Practical LiDAR Transfer Learning . . . . . . . . . . . . . 16
3.2.1 Comparison with Other RGB-D Datasets . . . . . . . . . . . 16
3.2.2 Synthetic Dataset Generation–ROOMv1, PLANTS . . . . . . . 17
3.2.3 Transfer Learning for Practical LiDAR Application . . . . 20
3.3 Point Cloud Rastering . . . . . . . . . . . . . . . . . . . 25
3.3.1 Image Warping . . . . . . . . . . . . . . . . . . . . . . 25
3.3.2 Challenges . . . . . . . . . . . . . . . . . . . . . . . . 27
3.3.3 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 28
3.4 Adaptive Weighted Hole-Filling . . . . . . . . . . . . . . . 32

Chapter 4 Experimental Results . . . . . . . . . . . . . . . . 35
4.1 Experimental Settings . . . . . . . . . . . . . . . . . . . 35
4.1.1 Settings for Precise Evaluation . . . . . . . . . . . . . 35
4.1.2 Leave-One-Out Cross-Validation . . . . . . . . . . . . . . 37
4.2 ROOMv1 Test . . . . . . . . . . . . . . . . . . . . . . . . 38
4.3 PLANTS Test . . . . . . . . . . . . . . . . . . . . . . . . 40
4.4 Real LiDAR Test . . . . . . . . . . . . . . . . . . . . . . 41
4.4.1 Transfer Learning for Practical LiDAR . . . . . . . . . . 42
4.4.2 Point Cloud Rastering for RGB-D Data . . . . . . . . . . 43
4.4.3 Hole-Filling for Real LiDAR Data . . . . . . . . . . . . 44
4.4.4 Pre-trained Model Discussion . . . . . . . . . . . . . . 45
4.4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.5 Failure Cases . . . . . . . . . . . . . . . . . . . . . . . 48
4.5.1 Blurry Guidance Image . . . . . . . . . . . . . . . . . . 48
4.5.2 Over-Texture Depth Map . . . . . . . . . . . . . . . . . 49
4.6 Qualitative Results . . . . . . . . . . . . . . . . . . . . 51

Chapter 5 Conclusion and Future Work . . . . . . . . . . . . . 59
5.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . 59
5.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . 60