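Author (Chinese): 史芳瑜
Author (English): Shih, Fang-Yu
Thesis title (Chinese): 利用空間擴增改良深度學習相機定位方法
Thesis title (English): Improving the Accuracy of Deep Localization Models by Spatially-augmented Camera Poses
Advisor (Chinese): 賴尚宏
Advisor (English): Lai, Shang-Hong
Committee members (Chinese): 許秋婷; 鄭嘉珉; 林嘉文
Committee members (English): Hsu, Chiou-Ting; Cheng, Chia-Ming; Lin, Chia-Wen
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 106062538
Year of publication (ROC calendar): 109 (2020 CE)
Academic year of graduation: 109
Language: English
Number of pages: 34
Keywords (Chinese): 相機定位; 資料擴增; 深度學習
Keywords (English): camera localization; data augmentation; deep learning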
Abstract:
Deep localization is a recent approach that applies deep learning to the camera localization problem. Its methods fall into two categories: structure-based and image-based. Structure-based methods follow the traditional localization procedure but use deep learning techniques in some of its components; they usually give more accurate results but require more computational resources. Image-based methods train a convolutional neural network (CNN) that regresses the camera pose directly from the input image; they are usually computationally efficient but less accurate.

In this thesis, we propose a data augmentation method for image-based localization that improves localization accuracy. We conjecture that the lower accuracy of image-based methods is due to the limited density of the training data. Our method alleviates this problem by augmenting the training set with additional camera poses, which increases the diversity of the training data. The training image for each augmented pose is synthesized by warping an existing image that serves as the reference. The method can be applied to different image-based methods. Experiments on several public datasets show that our augmentation method improves the accuracy of existing image-based deep localization methods.
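The idea of sampling extra camera poses and synthesizing their images by warping a reference image can be illustrated with a minimal sketch. This record does not give the thesis's actual sampling scheme or warping procedure, so everything below is an assumption made for illustration: a pinhole camera with intrinsics `K`, a per-pixel depth map for the reference image, 4x4 camera-to-world poses, Gaussian pose perturbations, and the hypothetical helper names `perturb_pose` and `warp_to_pose`.

```python
import numpy as np

def perturb_pose(T_c2w, rng, trans_std=0.05, rot_std_deg=2.0):
    """Sample a pose near T_c2w (assumed Gaussian perturbation, not the thesis's scheme)."""
    axis = rng.normal(size=3)
    axis /= np.linalg.norm(axis)
    angle = np.deg2rad(rng.normal(scale=rot_std_deg))
    # Rodrigues' formula for a small rotation about a random axis.
    Kx = np.array([[0.0, -axis[2], axis[1]],
                   [axis[2], 0.0, -axis[0]],
                   [-axis[1], axis[0], 0.0]])
    dR = np.eye(3) + np.sin(angle) * Kx + (1.0 - np.cos(angle)) * (Kx @ Kx)
    T_new = np.array(T_c2w, dtype=float)
    T_new[:3, :3] = T_new[:3, :3] @ dR
    T_new[:3, 3] += rng.normal(scale=trans_std, size=3)
    return T_new

def warp_to_pose(image, depth, K, T_ref, T_new):
    """Forward-warp an (h, w, 3) reference image to the view at T_new.

    Returns the synthesized image and a boolean mask; False marks pixels
    that received no source point (invalid pixels).
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    valid = depth > 0
    z = depth[valid]
    pix = np.stack([u[valid], v[valid], np.ones_like(z)]).astype(float)
    pts_ref = (np.linalg.inv(K) @ pix) * z            # 3xN, reference camera frame
    pts_h = np.vstack([pts_ref, np.ones((1, z.size))])
    T_rel = np.linalg.inv(T_new) @ T_ref              # reference camera -> new camera
    pts_new = (T_rel @ pts_h)[:3]
    colors = image[valid]

    front = pts_new[2] > 1e-6                         # keep points in front of the camera
    pts_new, colors = pts_new[:, front], colors[front]
    proj = K @ pts_new
    un = np.round(proj[0] / proj[2]).astype(int)
    vn = np.round(proj[1] / proj[2]).astype(int)
    inside = (un >= 0) & (un < w) & (vn >= 0) & (vn < h)
    un, vn, zn, colors = un[inside], vn[inside], pts_new[2, inside], colors[inside]

    # Z-buffer: for every target pixel keep only the nearest projected point.
    flat = vn * w + un
    order = np.lexsort((zn, flat))                    # by pixel, then by depth
    _, first = np.unique(flat[order], return_index=True)
    keep = order[first]

    out = np.zeros_like(image)
    mask = np.zeros((h, w), dtype=bool)
    out[vn[keep], un[keep]] = colors[keep]
    mask[vn[keep], un[keep]] = True
    return out, mask
```

In such a setup, each synthesized image would be paired with its sampled pose as an additional training example, and the mask could be used to ignore or fill pixels that no reference pixel maps to.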
Table of contents:
1 Introduction
1.1 Motivation
1.2 Problem Statement
1.3 Contributions
1.4 Thesis Organization
2 Related Work
2.1 Structure-based Localization
2.2 Image-based Localization
2.2.1 Absolute Pose Regression (APR)
2.2.2 Related Pose Regression (RPR)
2.3 Data Augmentation
3 Proposed Method
3.1 Camera Pose Augmentation
3.2 Image Warping
3.3 Invalid Pixels
4 Experiments
4.1 Datasets
4.1.1 7-Scenes
4.1.2 12-Scenes
4.2 Implementation and Experiment Settings
4.2.1 Data Augmentation
4.2.2 Image Synthesizing
4.2.3 MapNet
4.2.4 RelativePN
4.3 Experimental Comparison
4.3.1 Compared with MapNet
4.3.2 RelativePN
4.4 Noise
4.5 Different Augmented Method
4.6 K-nearest-neighbor
4.7 Augmented number
4.8 Sampling method
4.9 Depth Estimation
5 Conclusions
References
(The full text of this thesis has not been authorized for public access.)