
Detailed Record

Author (Chinese): 張洛鳴
Author (English): Chang, Lo-Ming
Title (Chinese): 基於深度學習之輕量級特徵提取系統
Title (English): A Deep-Learning-Based Lightweight Feature Extraction System
Advisor (Chinese): 馬席彬
Advisor (English): Ma, Hsi-Pin
Committee Members (Chinese): 朱宏國、胡敏君、鄭經華
Committee Members (English): Chu, Hung-Kuo; Hu, Min-Chun; Cheng, Ching-Hwa
Degree: Master
University: National Tsing Hua University
Department: Department of Electrical Engineering
Student ID: 109061551
Year of Publication (R.O.C. calendar): 111 (2022)
Graduation Academic Year: 111
Language: Chinese
Number of Pages: 76
Keywords (Chinese): 特徵提取、深度學習、圖像匹配
Keywords (English): Feature Extraction, Deep Learning, Image Matching
Abstract (Chinese): Many feature extraction systems have been implemented with deep-learning techniques and optimized by designing complex convolutional neural networks or algorithms, but their computationally heavy modules can require hundreds to thousands of milliseconds per run, which limits the feasibility of real-time processing. This thesis takes the Repeatable and Reliable Detector and Descriptor (R2D2) as its basis and improves execution speed along two axes: the convolutional neural network and the inference-stage algorithm. On the network side, experiments with different combinations of dilated convolutions and convolutional-layer counts yield design guidelines for when to use dilated convolution and how to choose the dilation rate; applying them reduces the number of layers and parameters, initially cutting the network's computation to 65.5% of standard R2D2. On the inference side, reducing the height of the multi-scale image pyramid further lowers the whole system's computation to 18.4% of standard R2D2. In terms of matching performance, standard R2D2 achieves an MMA@3 of 68.6% and an M-Score of 44.52%, while the proposed system achieves 66.41% and 40.84%, respectively; although these figures decline, they remain above 90% of R2D2's. In terms of actual execution speed, for a 640×480 RGB input image on an NVIDIA GeForce RTX™ 3060 Ti GPU, the standard R2D2 system takes 149.53 ms, whereas the lightweight deep feature extraction system proposed here needs only 39.41 ms.
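As a rough illustration of the dilated-convolution idea summarized above, the following is a minimal PyTorch sketch; the channel widths, input size, and the receptive_field helper are illustrative assumptions, not the thesis code. It shows how two 3×3 convolutions with dilation rate 2 cover the same receptive field as four plain 3×3 convolutions, which is the kind of substitution that lets the layer count and FLOPs drop.

```python
# Minimal PyTorch sketch (illustrative only; channel widths and shapes are assumed,
# not taken from the thesis). Dilated 3x3 convolutions can match the receptive
# field of a deeper stack of plain 3x3 convolutions with fewer layers.
import torch
import torch.nn as nn

def receptive_field(kernel_sizes, dilations):
    """Receptive field of a stack of stride-1 convolutions."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf

# Four plain 3x3 convolutions (dilation 1): receptive field = 1 + 4*(3-1)*1 = 9
plain = nn.Sequential(*[nn.Conv2d(32, 32, 3, padding=1) for _ in range(4)])

# Two 3x3 convolutions with dilation 2: receptive field = 1 + 2*(3-1)*2 = 9
dilated = nn.Sequential(
    nn.Conv2d(32, 32, 3, padding=2, dilation=2),
    nn.Conv2d(32, 32, 3, padding=2, dilation=2),
)

print(receptive_field([3] * 4, [1] * 4))  # 9
print(receptive_field([3] * 2, [2] * 2))  # 9

# Both stacks preserve spatial resolution; the dilated one uses half the layers
# and roughly half the multiply-accumulate operations at these widths.
x = torch.randn(1, 32, 120, 160)
print(plain(x).shape, dilated(x).shape)
```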
Abstract (English): Many feature extraction systems have been implemented with deep-learning techniques and optimized by designing complex convolutional neural networks (CNNs) or algorithms. Their computationally expensive modules can take hundreds of milliseconds, reducing the feasibility of real-time processing. In this thesis, we take the Repeatable and Reliable Detector and Descriptor (R2D2) as the basis and improve the system's speed in two aspects: the CNN and the inference-stage algorithm. In the CNN experiments, we evaluated various combinations of dilated convolution and network depth and summarized design suggestions for when to use dilated convolution and how to choose the dilation rate; this initially reduces the FLOPs of the CNN to 65.5% of the standard R2D2. In the inference-stage experiments, we reduced the height of the multi-scale image pyramid, further lowering the FLOPs of the proposed system to 18.4% of the standard R2D2. In terms of accuracy, the MMA@3 and M-Score of the standard R2D2 are 68.6% and 44.52%, respectively, while those of the proposed system are 66.41% and 40.84%. Although the overall accuracy declines, it remains above 90% of R2D2's. For a 640×480 RGB input image on an NVIDIA GeForce RTX™ 3060 Ti GPU, the standard R2D2 system takes 149.53 ms, while the lightweight deep feature extraction system proposed in this study needs only 39.41 ms.
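To make the inference-stage simplification concrete, here is a hedged sketch of multi-scale extraction over an image pyramid; the function name, scale step, and stopping rule are assumptions for illustration, not the thesis implementation. Capping the number of scales shortens the pyramid and cuts total computation, because the network is re-run once per scale.

```python
# Hedged sketch of multi-scale feature extraction over an image pyramid
# (assumed interface and scale schedule; not the thesis implementation).
# Capping `max_scales` shortens the pyramid, so the network runs fewer times.
import torch
import torch.nn.functional as F

def extract_multiscale(net, image, scale_step=2 ** 0.25, min_size=256, max_scales=None):
    """Run `net` on progressively downscaled copies of `image` (B, C, H, W)."""
    outputs, scale, used = [], 1.0, 0
    h, w = image.shape[-2:]
    while min(h * scale, w * scale) >= min_size:
        if max_scales is not None and used >= max_scales:
            break  # reduced pyramid height: stop early to save computation
        scaled = F.interpolate(image, scale_factor=scale, mode="bilinear",
                               align_corners=False)
        outputs.append((scale, net(scaled)))
        scale /= scale_step
        used += 1
    return outputs

# Toy stand-in for the feature-extraction network.
net = torch.nn.Conv2d(3, 8, 3, padding=1)
img = torch.randn(1, 3, 480, 640)

full = extract_multiscale(net, img)                 # full pyramid
short = extract_multiscale(net, img, max_scales=2)  # reduced pyramid height
print(len(full), len(short))                        # fewer scales -> fewer network passes
```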
Table of Contents:
Acknowledgements
Chinese Abstract
Abstract
Chapter 1: Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Main Contributions
1.4 Thesis Outline
Chapter 2: Literature Review
2.1 Handcrafted Feature Extraction Algorithms
2.1.1 Scale-Invariant Feature Transform (SIFT)
2.1.2 Speeded-Up Robust Features (SURF)
2.1.3 ORB Feature Extraction
2.1.4 Comparison of Handcrafted Algorithms
2.2 Learning-Based Feature Extraction Algorithms
2.2.1 Learning-Based Detectors
2.2.2 Learning-Based Descriptors
2.2.3 Joint Detection and Description Methods
2.3 Repeatable and Reliable Detector and Descriptor (R2D2)
2.3.1 Comparison of Repeatability and Reliability
2.3.2 Comparison of Deep Feature Extraction Algorithms
2.4 Convolution Operations
2.4.1 Depthwise Separable Convolution
2.4.2 Dilated Convolution
2.5 Literature Analysis
2.5.1 Comparison of Matching Performance and Execution Speed
2.5.2 Discussion of Convolutional Neural Network Architectures
2.5.3 Discussion of Multi-Scale Extraction
Chapter 3: Feature Extraction System Architecture
3.1 System Architecture and Target Specifications
3.2 Deep Feature Extractor
3.2.1 Convolutional Neural Network
3.2.2 Loss Functions
3.3 Inference Stage
3.3.1 Multi-Scale Feature Extraction
3.3.2 Keypoint Selection
Chapter 4: Experimental Design and Result Analysis
4.1 Experimental Environment
4.1.1 Hardware and Software Environment
4.1.2 Training Data
4.1.3 Training Parameters
4.2 Evaluation Metrics
4.2.1 HPatches Dataset
4.2.2 Matching Score
4.2.3 Oxford Dataset
4.3 Depthwise Separable Convolution Experiments
4.4 Dilated Convolution Experiments
4.4.1 Adjusting the Number of Convolutional Layers
4.4.2 Dilated Convolution
4.4.3 Joint Adjustment of Layer Count and Dilated Convolution
4.4.4 Gridding Effect of Dilated Convolution
4.5 Inference-Stage Experiments
4.5.1 Single-Scale Extraction
4.5.2 Combined-Scale Extraction
4.5.3 Training-Set Experiments
4.6 Performance Comparison on HPatches and Oxford
4.7 Chapter Summary
Chapter 5: Conclusion and Future Work
5.1 Conclusion
5.2 Future Work
References
 
 
 
 