
Detailed Record

Author (Chinese): 劉律奇
Author (English): Liu, Lu-Chi
Title (Chinese): 基於Transformer網路及可變形卷積與中心差分卷積的物件偵測方法
Title (English): Transformer-based method with Deformable Convolution and Central Difference Convolution for Object Detection
Advisor (Chinese): 陳人豪
Advisor (English): Chen, Jen-Hao
Committee Members (Chinese): 李金龍, 陳仁純
Committee Members (English): Li, Chin-Lung; Chen, Ren-Chuen
Degree: Master's
University: National Tsing Hua University
Department: Institute of Computational and Modeling Science
Student ID: 111026509
Year of Publication (ROC): 113 (2024)
Academic Year of Graduation: 112
Language: English
Number of Pages: 20
Keywords (Chinese): object detection, deformable convolution, central difference convolution, deformable attention, defect detection
Keywords (English): Transformer, Deformable attention, Deformable Convolution, Central Difference Convolution, Object detection, Defect detection

Abstract (Chinese):
This study investigates enhancing the detection capability of the real-time detection transformer (RT-DETR) by introducing two adapter modules: a deformable convolution (DeformConv) adapter module and a central difference convolution (CDC) adapter module. These modules bridge the backbone and the Transformer network of RT-DETR, aiming to improve the model's ability to accurately localize and classify objects.

By introducing learnable offsets into the convolution kernel, DeformConv adjusts its receptive field to align with actual object boundaries, so that object features can be extracted more precisely. In contrast, CDC emphasizes local patterns by computing central gradients between neighboring pixels, which effectively highlights object edges and improves edge detection accuracy while preserving consistency.
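
To make the two mechanisms concrete, here is a minimal PyTorch sketch, not the thesis's implementation. The CDC module follows the decomposition used in the published CDCN code (vanilla convolution minus theta times a 1x1 convolution built from the spatially summed kernel weights), and the DeformConv module pairs torchvision's DeformConv2d with a small zero-initialized convolution that predicts per-tap sampling offsets. Hyperparameters such as theta=0.7 and the 3x3 kernel size are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import DeformConv2d

class CentralDifferenceConv2d(nn.Module):
    """Central difference convolution (after Yu et al., CVPR 2020):
    conv(x, W) - theta * conv(x, W summed over its kernel taps)."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1, theta=0.7):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size,
                              stride=stride, padding=padding, bias=False)
        self.theta = theta  # theta = 0 recovers a plain convolution

    def forward(self, x):
        out = self.conv(x)  # vanilla convolution term
        if self.theta > 0:
            # Summing each kernel spatially gives an equivalent 1x1 kernel;
            # subtracting theta * (1x1 conv of x) adds the central-gradient term.
            kernel_diff = self.conv.weight.sum(dim=(2, 3), keepdim=True)
            out = out - self.theta * F.conv2d(x, kernel_diff, stride=self.conv.stride)
        return out

class DeformConvBlock(nn.Module):
    """Deformable convolution: a side branch predicts (dy, dx) offsets for
    every kernel tap, so sampling follows object boundaries instead of a
    fixed grid."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.offset_conv = nn.Conv2d(in_ch, 2 * kernel_size * kernel_size,
                                     kernel_size, padding=padding)
        nn.init.zeros_(self.offset_conv.weight)  # zero offsets = regular grid at start
        nn.init.zeros_(self.offset_conv.bias)
        self.deform_conv = DeformConv2d(in_ch, out_ch, kernel_size, padding=padding)

    def forward(self, x):
        return self.deform_conv(x, self.offset_conv(x))

x = torch.randn(1, 64, 32, 32)
print(CentralDifferenceConv2d(64, 64)(x).shape)  # torch.Size([1, 64, 32, 32])
print(DeformConvBlock(64, 64)(x).shape)          # torch.Size([1, 64, 32, 32])

Zero-initializing the offset branch makes the deformable layer behave as an ordinary convolution at the start of training, a common choice for stability.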

To evaluate the effectiveness of the proposed adapter modules, we conduct experiments on two benchmark datasets, the NEU-DET steel plate crack dataset and the COCO dataset. On NEU-DET, the model equipped with the DeformConv adapter module improves mean average precision (mAP) on medium-sized defects by 1% and average recall (AR) on large defects by 0.1% over the RT-DETR baseline. These results indicate that DeformConv gives a modest gain in extracting features of defects with complex shapes. On COCO, the model equipped with the CDC adapter module shows a 0.1% mAP gain and a 0.5% AR gain on medium-sized objects, indicating that CDC is more effective at extracting fine-grained details and at separating objects from the background. In short, the DeformConv and CDC adapter modules enhance the object detection capability of the RT-DETR model in different application scenarios: DeformConv captures variations in object shape more effectively for complex-shaped objects, while CDC distinguishes target objects from the background when objects are partially obscured or the background is cluttered.

Abstract (English):
This study aims to enhance the detection capabilities of the real-time detection transformer (RT-DETR) by incorporating two adapter modules: the deformable convolutional network (DeformConv) adapter module and the central difference convolution (CDC) adapter module. These modules are integrated between the backbone and the Transformer network of RT-DETR, aiming to improve the model's ability to accurately localize and classify objects.

To evaluate the effectiveness of the proposed adapter modules, comprehensive experiments are conducted on two benchmark datasets: the NEU-DET steel plate crack dataset and the COCO dataset. On the NEU-DET dataset, compared to RT-DETR, the DeformConv adapter module achieves a 1% improvement in mean average precision (mAP) for medium-sized defects and a 0.1% improvement in average recall (AR) for large-sized defects. These results highlight DeformConv's capability to extract features of defects with complex shapes. On the COCO dataset, the CDC adapter module exhibits a 0.1% mAP gain and a 0.5% AR improvement for medium-sized objects, demonstrating the effectiveness of CDC in extracting fine-grained details and distinguishing objects from the background. In summary, both the DeformConv and CDC adapter modules can enhance the object detection capabilities of the RT-DETR model in different application scenarios: DeformConv effectively captures object shape variations for complex-shaped object detection, while CDC distinguishes target objects from the background in situations where objects may be obscured or the background is cluttered.
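
As a rough schematic of where such adapters sit in the pipeline, the sketch below applies one adapter per backbone feature level before the features enter the detection transformer's encoder. This wiring is assumed for illustration only; backbone, encoder, and the list-of-feature-maps interface are placeholders, not RT-DETR's actual API.

import torch.nn as nn

class AdapterBridge(nn.Module):
    """Hypothetical wiring: backbone -> per-level adapter -> encoder.
    Schematic only; the real RT-DETR interfaces differ."""
    def __init__(self, backbone, adapters, encoder):
        super().__init__()
        self.backbone = backbone                 # assumed to return a list of feature maps
        self.adapters = nn.ModuleList(adapters)  # e.g. a DeformConvBlock or CDC per level
        self.encoder = encoder                   # consumes the adapted feature maps

    def forward(self, images):
        feats = self.backbone(images)
        feats = [adapt(f) for adapt, f in zip(self.adapters, feats)]
        return self.encoder(feats)
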
Contents
Abstract (Chinese)
Abstract
Contents
List of Figures
List of Tables
1 Introduction
2 Related Works
  2.1 Convolution
    2.1.1 Deformable Convolution
    2.1.2 Central Difference Convolution
  2.2 Transformer Detector
    2.2.1 Attention Mechanism
    2.2.2 Deformable Attention
3 Methodology
  3.1 Backbone
  3.2 Transformer Detector
    3.2.1 Efficient Hybrid Encoder
    3.2.2 Decoder with Deformable Attention
4 Experiments
  4.1 Setups
  4.2 Experimental Results
5 Conclusion
Bibliography