Author: Liu, Lu-Chi
Thesis title: Transformer-based method with Deformable Convolution and Central Difference Convolution for Object Detection
Advisor: Chen, Jen-Hao
Oral defense committee: Li, Chin-Lung; Chen, Ren-Chuen
Keywords: Transformer; Deformable attention; Deformable Convolution; Central Difference Convolution; Object detection; Defect detection
This study aims to enhance the detection capability of the real-time detection transformer (RT-DETR) by introducing two adapter modules: a deformable convolution (DeformConv) adapter module and a central difference convolution (CDC) adapter module. These modules are placed between the RT-DETR backbone and the Transformer network, with the aim of improving the model's ability to locate and classify objects accurately.

By introducing learnable offsets into the convolution kernel, DeformConv adapts its receptive field to align with actual object boundaries, so that object features are extracted more precisely. CDC, in contrast, emphasizes local patterns by forming central gradients between neighboring pixels, which highlights object edges and improves edge detection accuracy while maintaining consistency.

To evaluate the effectiveness of the proposed adapter modules, experiments are conducted on two benchmark datasets: the NEU-DET steel plate crack dataset and the COCO dataset. On NEU-DET, compared with RT-DETR, the model with the DeformConv adapter module improves mean average precision (mAP) on medium-sized defects by 1% and average recall (AR) on large defects by 0.1%. These results indicate that DeformConv slightly improves feature extraction for defects of this kind, which have complex shapes. On COCO, the model with the CDC adapter module shows a 0.1% gain in mAP and a 0.5% gain in AR for medium-sized objects, indicating that CDC is effective at extracting fine-grained details and at distinguishing objects from the background. In summary, the DeformConv and CDC adapter modules can each enhance the object detection capability of RT-DETR in different application scenarios: DeformConv captures shape variation more effectively when detecting objects with complex shapes, while CDC separates target objects from the background when objects may be obscured or the background is complicated.
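
To make the learnable-offset idea concrete, the following is a minimal pure-Python sketch of a single 3x3 deformable convolution output. It is not the adapter module from the thesis: the helper names `bilinear`, `GRID`, and `deform_conv_at` are hypothetical, and the offsets are passed in by hand, whereas in a deformable convolution layer they would be predicted by a learned branch. It assumes the usual formulation in which each kernel tap samples the input at its regular grid position plus a fractional offset, using bilinear interpolation.

```python
import math

def bilinear(img, y, x):
    """Sample img (a list of rows) at fractional (y, x); outside the image counts as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    total = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                total += weight * img[yy][xx]
    return total

# Regular 3x3 sampling grid in row-major order, as (dy, dx) pairs.
GRID = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def deform_conv_at(img, kernel, offsets, cy, cx):
    """One output value of a 3x3 deformable convolution centred at (cy, cx).

    kernel:  9 weights in row-major order.
    offsets: 9 (dy, dx) pairs added to the regular grid positions
             (hand-picked here; learned in a real layer).
    """
    out = 0.0
    for w, (gy, gx), (oy, ox) in zip(kernel, GRID, offsets):
        out += w * bilinear(img, cy + gy + oy, cx + gx + ox)
    return out

# Horizontal ramp: pixel value equals its column index.
ramp = [[float(c) for c in range(5)] for _ in range(5)]
ones = [1.0] * 9
no_shift = [(0.0, 0.0)] * 9
half_right = [(0.0, 0.5)] * 9

plain = deform_conv_at(ramp, ones, no_shift, 2, 2)      # ordinary convolution
shifted = deform_conv_at(ramp, ones, half_right, 2, 2)  # every tap moved 0.5 px right
```

With all offsets at zero the function reduces to an ordinary 3x3 convolution; shifting every tap half a pixel to the right moves the sampled region along the ramp, which is the mechanism that lets the receptive field follow an object's outline.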

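The central-gradient idea behind CDC can be sketched in the same way. The code below assumes the commonly used formulation in which the output blends a convolution over differences to the centre pixel with a plain convolution, controlled by a hyperparameter `theta`; the thesis's own adapter configuration is not given here, so the function names and the default `theta=0.7` are illustrative assumptions only.

```python
def conv3x3_at(img, kernel, cy, cx):
    """Plain 3x3 convolution (cross-correlation) output at an interior pixel."""
    return sum(kernel[i][j] * img[cy + i - 1][cx + j - 1]
               for i in range(3) for j in range(3))

def cdc3x3_at(img, kernel, cy, cx, theta=0.7):
    """Central difference convolution output at an interior pixel.

    Blends a convolution over differences to the centre pixel (weight theta)
    with a plain convolution (weight 1 - theta). theta is an assumed value.
    """
    centre = img[cy][cx]
    diff_term = sum(kernel[i][j] * (img[cy + i - 1][cx + j - 1] - centre)
                    for i in range(3) for j in range(3))
    return theta * diff_term + (1 - theta) * conv3x3_at(img, kernel, cy, cx)

ones = [[1.0] * 3 for _ in range(3)]

# Flat region: every pixel has the same intensity.
flat = [[5.0] * 5 for _ in range(5)]
# Vertical edge: columns 0-1 are dark, columns 2-4 are bright.
edge = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]

flat_response = cdc3x3_at(flat, ones, 2, 2, theta=1.0)
edge_response = cdc3x3_at(edge, ones, 2, 2, theta=1.0)
```

With `theta=1.0` only the difference term remains, so a flat region gives zero response while a pixel on the edge gives a non-zero one; this is the sense in which CDC highlights edges. Setting `theta=0.0` recovers the plain convolution.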