Detailed Record

Author (Chinese): 陳書屏
Author (English): Chen, Shu-Ping
Title (Chinese): 基於時間特徵融合之連續影像物件偵測
Title (English): Video Object Detection with Temporal Feature Fusion
Advisor (Chinese): 賴尚宏
Advisor (English): Lai, Shang-Hong
Committee Members (Chinese): 邱瀞德, 許秋婷
Committee Members (English): Chiu, Ching-Te; Hsu, Chiu-Ting
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 105062544
Year of Publication (ROC): 107 (2018)
Graduation Academic Year: 107
Language: English
Number of Pages: 36
Keywords (Chinese): 物件偵測、深度學習
Keywords (English): object detection, deep learning
Usage statistics:
  • Recommendations: 0
  • Views: 319
  • Rating: *****
  • Downloads: 29
  • Favorites: 0
Abstract (Chinese):
Object detection is a classical problem in computer vision, and it has achieved remarkable results thanks to deep learning techniques. However, extending state-of-the-art static-image object detection to video object detection is a difficult challenge. Accurate static-image detection networks do not use the rich temporal information available in video, and they may fail when faced with difficulties never seen in static images.

In this thesis, we propose a deep network architecture that exploits temporal information and jointly trains the whole model to perform video object detection. Our model uses optical flow to guide the feature fusion process between the current frame and the previous frame. We also employ dense recursive aggregation to integrate past features and make good use of historical temporal information. Our experiments on the ImageNet and ITRI datasets show that the proposed architecture can achieve competitive detection results without significant time cost.
Abstract (English):
Object detection is a classical problem in computer vision. It has seen significant improvement in recent years thanks to deep learning techniques. However, it is challenging to extend state-of-the-art static-image object detection techniques to video object detection, since traditional object detectors usually work on a single frame and do not utilize the rich temporal information in video.

In this thesis, we propose a ConvNet architecture that utilizes temporal information and jointly trains the whole model to perform video object detection. Our model uses optical flow to guide the feature fusion process between the current frame and the previous frame. We also utilize dense recursive aggregation to integrate features computed from past frames and make use of temporal information. Our experiments on the ImageNet and ITRI datasets show that the proposed architecture can achieve competitive detection results without significant time cost.
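To make the flow-guided fusion and recursive aggregation concrete, the sketch below shows one plausible reading of the pipeline in PyTorch. It is a minimal illustration, not the thesis's implementation: the FPN backbone, the FlowNetS flow estimator, and the learned weight map derived from optical flow are replaced by stand-ins (random features, a random flow field, and a fixed blending weight alpha).

import torch
import torch.nn.functional as F

def warp_features(feat_prev, flow):
    # Bilinearly warp previous-frame features into the current frame's
    # coordinates using a dense flow field of shape (N, 2, H, W).
    n, _, h, w = feat_prev.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=feat_prev.dtype),
        torch.arange(w, dtype=feat_prev.dtype),
        indexing="ij",
    )
    base = torch.stack((xs, ys), dim=0).unsqueeze(0).expand(n, -1, -1, -1)
    coords = base + flow                      # displaced sampling locations
    # Normalize pixel coordinates to [-1, 1] as required by grid_sample.
    gx = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)      # (N, H, W, 2)
    return F.grid_sample(feat_prev, grid, align_corners=True)

def aggregate(feat_cur, feat_agg_prev, flow, alpha=0.5):
    # One recursive-aggregation step: warp the running aggregate from the
    # previous frame and blend it with the current frame's features.
    # alpha is an assumed fixed weight standing in for the thesis's
    # flow-derived weight map.
    return alpha * feat_cur + (1.0 - alpha) * warp_features(feat_agg_prev, flow)

# Toy usage over a short clip of random "features" and "flow".
feat_agg = torch.randn(1, 256, 32, 32)        # aggregate at frame 0
for t in range(1, 4):
    feat_cur = torch.randn(1, 256, 32, 32)    # backbone features at frame t
    flow = torch.randn(1, 2, 32, 32)          # flow from frame t to t-1 (stand-in)
    feat_agg = aggregate(feat_cur, feat_agg, flow)
# feat_agg would then feed the detection head for frame t.

The point the sketch captures is the recursion: the aggregate carried into frame t already summarizes all earlier frames, so each step needs only the previous aggregate, one flow field, and the current features, which is why the method avoids a large per-frame time cost.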
Chinese Abstract i
Abstract ii
1 Introduction 1
1.1 Motivation 1
1.2 Proposed method 3
1.3 Contribution 4
1.4 Thesis Organization 5
2 Related Work 6
2.1 Object detection in static images 6
2.2 Object detection in videos 7
3 Method 10
3.1 Overview 10
3.2 Feature Pyramid Network 13
3.3 Optical Flow Network 14
3.4 Weight Map from Optical Flow 14
3.5 Recursive Aggregation 16
4 Experiments 19
4.1 Datasets 19
4.1.1 ImageNet Dataset 19
4.1.2 ITRI Dataset 20
4.2 Training and Testing 21
4.2.1 Feature Pyramid Network 21
4.2.2 FlowNetS 22
4.2.3 Joint Training 22
4.3 Results 23
4.3.1 ImageNet dataset 23
4.3.2 ITRI dataset 24
4.3.3 Ablation Study 25
4.3.4 Comparison with the state-of-the-art 26
4.3.5 Demo Results 28
5 Conclusion 32
References 33