
Detailed Record

Author (Chinese): 李碧寒
Author (English): Li, Bi-Han
Title (Chinese): 都市夜間交通影片之實例分割
Title (English): Instance Segmentation of Traffic Videos from a City Night Scene
Advisor (Chinese): 劉晉良
Advisor (English): Liu, Jinn-Liang
Committee Members (Chinese): 陳人豪、陳仁純
Committee Members (English): Chen, Jen-Hao; Chen, Ren-chun
Degree: Master
University: National Tsing Hua University
Department: Institute of Computational and Modeling Science
Student ID: 108026503
Year of Publication: 110 ROC (2021)
Graduation Academic Year: 109 (2020-2021)
Language: Chinese
Number of Pages: 34
Keywords (Chinese): 實例分割、Yolact++、深度學習、K-means、資料增強
Keywords (English): Instance segmentation, Yolact++, Deep learning, K-means, Data augmentation
Abstract (Chinese, translated): In recent years, self-driving cars have developed rapidly and matured steadily; in their image-recognition pipelines, the main tasks are lane detection, path-trajectory prediction, and detection of objects ahead of the vehicle. This thesis studies forward object detection for self-driving cars. Building on the Yolact++ neural network, we modify Yolact++ into our model, YolactM, and use it to perform instance segmentation of the objects in front of the ego vehicle, outputting each object's bounding box, mask, class, and confidence score. We use real videos of Taiwan roads at night, cut them into individual frames, and manually annotate every fifth frame; we further enlarge the dataset with data augmentation (TdataN) and add some COCO data (TdataNCO) to form YolactM's training set. We define 13 object classes, including person, bicycle, car, ..., and traffic light, all common on Taiwan roads. In YolactM, we adjust the anchor settings by applying the K-means clustering algorithm to the data, and the experiments show clear gains in accuracy on small objects and on several classes. Finally, we compare Yolact++ and YolactM on road-video recognition, discuss where YolactM improves, and show that the proposed YolactM is better suited to Taiwan traffic.
Abstract (English): In recent years, with the rapid development of self-driving cars, image recognition has mainly been applied to lane-line recognition, path-trajectory prediction, and the detection of objects in front of a self-driving car. In this thesis, we use a modified Yolact++ (YolactM) to perform instance segmentation of traffic videos recorded in a Taiwan city at night. The output of YolactM consists of the category, score, bounding box, and mask of each object instance. We cut the night-road videos into individual frames and annotate every fifth frame. We use data augmentation to enrich our dataset (TdataN). We train YolactM on TdataN plus some COCO data (TdataNCO) with 13 classes, including person, bicycle, car, ..., and traffic light, common in crowded Taiwan traffic scenes. After training, the model detects objects in front of the vehicle. In the YolactM model, we use the K-means algorithm to find three clusters in the TdataNCO data; the centroids of these clusters define the aspect ratios of anchors for objects of different sizes. This improves the average recall for small objects and the average precision for some categories in YolactM's experimental results. We compare Yolact++ and YolactM on these results and show that YolactM outperforms Yolact++ for the instance segmentation of traffic videos recorded in a Taiwan city at night.
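As a rough illustration of the two pipeline steps described in the abstracts, the Python sketch below first extracts every fifth frame of a video for annotation, then clusters bounding-box aspect ratios with K-means so the cluster centroids can serve as anchor aspect ratios. The file paths and function names are hypothetical, and the use of OpenCV and scikit-learn is our own assumption, not the thesis's actual code; the thesis may also cluster raw (width, height) pairs rather than ratios.

```python
import os
import numpy as np
import cv2  # OpenCV, assumed here for frame extraction
from sklearn.cluster import KMeans  # assumed stand-in for the thesis's K-means step

def extract_frames(video_path, out_dir, step=5):
    """Save every `step`-th frame of a video as an image for manual annotation."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:  # the thesis annotates every fifth frame
            cv2.imwrite(f"{out_dir}/frame_{idx:06d}.png", frame)
        idx += 1
    cap.release()

def anchor_aspect_ratios(boxes_wh, k=3):
    """Cluster box aspect ratios (w/h) into k groups; the centroids
    are returned as candidate anchor aspect ratios."""
    ratios = (boxes_wh[:, 0] / boxes_wh[:, 1]).reshape(-1, 1)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(ratios)
    return sorted(float(c) for c in km.cluster_centers_.ravel())

# Hypothetical usage with made-up annotated box sizes (width, height) in pixels:
extract_frames("night_traffic.mp4", "frames")  # placeholder paths
boxes = np.array([[32, 64], [60, 30], [50, 50], [120, 40], [20, 45]])
print(anchor_aspect_ratios(boxes))  # e.g. roughly [0.5, 1.0, 2.5]
```

In a Yolact++-style head, three ratios obtained this way would replace the fixed per-scale anchor aspect ratios (commonly 1/2, 1, and 2) at each prediction layer.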
Abstract (Chinese)--------------------------------I
Abstract (English)-------------------------------II
Acknowledgements--------------------------------III
Chapter 1 Introduction----------------------------1
Chapter 2 Related Work----------------------------3
Chapter 3 YolactM---------------------------------6
3.1 YolactM Architecture--------------------------6
3.2 Backbone--------------------------------------7
3.2.1 Deep Residual Network-----------------------7
3.2.2 Deformable Convolution----------------------8
3.3 Neck------------------------------------------9
3.3.1 Feature Pyramid Network---------------------9
3.3.2 Application--------------------------------10
3.4 Head-----------------------------------------10
3.4.1 Anchors------------------------------------10
3.4.2 Anchor Settings----------------------------12
3.4.3 Application--------------------------------14
3.5 Post-processing------------------------------16
3.6 Loss Functions-------------------------------18
3.6.1 Box Localization Loss----------------------18
3.6.2 Class Confidence Loss----------------------19
3.6.3 Mask Loss----------------------------------19
3.6.4 MaskIoU Loss-------------------------------20
3.6.5 Semantic Segmentation Loss-----------------20
3.6.6 Total Loss---------------------------------21
Chapter 4 Experimental Setup---------------------22
4.1 Data Preprocessing---------------------------22
4.2 Training Procedure---------------------------23
4.3 Data Augmentation----------------------------24
Chapter 5 Experimental Results-------------------25
5.1 Bounding Box Results-------------------------25
5.2 Mask Results---------------------------------26
5.3 Yolact++ vs. YolactM Traffic Video Prediction Results--27
Chapter 6 Conclusion-----------------------------28
References---------------------------------------29
Appendix-----------------------------------------32
[1] D. Bolya, C. Zhou, F. Xiao, and Y. J. Lee. YOLACT++: Better real-time instance segmentation. IEEE TPAMI, doi: 10.1109/TPAMI.2020.3014297, 2020.
[2] D.-H. Lee, K.-L. Chen, K.-H. Liou, C.-L. Liu, and J.-L. Liu. Deep learning and control algorithms of direct perception for autonomous driving. Applied Intelligence 51, 237-247, 2021.
[3] D.-H. Lee and J.-L. Liu. End-to-end deep learning of lane detection and path
prediction for real-time autonomous driving. arXiv:2102.04738, 2021.
[4] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv:1311.2524v5, 2014.
[5] G. Ghiasi, Y. Cui, A. Srinivas, R. Qian, T.-Y. Lin, E. D. Cubuk, ..., and B. Zoph. Simple copy-paste is a strong data augmentation method for instance segmentation. arXiv:2012.07177, 2020.
[6] X. Ke, J. Zou, and Y. Niu. End-to-end automatic image annotation based on
deep CNN and multi-label data augmentation. IEEE Transactions on Multimedia,
21(8), pages 2093-2106, 2019.
[7] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, ..., and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, pages 740-755, 2014.
[8] R. P. Martinez, I. Schiopu, B. Cornelis, and A. Munteanu. Real-time instance
segmentation of traffic videos for embedded devices. Sensors, 21(1), 275, 2021.
[9] B. C. Russell, A. Torralba, K. P. Murphy, and W. T. Freeman. LabelMe: a database and web-based tool for image annotation. IJCV, pages 157-173, 2008.
[10] A. Shrivastava, R. Sukthankar, J. Malik, and A. Gupta. Beyond skip connections:
Top-down modulation for object detection. arXiv:1612.06851, 2016.
[11] J. Uhrig, E. Rehder, B. Fröhlich, U. Franke, and T. Brox. Box2Pix: Single-shot instance segmentation by assigning pixels to object boxes. In IEEE IV, 2018.
[12] L.-C. Chen, A. Hermans, G. Papandreou, F. Schroff, P. Wang, and H. Adam. MaskLab: Instance segmentation by refining object detection with semantic and direction features. In CVPR, pages 4013-4022, 2018.
[13] Z. Tian, C. Shen, H. Chen, and T. He. FCOS: Fully convolutional one-stage object detection. arXiv:1904.01355, 2019.
[14] H. Law and J. Deng. Cornernet: Detecting objects as paired keypoints. In ECCV,
2018.
[15] K. Duan, S. Bai, L. Xie, H. Qi, Q. Huang, and Q. Tian. CenterNet: Keypoint
triplets for object detection. In ICCV, 2019.
[16] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: Unified,
real-time object detection. In CVPR, pages 779-788, 2016.
[17] K. He, G. Gkioxari, P. Dollar, and R. Girshick. Mask R-CNN. In ICCV, pages
2980-2988, 2017.
[18] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object
detection with region proposal networks. In NIPS, 2015.
[19] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic
segmentation. In CVPR, 2015.
[20] Y. Li, H. Qi, J. Dai, X. Ji, and Y. Wei. Fully convolutional instance-aware semantic segmentation. In CVPR, pages 4438-4446, 2017.
[21] H. Chen, K. Sun, Z. Tian, C. Shen, Y. Huang, and Y. Yan. BlendMask: Top-down
meets bottom-up for instance segmentation. In CVPR, 2020.
[22] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition.
In CVPR, 2016.
[23] J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang, H. Hu, and Y. Wei. Deformable convolutional networks. In ICCV, 2017.
[24] A. Kirillov, R. Girshick, K. He, and P. Dollar. Panoptic feature pyramid networks.
In CVPR, 2019.
[25] O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. In MICCAI, 2015.
[26] A. Newell, K. Yang, and J. Deng. Stacked hourglass networks for human pose
estimation. In ECCV, 2016.
[27] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg. SSD: Single shot multibox detector. In ECCV, 2016.
[28] Z. Huang, L. Huang, Y. Gong, C. Huang, and X. Wang. Mask Scoring R-CNN. In
CVPR, 2019.
[29] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar. Focal loss for dense object detection. In ICCV, 2017.
[30] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia. Path aggregation network for instance
segmentation. In CVPR, 2018.
[31] C.-Y. Fu, M. Shvets, and A. C. Berg. Retinamask: Learning to predict masks
improves state-of-the-art single-shot detection for free. arXiv:1901.03353, 2019.