
Detailed Record

Author (Chinese): 施威宇
Author (English): Shih, Wei-Yu
Title (Chinese): 台灣交通數據語義分割的深度學習模型
Title (English): Deep Learning Model for Semantic Segmentation of Taiwan Traffic Data
Advisor (Chinese): 劉晉良
Advisor (English): Liu, Jinn-Liang
Committee Members (Chinese): 陳人豪、陳仁純
Committee Members (English): Chen, Ren-Hao; Chen, Ren-Chun
Degree: Master's
Institution: National Tsing Hua University
Department: Institute of Computational and Modeling Science
Student ID: 109026503
Year of Publication (ROC calendar): 111 (2022)
Academic Year of Graduation: 110
Language: Chinese
Number of Pages: 57
Keywords (Chinese): 深度學習、語意分割、台灣交通、資料集製作、comma10k
Keywords (English): Deep learning, Semantic segmentation, Taiwan traffic, Dataset production, comma10k
Abstract (Chinese, translated): In this thesis, we study image recognition for self-driving cars. Before this work, we had already studied comma.ai's open-source self-driving system, openpilot, using comma.ai's hardware platform, comma two, as our research platform. Our focus is on the image recognition functions of autonomous driving used for lane-line detection and for recognizing objects in front of the vehicle.
comma.ai provides an open-source semantic segmentation dataset, comma10k (C10K), on GitHub. Through our image recognition study of that dataset, we developed a method for producing, from Taiwan road data, semantic segmentation data suitable for comma two, and we produced a dataset of 20,000 labeled Taiwan road images (T20K). With this method, we can continue to grow the size and complexity of the dataset from newly collected road data. In addition, based on our study of the original comma two model, we built two semantic segmentation neural network models, OPE1U and OPE2U, which use EfficientNets as the encoder in a U-Net structure, can be trained on our dataset, and are designed so that they can be deployed to the comma two hardware platform. Notably, because the openpilot system restricts the supported layers and the input/output sizes, we also propose an RGBtoYUV data preprocessing method; in this sense, our models are tailored to comma two and differ from other existing models. Finally, we train OPE1U and OPE2U on different combinations of the C10K and T20K datasets and compare the results. The early training results show the improvements of our proposed models, making OPEU more suitable for Taiwan traffic, and show that the combination of OPE2U with T20K performs better than the other configurations in both accuracy and loss.
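The RGBtoYUV preprocessing step mentioned above can be illustrated with a short sketch. The snippet below is a hypothetical, minimal example rather than the thesis's actual implementation: the function name rgb_to_yuv, the use of OpenCV's BT.601 conversion, and the frame size are all assumptions, and the real openpilot pipeline may expect a different YUV layout.

```python
# Minimal sketch (assumed, not the thesis's code): convert an RGB camera frame
# to a YUV array so it matches a YUV-based model input.
import cv2
import numpy as np

def rgb_to_yuv(frame_rgb: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 RGB frame to HxWx3 YUV (OpenCV BT.601)."""
    return cv2.cvtColor(frame_rgb, cv2.COLOR_RGB2YUV)

if __name__ == "__main__":
    # Dummy frame standing in for a comma two camera image; the size is illustrative.
    rgb = np.random.randint(0, 256, size=(256, 512, 3), dtype=np.uint8)
    yuv = rgb_to_yuv(rgb)
    print(yuv.shape, yuv.dtype)  # (256, 512, 3) uint8
```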
Abstract (English): In this thesis, we investigate image segmentation in self-driving cars. We have been conducting our research on comma.ai's open-source self-driving car system openpilot, using comma.ai's hardware, comma two, as our research platform. The present study aims to extend our current neural network to include image segmentation that is deployable to comma two in the future.
Comma.ai has provided a data set called comma10k (C10K) on GitHub, which contains about 10,000 labeled images of USA traffic for semantic segmentation. We have created a larger data set of 20,000 labeled images of Taiwan traffic (T20K). We thus increase the number and complexity of the data for future studies on semantic segmentation for self-driving cars. Moreover, based on the input and output format of the original deep neural network (DNN) of openpilot (OP) for path planning and car following, we propose two DNNs that combine EfficientNets and U-Net (OPE1U and OPE2U) for the additional task of semantic segmentation. OPEU is designed to be deployable to comma two in future work. Finally, we train OPE1U and OPE2U on various combinations of C10K and T20K, and find that OPE2U is better in terms of loss and accuracy on T20K.
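As a rough illustration of the OPE1U/OPE2U idea, an EfficientNet encoder combined with a U-Net decoder and trained on a mixture of C10K and T20K, the sketch below uses the third-party segmentation_models_pytorch package. It is an assumption-laden stand-in, not the thesis's model or training code: the class count, image size, dummy dataset class, and optimizer settings are placeholders.

```python
# Hypothetical sketch of an EfficientNet-encoder / U-Net-decoder segmentation
# model trained on a mix of two datasets, in the spirit of OPE1U/OPE2U.
# Assumes: pip install torch segmentation-models-pytorch.
import torch
from torch.utils.data import Dataset, ConcatDataset, DataLoader
import segmentation_models_pytorch as smp

NUM_CLASSES = 5  # placeholder class count; use the actual C10K/T20K label set

class DummySegDataset(Dataset):
    """Stand-in for a real C10K or T20K loader (random images and masks)."""
    def __init__(self, length: int = 16, size=(128, 256)):
        self.length, self.size = length, size
    def __len__(self):
        return self.length
    def __getitem__(self, idx):
        h, w = self.size
        image = torch.rand(3, h, w)                   # fake RGB image
        mask = torch.randint(0, NUM_CLASSES, (h, w))  # fake per-pixel labels
        return image, mask

# EfficientNet encoder with a U-Net decoder.
model = smp.Unet(
    encoder_name="efficientnet-b0",
    encoder_weights=None,    # skip pretrained weights in this sketch
    in_channels=3,
    classes=NUM_CLASSES,
)

# Combine two datasets (standing in for C10K and T20K) for joint training.
train_set = ConcatDataset([DummySegDataset(), DummySegDataset()])
loader = DataLoader(train_set, batch_size=4, shuffle=True)

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

model.train()
for images, masks in loader:
    logits = model(images)   # (B, NUM_CLASSES, H, W)
    loss = criterion(logits, masks)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In the thesis's actual setting, the dummy dataset would be replaced by loaders for C10K and T20K, the encoder could be an EfficientNetV2 variant (as in OPE2U), and the input pipeline would apply the YUV preprocessing sketched earlier.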
Acknowledgements
Abstract (Chinese)  i
Abstract (English)  ii
1 Introduction  1
1.1 Motivation  1
1.2 Contributions  2
1.3 Thesis Organization  4
2 Literature Review  5
2.1 Semantic Segmentation Models  5
2.1.1 Fully Convolutional Networks (FCN)  5
2.1.2 U-Net  6
2.1.3 DeepLab  6
2.1.4 HRNet-OCR  6
3 Deep Learning Neural Networks  8
3.1 Model Overview  8
3.2 Data Production  10
3.3 Data Preprocessing  12
3.4 MBConv Block and EfficientNet  14
3.5 EfficientNetV2  18
3.6 Task-Aware Neural Architecture Search (NAS)  21
3.7 U-Net Decoder  21
3.8 Loss Function  23
4 Experimental Setup  24
4.1 Training Environment  24
4.2 Data Production Procedure  25
4.3 Training Procedure  25
5 Experimental Results  26
6 Conclusion  31
References  33
Appendix  37