Detailed Record

Author (Chinese): 林哲聰
Author (English): Lin, Che-Tsung
Title (Chinese): 車輛偵測:從偵測模型設計到跨領域適應
Title (English): Vehicle Detection: From Detector Design to Cross-Domain Adaptation
Advisor (Chinese): 賴尚宏
Advisor (English): Lai, Shang-Hong
Committee Members (Chinese): 邱瀞德、陳煥宗、李哲榮、杭學鳴、邱維辰、劉庭祿、王鈺強
Committee Members (English): Chiu, Ching-Te; Chen, Hwann-Tzong; Lee, Che-Rung; Hang, Hsueh-Ming; Chiu, Wei-Chen; Liu, Tyng-Luh; Wang, Yu-Chiang
Degree: Doctor of Philosophy
University: National Tsing Hua University
Department: Department of Computer Science
Student ID: 103062805
Publication Year (R.O.C. calendar): 109 (2020)
Academic Year of Graduation: 108
Language: English
Number of Pages: 84
Keywords (Chinese): 深度學習、車輛偵測、跨領域適應、生成式對抗網路
Keywords (English): deep learning; vehicle detection; cross-domain adaptation; generative adversarial network
Abstract (Chinese):
Vehicle detection has long been one of the core functions of advanced driver assistance systems and autonomous vehicles. Extensive prior research has shown that state-of-the-art deep learning models achieve excellent performance on various public object detection datasets. However, these models are mostly two-stage detectors, which require substantial computing resources and are difficult to run in real time on embedded systems. In this thesis, we propose a single-stage model that achieves real-time vehicle detection on the NVIDIA DrivePX2 embedded platform. In addition, we propose a multi-stage, image-based hard example mining training strategy: the detector is first trained on all of the data until its accuracy converges, and the network is then fine-tuned with hard examples and training samples whose IOU is slightly insufficient, which further improves detection accuracy.

We expect a vehicle detector to produce accurate detections in both daytime and nighttime scenes, yet the appearance of vehicles differs greatly between day and night. Data augmentation is commonly used when training deep-learning-based object detectors, and it can improve a model's robustness and cross-domain adaptability. Previous augmentation methods are usually composed of general image-processing operations, and the diversity of the images they produce is rather limited. In recent years, generative adversarial networks have been shown to generate diverse images; however, earlier models often fail to keep object structures consistent before and after translation. This thesis therefore proposes AugGAN, a generative adversarial network that better preserves objects even under drastic style changes such as day-to-night translation. However, given a daytime image, the nighttime style that AugGAN produces is fixed. We therefore further propose a multimodal version of AugGAN, which can translate one daytime image into multiple nighttime images with different ambient light levels and different rear-lamp brightness, while the vehicle types, colors, and locations remain consistent with the original image.
Abstract (English):
Vehicle detection is a fundamental function required by advanced driver assistance systems and autonomous vehicles. Extensive research has shown that state-of-the-art approaches, especially deep learning methods, achieve good performance on public datasets. However, most of these methods are two-stage approaches, which inevitably require extensive computing resources and are difficult to deploy on an embedded computing platform with real-time performance. In this thesis, we introduce a single-stage vehicle detector that runs in real time on the NVIDIA DrivePX2 platform, and we propose a multi-stage, image-based online hard example mining framework that fine-tunes the detector on hard examples as well as on examples with slightly insufficient IOU that are treated as true positives.
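To make the training strategy above concrete, the following is a minimal Python sketch of a multi-stage, image-based hard example mining loop. It assumes a generic PyTorch-style detector; the helpers detector.detection_loss and detector.best_iou_per_target, as well as the IOU thresholds, are illustrative assumptions and not the detector implemented in this thesis.

    # Sketch only: `detector.detection_loss` and `detector.best_iou_per_target`
    # are hypothetical helpers standing in for the detector's training loss and
    # its best predicted-vs-ground-truth IOU per annotated object.
    import torch

    def train_with_hard_example_mining(detector, loader, optimizer,
                                       stage1_epochs=100, stage2_epochs=20,
                                       iou_lo=0.45, iou_hi=0.50):
        # Stage 1: train on all images until the accuracy gain converges.
        for _ in range(stage1_epochs):
            for images, targets in loader:
                loss = detector.detection_loss(images, targets)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

        # Stage 2: fine-tune only on whole images that still contain hard
        # examples or objects whose best IOU falls slightly below the positive
        # threshold (such near misses are treated as true positives here).
        for _ in range(stage2_epochs):
            for images, targets in loader:
                with torch.no_grad():
                    ious = detector.best_iou_per_target(images, targets)
                hard = ious < iou_lo                             # missed objects
                near_miss = (ious >= iou_lo) & (ious < iou_hi)   # slightly-insufficient IOU
                if hard.any() or near_miss.any():
                    loss = detector.detection_loss(images, targets)
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()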

We expect vehicles around the host vehicle to be detected as accurately as possible at all times of day, including both daytime and nighttime. However, a vehicle's appearance in the daytime differs greatly from its appearance at night. Data augmentation plays a crucial role in training a CNN-based detector and in enhancing its cross-domain robustness. Most previous augmentation approaches combine general image-processing operations and can only produce limited plausible image variations. Recently, GAN (Generative Adversarial Network) based methods have shown compelling visual results; however, they are prone to failing to preserve image objects and to maintain translation consistency when faced with large and complex domain shifts such as day-to-night. We propose AugGAN, a GAN-based data augmenter that transforms on-road driving images into a desired domain while preserving image objects well. Although this model transforms on-road images from daytime to nighttime with better object preservation, all of the transformed vehicles appear under the same ambient light level. Therefore, we further propose Multimodal AugGAN, a multimodal structure-consistent GAN capable of transforming daytime vehicles into their nighttime counterparts with different ambient light levels and rear-lamp conditions (on/off) while keeping the same vehicle types, colors, and locations.
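As a rough illustration of the structure-consistent objective, here is a minimal PyTorch-style sketch of one generator training step combining adversarial, cycle-consistency, and segmentation (structure) losses. The module names (G_dn, G_nd, Seg_d, D_night) and the loss weights are assumptions made for illustration, not the actual AugGAN code.

    # Sketch only: G_dn / G_nd are day->night / night->day generators that share
    # an encoder with the segmentation head Seg_d; D_night is the night-domain
    # discriminator. All names and weights below are illustrative assumptions.
    import torch
    import torch.nn.functional as F

    def auggan_generator_loss(G_dn, G_nd, Seg_d, D_night,
                              day_img, day_seg_gt,
                              w_adv=1.0, w_cyc=10.0, w_seg=1.0):
        fake_night = G_dn(day_img)        # translate day image to night domain
        recon_day = G_nd(fake_night)      # translate back for cycle consistency

        # Adversarial term: generated night images should fool D_night.
        logits = D_night(fake_night)
        adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))

        # Cycle-consistency term: day -> night -> day should recover the input.
        cyc = F.l1_loss(recon_day, day_img)

        # Segmentation subtask: sharing the encoder with a segmentation head
        # pushes the translation to preserve object structure, which is the key
        # idea behind the better object preservation described above.
        seg = F.cross_entropy(Seg_d(day_img), day_seg_gt)

        return w_adv * adv + w_cyc * cyc + w_seg * seg

    # Multimodal AugGAN additionally conditions the generator on a random style
    # code, e.g. fake_night = G_dn(day_img, z), so that one daytime image maps
    # to several nighttime images with different ambient light and lamp states.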
Table of Contents:
1 Introduction 2
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 Related Work 5
2.1 Object Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Domain Adaptation . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3 Proposed Object Detection Model 10
3.1 Data Augmentation . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.2 Multi-Scale Feature Maps and Bounding Box Priors . . . . . . . . . 13
3.3 Non-Maximal Suppression . . . . . . . . . . . . . . . . . . . . . . 13
3.4 Multi-Stage Image-based Online Hard Example Mining . . . . . . . 14
3.5 Loss Functions of Training and Fine-tuning . . . . . . . . . . . . . 16
4 Proposed GAN Models 20
4.1 AugGAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.1.1 Structure-Aware Encoding and Segmentation Subtask . . . 22
4.1.2 Adversarial Learning . . . . . . . . . . . . . . . . . . . . . 23
4.1.3 Weight-Sharing for Multi-Task Network . . . . . . . . . . . 24
4.1.4 Cycle Consistency . . . . . . . . . . . . . . . . . . . . . . 25
4.1.5 Network Learning . . . . . . . . . . . . . . . . . . . . . . 26
4.2 Multimodal AugGAN . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.2.1 Adversarial Learning . . . . . . . . . . . . . . . . . . . . . 30
4.2.2 Image-Translation-Structure Consistency . . . . . . . . . . 31
4.2.3 Cycle-Structure Consistency . . . . . . . . . . . . . . . . . 32
4.2.4 Network Learning . . . . . . . . . . . . . . . . . . . . . . 33
5 Experimental Results 35
5.1 Single-Stage Vehicle Detector . . . . . . . . . . . . . . . . . . . . 35
5.1.1 PASCAL VOC Dataset . . . . . . . . . . . . . . . . . . . . 35
5.1.2 KITTI Dataset . . . . . . . . . . . . . . . . . . . . . . . . 39
5.1.3 CarSim-Generated Data . . . . . . . . . . . . . . . . . . . 40
5.1.4 iROADS Dataset . . . . . . . . . . . . . . . . . . . . . . . 43
5.1.5 Recall Analysis of the Proposed Data Augmentation Strategy 45
5.1.6 The Benefit of Multi-Scale Feature and Bounding Box Prior 46
5.1.7 IOU Fine-Tuning Analysis . . . . . . . . . . . . . . . . . . 48
5.2 AugGAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.2.1 Synthetic Datasets . . . . . . . . . . . . . . . . . . . . . . 50
5.2.2 KITTI and ITRI-Night Datasets . . . . . . . . . . . . . . . 52
5.2.3 ITRI Daytime and Nighttime Datasets . . . . . . . . . . . . 54
5.2.4 On-Road Nighttime Vehicle Detection Result Analysis . . . 56
5.2.5 Training Detectors with Real Night Images & AugGAN-Generated Night Images . . . . . . . . . . 58
5.2.6 Transformations other than Daytime & Nighttime . . . . . . 59
5.2.7 Loss Function Analysis . . . . . . . . . . . . . . . . . . . . 62
5.2.8 Subjective Evaluation of AugGAN . . . . . . . . . . . . . 64
5.3 Multimodal AugGAN . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.3.1 Synthetic Datasets . . . . . . . . . . . . . . . . . . . . . . 68
5.3.2 BDD100k Dataset . . . . . . . . . . . . . . . . . . . . . . 70
5.3.3 Training Detectors with Real Night Images & Multimodal AugGAN-Generated Night Images . . . . . . . . . . 72
5.3.4 Image Quality and Diversity Evaluation of Multimodal AugGAN . . . . . . . . . . 73
5.3.5 Detector Training Strategy Discussion Given Both Unimodal & Multimodal Data . . . . . . . . . . 75
5.3.6 Transformations other than Daytime & Nighttime . . . . . . 76
5.3.7 Semantic Segmentation Across Domains . . . . . . . . . . 77
6 Conclusions and Future Work 79
References 81