Detailed Record

Author (Chinese): 張文彥
Author (English): Chang, Wen-Yen
Title (Chinese): 應用變分自編碼器強化物件偵測的資料篩選效能
Title (English): Enhance Data Selection Efficiency with Variational Auto-encoder for Object Detection's Active Learning
Advisor (Chinese): 孫民
Advisor (English): Sun, Min
Committee Members (Chinese): 王鈺強、陳煥宗
Committee Members (English): Wang, Yu-Chiang; Chen, Hwann-Tzong
Degree: Master's
University: National Tsing Hua University
Department: Department of Electrical Engineering
Student ID: 106061705
Publication Year (ROC calendar): 109 (2020)
Graduation Academic Year: 108
Language: English
Pages: 60
Keywords (Chinese): 物件偵測、自主學習、電腦視覺、變分自編碼器
Keywords (English): Object Detection, Active Learning, Computer Vision, Variational Auto-encoder
Abstract (Chinese):
For object detection active learning that selects a batch of images per round in surveillance-video environments, our method applies a variational auto-encoder (VAE) to strengthen the diversity of its data selection. Compared with diversity-based and uncertainty-based selection, our hybrid selection strategy performs stably across environments: we rely on the uncertainty-based strategy's image scores, but dynamically adjust their weights to avoid the unnecessary cost of selecting similar data. We first apply k-means clustering to the image distribution described by the VAE to obtain pseudo-labels for similar images; then, using these pseudo-labels and the already-selected images, we lower the selection weight of images in the same cluster as previously selected data. We repeat these steps until a fixed number of images has been selected for annotation, and then add them to the training set to train our object detection model. Experiments in four different environments confirm that our hybrid strategy is effective and robust, and we compare the applicability of each method to each environment and give usage recommendations. Our method accelerates the construction of object detection systems and data collection for surveillance applications: in most environments, a model trained with only 30% of the data reaches 90% of the performance of a model trained on the full dataset.
Abstract (English):
We apply pool-based active learning to object detection on surveillance video. In each selection iteration, pool-based active learning selects one batch of images under a budget limit. Our method uses a VAE to enhance the diversity of the selection strategy. Compared with pure uncertainty and pure diversity selection, our hybrid strategy performs robustly across environments. It relies on an uncertainty-based strategy to score how valuable each image is to label, and dynamically re-weights those scores to avoid selecting similar data, which would add redundant information to the object detection model. First, we cluster the VAE latent space with k-means to obtain pseudo-labels that group similar images. Second, we re-weight each image's uncertainty score by the number of already-selected images with the same pseudo-label. Third, we select the most informative image for the annotator to label, i.e., the one with the highest re-weighted uncertainty score. We repeat these steps until the batch budget is reached, and finally add the batch to the object detector's training data. Four experiments validate that our method's data selection is more efficient and robust, and we summarize the recommended usage of each method in different environments. Our method thus accelerates surveillance-system build-up and data collection: in most environments, using only 30% of the data achieves 90% of the performance of a model trained on the entire dataset.
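The selection loop described in the abstract reduces to a short greedy procedure. Below is a minimal Python sketch of that loop; the helper names (vae_encode, uncertainty_score) and the 1 / (1 + count) decay are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_batch(images, vae_encode, uncertainty_score,
                 budget=100, n_clusters=10):
    """Greedy batch selection with cluster-aware re-weighted uncertainty."""
    # Step 1: cluster the VAE latent space to obtain pseudo-labels
    # that group visually similar images.
    latents = np.stack([vae_encode(img) for img in images])
    pseudo_labels = KMeans(n_clusters=n_clusters).fit_predict(latents)

    base_scores = np.array([uncertainty_score(img) for img in images])
    picks_per_cluster = np.zeros(n_clusters)
    selected, remaining = [], set(range(len(images)))

    while len(selected) < budget and remaining:
        idxs = np.array(sorted(remaining))
        # Step 2: down-weight each image's uncertainty by how many
        # already-selected images share its pseudo-label (assumed decay form).
        weights = 1.0 / (1.0 + picks_per_cluster[pseudo_labels[idxs]])
        reweighted = base_scores[idxs] * weights

        # Step 3: greedily take the top-1 re-weighted score.
        best = int(idxs[np.argmax(reweighted)])
        selected.append(best)
        remaining.remove(best)
        picks_per_cluster[pseudo_labels[best]] += 1

    return selected  # image indices to send to the annotator
```

The returned indices would then be labeled and appended to the detector's training set before the next active-learning round.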
Declaration ii
誌謝 iii
Acknowledgements iv
摘要 v
Abstract vi
1 Introduction 1
2 Related Work 5
2.1 Object Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.1 Anchor Design: Anchor-free vs. Anchor-based . . . . . . . . . 6
2.1.2 Detector Architecture Selection: One/Two-stage Detector . . . 6
2.2 Image Uncertainty Estimation . . . . . . . . . . . . . . . . . . . . . . 7
2.2.1 Valuable Objects Estimation . . . . . . . . . . . . . . . . . . . 7
2.2.2 Valuable Image Estimation . . . . . . . . . . . . . . . . . . . . 9
2.3 Image Diversity Estimation . . . . . . . . . . . . . . . . . . . . . . . . 9
3 Method 11
3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.2 Uncertainty Re-weighting . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2.2 Image Uncertainty Estimation . . . . . . . . . . . . . . . . . . 15
3.2.3 Similar Image Clustering . . . . . . . . . . . . . . . . . . . . . 16
3.2.4 Uncertainty Re-weighting . . . . . . . . . . . . . . . . . . . . 18
4 Experiments 20
4.1 Experiment Settings . . . . . . . . . . . . . . . . . . . . . . . . 20
4.1.1 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.1.2 Classification Setting . . . . . . . . . . . . . . . . . . . . . . . 23
4.1.3 Object Detection Setting . . . . . . . . . . . . . . . . . . . . . 23
4.2 Proposed Methods Comparison . . . . . . . . . . . . . . . . . . . . . . 25
4.3 Ablation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.3.1 Uncertainty Ablation Study . . . . . . . . . . . . . . . . . . . 26
4.3.2 Diversity Ablation Study . . . . . . . . . . . . . . . . . . . . . 28
4.4 Quantitative Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.4.1 Quantitative Comparison with AUC of the mAP-labeled Curve . . . 32
4.4.2 Quantitative Comparison with the mAP-labeled Curve . . . . . . 33
4.4.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.5 Qualitative Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.5.1 VAE Clustering Property . . . . . . . . . . . . . . . . . . . . . 39
4.5.2 Object Detection Visualization . . . . . . . . . . . . . . . . 40
5 Conclusion 43
6 Future Work 44
A Method 45
A.1 Heapified Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
A.1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
A.1.2 Observation Feature Designing . . . . . . . . . . . . . . . . . . 46
A.1.3 Heapified Policy . . . . . . . . . . . . . . . . . . . . . . . . . 50
B Dataset Statistics 53
C Failure Case Analysis 54
C.1 Uncertainty Selection Failure Case . . . . . . . . . . . . . . . . 54
C.2 The Effect of Slight Ego-motion on Motion Selection . . . . . . . . 55
References 57