
Detailed Record

Author (Chinese): 欒俊
Author (English): Luan, Jun
Title (Chinese): 基於混合深度架構的行人偵測
Title (English): Hybrid Deep Architecture for Pedestrian Detection
Advisors (Chinese): 賴尚宏; 劉庭祿
Advisors (English): Lai, Shang-Hong; Liu, Tyng-Luh
Committee Members (Chinese): 孫民; 陳煥宗
Committee Members (English): Sun, Min; Chen, Hwann-Tzong
Degree: Master's
University: National Tsing Hua University
Department: Department of Computer Science
Student ID: 102062467
Year of Publication (ROC calendar): 104 (2015)
Graduating Academic Year: 103
Language: English
Number of Pages: 41
Keywords (Chinese): pedestrian detection; convolutional neural network; hybrid deep architecture
Keywords (English): Pedestrian Detection; CNN; Hybrid Deep Architecture
Abstract (Chinese): In this thesis, we propose a hybrid convolutional neural network (CNN)–classification restricted Boltzmann machine (ClassRBM) model and apply it to pedestrian detection. Although deep-network methods have achieved major breakthroughs in recognition and general object detection, on pedestrian detection they have not clearly demonstrated their superiority and still trail the current best feature-pool-plus-boosted-classifier methods. We take a pretrained AlexNet, fine-tune it on pedestrian datasets, and combine it with a carefully trained ClassRBM to obtain strong results on pedestrian detection benchmarks. Our model jointly extracts local features and passes them through multiple layers to obtain high-level, global features. The ClassRBM then converts the high-level features into the final probability distribution. We use bounding-box regression with sampling to address the localization problem caused by low-quality candidate regions. Our experiments demonstrate, from several aspects, the successful application of deep networks to pedestrian detection.
Abstract (English): In this thesis, we propose a hybrid convolutional neural network (CNN)–classification restricted Boltzmann machine (ClassRBM) model for the task of pedestrian detection. Although deep-net approaches have been shown to be successful in tackling recognition and general object detection problems, their success in pedestrian detection is less clear, and they have not been competitive with the state-of-the-art feature-pools-plus-boosted-decision-trees methods. We integrate a fine-tuned AlexNet with a carefully trained ClassRBM to achieve competitive performance on the INRIA and Caltech pedestrian datasets. The model jointly extracts local features and further processes them through multiple layers to extract high-level and global features. The top-layer ClassRBM performs inference on the CNN features and outputs classification results as a probability distribution. An additional bounding-box regression with sampling is employed to address the localization problem caused by low-quality region proposals. Our experiments demonstrate, in many aspects, the success of deep networks for pedestrian detection.
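The abstract describes a top-layer ClassRBM that maps CNN features to a class probability distribution. As a rough illustration of how a Classification RBM performs this inference — following the standard closed-form p(y|x) from per-class free energies in Larochelle and Bengio's formulation, not the thesis's exact implementation; all parameter names below are illustrative — the computation is a softmax over negative free energies:

```python
import numpy as np

def classrbm_predict_proba(x, W, b_class, U, c):
    """p(y|x) for a Classification RBM, computed via per-class free energies.

    x       : (n_visible,)  feature vector (e.g., top-layer CNN features)
    W       : (n_hidden, n_visible)  visible-to-hidden weights
    b_class : (n_classes,)  class biases
    U       : (n_hidden, n_classes)  class-to-hidden weights
    c       : (n_hidden,)  hidden biases
    """
    def softplus(z):
        # log(1 + e^z), computed stably
        return np.logaddexp(0.0, z)

    # Hidden-unit pre-activations for every candidate class y:
    # c_j + U_{jy} + W_j . x, shape (n_hidden, n_classes)
    act = c[:, None] + U + (W @ x)[:, None]

    # Negative free energy of each class: b_y + sum_j softplus(act_{jy})
    neg_free = b_class + softplus(act).sum(axis=0)

    # Softmax over classes, shifted for numerical stability
    z = neg_free - neg_free.max()
    p = np.exp(z)
    return p / p.sum()
```

Because the free energy marginalizes the hidden units analytically, this inference is exact and needs no sampling at test time.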
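The localization step refines low-quality region proposals with regressed box offsets. A common parameterization for this is the R-CNN-style box transform sketched below; the thesis's exact targets may differ, and the helper names are hypothetical:

```python
import math

def bbox_regression_targets(proposal, ground_truth):
    """Offsets (tx, ty, tw, th) that map a proposal onto a ground-truth box.

    Boxes are (cx, cy, w, h): center coordinates plus width and height.
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = ground_truth
    return ((gx - px) / pw,        # center shift, normalized by proposal size
            (gy - py) / ph,
            math.log(gw / pw),     # width/height ratios in log space
            math.log(gh / ph))

def apply_bbox_regression(proposal, offsets):
    """Refine a proposal with predicted offsets (inverse of the targets)."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = offsets
    return (px + pw * tx, py + ph * ty,
            pw * math.exp(tw), ph * math.exp(th))
```

Normalizing by the proposal size makes the targets scale-invariant, and the log-space size offsets keep predicted widths and heights positive.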
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Problem Description . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Main Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Previous Works 5
2.1 Traditional Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Deep Network Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3 Region Proposals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3 Proposed Hybrid Deep Architecture 10
3.1 Architecture Overview . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.2 Deep CNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.3 Classification RBM . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.4 Bounding Box Regression . . . . . . . . . . . . . . . . . . . . . . . . 18
3.5 Merge and Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.6 Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4 Experiments 22
4.1 Data Preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.2 Experiment Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.3 Different Region Proposals . . . . . . . . . . . . . . . . . . . . . . . . 24
4.4 Optimal ClassRBM Parameters . . . . . . . . . . . . . . . . . . . . . . 27
4.5 Different Layer Features . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.6 Box Regression and Sampling . . . . . . . . . . . . . . . . . . . . . . 28
4.7 Fine-tune The Network . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.8 Training Data’s Selection . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.9 Transfer Learning Ability . . . . . . . . . . . . . . . . . . . . . . . . . 32
5 Conclusion 36
References 37
[1] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” IJCV,
2004.
[2] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,”
in CVPR, 2005.
[3] P. Felzenszwalb, D. McAllester, and D. Ramanan, “A discriminatively trained,
multiscale, deformable part model,” in CVPR, 2008.
[4] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep
convolutional neural networks,” in NIPS, 2012.
[5] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in CVPR, 2009.
[6] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for
accurate object detection and semantic segmentation,” in CVPR, 2014.
[7] P. Dollár, C. Wojek, B. Schiele, and P. Perona, “Pedestrian detection: An evaluation of the state of the art,” PAMI, 2012.
[8] R. Benenson, M. Omran, J. Hosang, and B. Schiele, “Ten years of pedestrian detection, what have we learned?” in ECCV, CVRSUAD workshop, 2014.
[9] A. Ess, B. Leibe, K. Schindler, and L. Van Gool, “A mobile vision system for
robust multi-person tracking,” in CVPR, 2008.
[10] A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? The
KITTI vision benchmark suite,” in CVPR, 2012.
[11] P. Dollár, Z. Tu, P. Perona, and S. Belongie, “Integral channel features,” in BMVC,
2009.
[12] J. Hosang, M. Omran, R. Benenson, and B. Schiele, “Taking a deeper look at
pedestrians,” CVPR, 2015.
[13] R. Benenson, M. Mathias, T. Tuytelaars, and L. Van Gool, “Seeking the strongest
rigid detector,” in CVPR, 2013.
[14] P. Sermanet, K. Kavukcuoglu, S. Chintala, and Y. LeCun, “Pedestrian detection
with unsupervised multi-stage feature learning,” in CVPR, 2013.
[15] W. Ouyang and X. Wang, “A discriminative deep model for pedestrian detection
with occlusion handling,” in CVPR, 2012.
[16] W. Ouyang, X. Zeng, and X. Wang, “Modeling mutual visibility relationship in
pedestrian detection,” in CVPR, 2013.
[17] W. Ouyang and X. Wang, “Joint deep learning for pedestrian detection,” in ICCV,
2013.
[18] P. Luo, Y. Tian, X. Wang, and X. Tang, “Switchable deep network for pedestrian
detection,” in CVPR, 2014.
[19] A. Krizhevsky and G. Hinton, “Learning multiple layers of features from tiny images,” Computer Science Department, University of Toronto, Tech. Rep., 2009.
[20] J. R. Uijlings, K. E. van de Sande, T. Gevers, and A. W. Smeulders, “Selective
search for object recognition,” IJCV, 2013.
[21] B. Alexe, T. Deselaers, and V. Ferrari, “What is an object?” in CVPR, 2010.
[22] C. L. Zitnick and P. Dollár, “Edge boxes: Locating object proposals from edges,”
in ECCV, 2014.
[23] M.-M. Cheng, Z. Zhang, W.-Y. Lin, and P. Torr, “BING: Binarized normed gradients
for objectness estimation at 300fps,” in CVPR, 2014.
[24] W. Ouyang and X. Wang, “Single-pedestrian detection aided by multi-pedestrian
detection,” in CVPR, 2013.
[25] X. Zeng, W. Ouyang, and X. Wang, “Multi-stage contextual deep learning for
pedestrian detection,” in ICCV, 2013.
[26] Z. Zhang, J. Warrell, and P. H. Torr, “Proposal generation for object detection using
cascaded ranking SVMs,” in CVPR, 2011.
[27] G. Hinton, S. Osindero, and Y.-W. Teh, “A fast learning algorithm for deep belief
nets,” Neural Computation, 2006.
[28] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama,
and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,”
arXiv preprint arXiv:1408.5093, 2014.
[29] H. Larochelle and Y. Bengio, “Classification using discriminative restricted Boltzmann machines,” in ICML, 2008.
[30] P. Dollár, C. Wojek, B. Schiele, and P. Perona, “Pedestrian detection: A benchmark,” in CVPR, 2009.
[31] W. Nam, P. Dollár, and J. H. Han, “Local decorrelation for improved pedestrian
detection,” in NIPS, 2014.
(Full text restricted to internal access only)