
Detailed Record

Author (Chinese): 邱浩翰
Author (English): Chiu, Hao-Han
Title (Chinese): 基於高效輕量化神經網路之人體節點估測
Title (English): Human Keypoints Detection Based On Effective Lightweight Neural Network
Advisor (Chinese): 林嘉文
Advisor (English): Lin, Chia-Wen
Committee (Chinese): 黃敬群, 康立威, 鄭旭詠
Committee (English): Huang, Ching-Chun; Kang, Li-Wei; Cheng, Hsu-Yung
Degree: Master's
University: National Tsing Hua University (國立清華大學)
Department: Department of Electrical Engineering (電機工程學系)
Student ID: 105061607
Year of Publication (ROC calendar): 108 (2019)
Graduation Academic Year: 107
Language: Chinese
Number of Pages: 36
Keywords (Chinese): 輕量化神經網路, 人體姿態估計, 人體節點估測
Keywords (English): Lightweight Neural Network, Human Keypoints Detection, Human Pose Estimation
Abstract (Chinese):
In recent years, with the rapid development of deep learning networks, human pose estimation has remained a popular research topic in computer vision; with the performance gains of computing hardware (GPUs), its accuracy and real-time performance have also reached new milestones.
Current approaches to human pose estimation fall roughly into two categories: (1) top-down methods first locate each person in the image and then estimate that person's pose, while (2) bottom-up methods first detect all human keypoints in the image and then connect those joints correctly. Both categories rely on deep convolutional neural networks (deep CNNs) and run into the same dilemma: better performance requires a deeper network, and greater depth means a sharp increase in the number of parameters (weights), which makes these approaches hard to realize on devices with limited computing power (e.g., mobile phones, surveillance cameras, and dashboard cameras).
This thesis proposes a novel lightweight network architecture, the "Shuffle Hourglass Network" (SHN), for human keypoint detection. Experimental results on several mainstream datasets show that our method maintains accuracy comparable to previous approaches while removing a large number of redundant parameters; in theory, the architecture can also be applied to most human pose estimation methods that use an hourglass network structure.
Abstract (English):
In recent years, with the rapid development of deep learning networks, human pose estimation has been a very popular research topic in the field of computer vision. With improvements in computing hardware (GPUs), its accuracy and real-time applicability have also reached new milestones.
At present, human pose estimation methods can be roughly divided into two categories: (1) top-down methods, which first find each person's position in the image and then estimate that person's pose, and (2) bottom-up methods, which directly detect all human keypoints in the image and then connect those keypoints correctly. Both approaches use deep convolutional neural networks (deep CNNs), and both face a recurring dilemma: better performance demands a deeper network, and the deeper the network, the more parameters (weights) it requires. This makes such methods difficult to deploy on devices with limited computing power, such as mobile phones, surveillance cameras, and dashboard cameras.
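The parameter growth mentioned above can be made concrete. A standard k×k convolution with C_in input and C_out output channels has k·k·C_in·C_out weights, whereas a depthwise-separable convolution, the building block behind lightweight networks such as MobileNet and ShuffleNet, has only k·k·C_in + C_in·C_out. The sketch below is illustrative arithmetic only, not the actual layer configuration used in this thesis:

```python
def conv_params(k, c_in, c_out):
    # Standard convolution: every output channel sees every input channel.
    return k * k * c_in * c_out

def separable_params(k, c_in, c_out):
    # Depthwise k*k filter per input channel, plus a pointwise 1x1 projection.
    return k * k * c_in + c_in * c_out

# Typical mid-network layer sizes (hypothetical example values).
k, c_in, c_out = 3, 256, 256
std = conv_params(k, c_in, c_out)       # 589,824 weights
sep = separable_params(k, c_in, c_out)  # 67,840 weights
print(std, sep, round(std / sep, 1))    # roughly 8.7x fewer parameters
```

The ratio grows with channel count, which is why the savings matter most in the wide middle layers of a deep CNN.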
This thesis proposes a novel lightweight network architecture called the Shuffle Hourglass Network (SHN) for human keypoint detection. Experimental analysis on several mainstream datasets shows that our method maintains accuracy comparable to previous approaches while greatly reducing the number of parameters. Moreover, our architecture can, in theory, be applied to most human pose estimation methods built on hourglass network structures.
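The abstract does not spell out SHN's internals, but the name suggests a ShuffleNet-style channel-shuffle operation embedded in an hourglass network. The following NumPy sketch of channel shuffling is an assumption based on ShuffleNet's design, not code taken from the thesis:

```python
import numpy as np

def channel_shuffle(x, groups):
    """Interleave channels across groups so that stacked grouped
    convolutions can exchange information between groups.

    x: feature map of shape (N, C, H, W); C must be divisible by `groups`.
    """
    n, c, h, w = x.shape
    assert c % groups == 0
    # Split channels into groups, swap the group and per-group axes,
    # then flatten back to (N, C, H, W).
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(n, c, h, w)

# Example: 6 channels in 2 groups -> channel order becomes 0,3,1,4,2,5
x = np.arange(6).reshape(1, 6, 1, 1)
print(channel_shuffle(x, 2).ravel())  # [0 3 1 4 2 5]
```

Because it is a pure permutation of channels, the operation adds no parameters and negligible compute, which is consistent with the lightweight design goal stated above.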
Chinese Abstract ................................................................................................ 3
Abstract ................................................................................................................. 4
Content .................................................................................................................. 5
Chapter 1 ............................................................................................................... 6
Introduction ........................................................................................... 6
1.1 Research Background .................................................................. 6
1.2 Motivation and Objective ............................................................ 7
1.3 Thesis Organization ..................................................................... 8
Chapter 2 ............................................................................................................... 9
Related Work ......................................................................................... 9
2.1 Human Pose Estimation ............................................................... 9
2.2 Lightweight Neural Network ..................................................... 13
Chapter 3 ............................................................................................................. 15
Proposed Method ................................................................................ 15
3.1 Overview .................................................................................... 15
3.2 Network Architecture ................................................................. 16
3.3 Optimization .............................................................................. 19
Chapter 4 ............................................................................................................. 21
Experiments and Discussions ............................................................. 21
4.1 Experimental Setup .................................................................... 21
4.2 Evaluation .................................................................................. 24
4.3 Comparison ................................................................................ 27
Chapter 5 ............................................................................................................. 32
Conclusion .......................................................................................... 32
Reference ............................................................................................................ 33
 
 
 
 