Detailed Record

Author (Chinese): 李函光
Author (English): LI, Han-Kuang
Title (Chinese): 基於圖形架構的深度捲積網路應用於三維空間人體姿態估測
Title (English): Graph-based Deep Convolution Network for 3D Human Pose Estimation
Advisor (Chinese): 賴尚宏
Advisor (English): Lai, Shang-Hong
Oral defense committee: 鄭嘉珉, 陳煥宗, 江振國
Degree: Master
University: National Tsing Hua University
Department: Department of Computer Science
Student ID: 106062645
Year of publication (ROC calendar): 108 (2019)
Graduating academic year: 108
Language: English
Number of pages: 43
Keywords (Chinese): 人機互動 (human-computer interaction), 姿態估測 (pose estimation), 計算機視覺 (computer vision)
Keywords (English): HCI, Pose Estimation, Computer Vision
Human pose estimation is an active research field, as it plays a significant role in HCI (human-computer interaction) related applications, such as augmented reality (AR), virtual reality (VR) and human action recognition.
Researchers aim to reconstruct 3D human posture from RGB images captured by a camera. Recent advances in deep learning have brought great progress in human pose estimation on indoor motion-capture datasets, and some methods further exploit temporal information in video to improve the estimation results.
However, the structure of the human body skeleton is rarely exploited in previous works, even though it encodes valuable information such as joint inter-dependency and structural connectivity.
In this work, we propose a graph-based convolution network that exploits the spatial structure of the human skeleton for 3D human pose estimation.
We further incorporate temporal convolution into the skeleton graph to achieve spatial-temporal graph convolution. Features are computed at the level of individual joints in each frame, instead of the traditional frame level.
Guided by the skeleton graph, edge convolution serves as the message-passing scheme in the proposed model: in each layer, the feature of each joint is aggregated with its differences from the features of neighboring joints, which emphasizes joint connectivity.
Our experimental results show that the proposed method provides competitive performance compared to other recent methods.
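The edge-convolution step described above can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the skeleton topology, the concatenation of each joint feature with its neighbor differences, the learned projection, and the max aggregation are all assumptions in the style of EdgeConv-type graph layers; the thesis's actual layer shapes and aggregation may differ.

```python
import numpy as np

# Hypothetical skeleton as an adjacency list: joint index -> neighbor indices.
# This 7-joint topology is illustrative only, not the graph used in the thesis.
SKELETON = {
    0: [1, 4],          # pelvis connected to both hips
    1: [0, 2], 2: [1, 3], 3: [2],   # right leg chain
    4: [0, 5], 5: [4, 6], 6: [5],   # left leg chain
}

def edge_conv(features, weight):
    """One edge-convolution layer over the skeleton graph.

    features : (num_joints, in_dim) per-joint features for one frame
    weight   : (2 * in_dim, out_dim) projection applied to the edge
               feature [x_i, x_j - x_i] for each edge (i, j)
    """
    num_joints, in_dim = features.shape
    out = np.zeros((num_joints, weight.shape[1]))
    for i, neighbors in SKELETON.items():
        msgs = []
        for j in neighbors:
            # Message combines the joint feature with its difference from
            # the neighbor's feature, emphasizing joint connectivity.
            edge_feat = np.concatenate([features[i], features[j] - features[i]])
            msgs.append(edge_feat @ weight)
        out[i] = np.max(msgs, axis=0)   # aggregate messages from all neighbors
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((7, 8))   # 7 joints, 8-dim input features
w = rng.standard_normal((16, 4))  # projects 16-dim edge features to 4 dims
y = edge_conv(x, w)
print(y.shape)  # → (7, 4)
```

Stacking such layers across frames (the temporal direction) would give the spatial-temporal convolution the abstract refers to; here only the spatial step is shown.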
1 Introduction
1.1 Motivation
1.2 Problem Statement
1.3 Contributions
2 Related Work
2.1 Lifting pose from 2D to 3D
2.2 Multi-frame based pose estimation
2.3 Graph convolution
2.4 Our work
3 Proposed Network
3.1 Two-stream architecture for pose estimation
3.2 Loopy Skeleton Graph
3.3 Edge Convolution
3.4 Temporal Edge Convolution
3.5 Overall Proposed Depth Net
3.6 Data Normalization and Augmentation
3.7 Objective function
4 Experimental Results
4.1 Quantitative Results
4.1.1 Experiment on Human3.6M
4.1.2 Experiment on HumanEva-I
4.2 Ablation Study on Human3.6M
4.3 Qualitative Results
4.4 Discussion and Limitation
4.4.1 Computational complexity
4.4.2 One-to-one joint mapping limited
4.4.3 Ambiguity depths with overlapping
5 Conclusions
References