Author: LI, Han-Kuang
Thesis title: Graph-based Deep Convolution Network for 3D Human Pose Estimation
Advisor: Lai, Shang-Hong
Keywords: HCI; Pose Estimation; Computer Vision
Human pose estimation is an active research field because it plays a significant role in applications related to human-computer interaction (HCI), such as augmented reality (AR), virtual reality (VR), and human action recognition.
Researchers aim to reconstruct the human pose from RGB images captured by a camera. Recent advances in deep learning have brought substantial progress in human pose estimation on indoor motion-capture datasets, and some methods use temporal information to improve the estimates.
However, previous works do not fully exploit the structure of the human body skeleton, which carries valuable information such as joint inter-dependency and structural connectivity.
In this work, we propose a graph-based convolution network that exploits the spatial information in the human skeleton to estimate 3D human pose.
We further incorporate temporal convolution into the skeleton-based graph to achieve spatial-temporal graph convolution. Features are computed layer by layer at the joint level rather than at the traditional frame level.
Supported by the skeleton graph, edge convolution serves as the message-passing scheme in the proposed model. The features of each joint and its neighbors are aggregated by subtraction, which emphasizes joint connectivity.
Our experimental results show that the proposed method achieves performance competitive with existing methods.
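To make the joint-level message passing described above concrete, the following is a minimal NumPy sketch of one edge convolution step over a skeleton graph. It is an illustration under stated assumptions, not the thesis's actual layer: the 5-joint chain `EDGES`, the function names `build_neighbors` and `edge_conv`, the two weight matrices, the ReLU activation, and the max aggregation over neighbors are all hypothetical choices made here. The only element taken from the abstract is that each joint is combined with its neighbors by subtraction.

```python
import numpy as np

# Hypothetical 5-joint chain skeleton (not the thesis's skeleton graph).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4)]


def build_neighbors(num_joints, edges):
    """Return, for each joint, the list of joints it shares a bone with."""
    neighbors = [[] for _ in range(num_joints)]
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    return neighbors


def edge_conv(x, neighbors, w_self, w_diff):
    """One edge convolution step on per-joint features.

    x:       (J, C_in) array, one feature vector per joint.
    w_self:  (C_in, C_out) weights applied to the joint's own feature.
    w_diff:  (C_in, C_out) weights applied to the difference x_j - x_i.
    Returns a (J, C_out) array. Assumes every joint has at least one neighbor.
    """
    out = np.empty((x.shape[0], w_self.shape[1]))
    for i, nbrs in enumerate(neighbors):
        diff = x[nbrs] - x[i]                      # (k, C_in): neighbor minus center
        msg = x[i] @ w_self + diff @ w_diff        # (k, C_out): one message per edge
        msg = np.maximum(msg, 0.0)                 # ReLU (assumed activation)
        out[i] = msg.max(axis=0)                   # max over neighbors (assumed)
    return out


neighbors = build_neighbors(5, EDGES)
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))                        # e.g. 3 input channels per joint
w_self = rng.normal(size=(3, 4))
w_diff = rng.normal(size=(3, 4))
features = edge_conv(x, neighbors, w_self, w_diff)  # shape (5, 4)
```

Because the messages depend on the difference between a joint and its neighbors, identical features on all joints yield zero difference terms, so the output then depends only on each joint's own feature. A temporal variant could, under the same assumptions, also treat the same joint in adjacent frames as a neighbor.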
1 Introduction
1.1 Motivation
1.2 Problem Statement
1.3 Contributions
2 Related Work
2.1 Lifting pose from 2D to 3D
2.2 Multi-frame based pose estimation
2.3 Graph convolution
2.4 Our work
3 Proposed Network
3.1 Two-stream architecture for pose estimation
3.2 Loopy Skeleton Graph
3.3 Edge Convolution
3.4 Temporal Edge Convolution
3.5 Overall Proposed Depth Net
3.6 Data Normalization and Augmentation
3.7 Objective function
4 Experimental Results
4.1 Quantitative Results
4.1.1 Experiment on Human3.6M
4.1.2 Experiment on HumanEva-I
4.2 Ablation Study on Human3.6M
4.3 Qualitative Results
4.4 Discussion and Limitation
4.4.1 Computational complexity
4.4.2 One-to-one joint mapping limited
4.4.3 Ambiguity depths with overlapping
5 Conclusions
