Author (English): Chen, Jia-Xin
Thesis title (English): Dance Generation from Audio
Advisor (English): Chen, Hwann-Tzong
Oral defense committee (English): Lee, Che-Rung; Lin, Yen-Yu
Keywords (English): cross-modality; latent representation; choreography
This thesis proposes a new learning-based method for generating dance poses from given music clips. Prior approaches often address choreography generation with models built on recurrent networks or transformers, which makes the task hardware-demanding and time-consuming. We propose a network architecture based on convolution layers to explore how far lightweight approaches can go. Experimental results on in-the-wild videos provide a baseline for dance sequence generation on several beat-related indices and a new self-similarity metric, and validate the effectiveness of our method.
1 Introduction
2 Related Work
2.1 Retrieval-Based Choreography Generation
2.2 Adversarial Learning-Based Choreography Generation
2.3 Sequence-to-Sequence Choreography Generation
3 Proposed Approach
3.1 Problem Definition
3.2 Input Features Extraction
3.3 Music-Pose Embedding Phase
3.4 Dance Sequence Inference Phase
3.5 Objective Functions
4 Experiments
4.1 Experimental Setup
4.1.1 Implementation Details
4.1.2 Evaluation Metrics
4.1.3 Baseline
4.1.4 Evaluation Settings
4.2 Quantitative Results
4.2.1 Ablation Study
4.3 Qualitative Results
5 Conclusion
6 Bibliography
