Author (English): Hsu, Ting-Jui
Thesis Title (English): Improving One Model for One Scene in HumanNeRF via Depth Guidance
Advisor (English): Lin, Chia-Wen
Oral Defense Committee (English): Hu, Min-Chun
Lin, Yen-Yu
Liu, Yu-Lun
Keywords (English): Human NeRF; Generalizable Human NeRF; Parametric Human Model; Depth Map; Point Cloud
HumanNeRF aims to reconstruct a 3D human from a monocular video and to synthesize unseen novel views by feeding in the camera parameters of other viewpoints. However, its main difficulty is the one-model-for-one-scene problem, which leads to high retraining time costs and makes it difficult to generalize to multiple scenes. Previous methods required multi-view human images as input and fail when the input contains insufficient multi-view information. Recent methods have proposed point-level features. Nevertheless, because the explicit parametric human body model performs poorly on the details of the limbs, blurry and erroneous movements are rendered.

To solve the above problems, we note that a parametric human body model provides three-dimensional vertices but is inaccurate in fine details at the extremities, and we therefore use point clouds as a supplement. Because point clouds carry three-dimensional information and depth maps provide reasonable scene depth information, we start from an estimated depth map and generate better point clouds from it. However, converting depth into a point cloud of the human body is not easy: the depth is limited to a specific viewpoint, so the resulting three-dimensional point cloud contains missing and inaccurate information. Therefore, we propose a depth-guided module that uses the predicted depth map to predict accurate 3D point clouds, which guide poses and improve rendering results. Experiments demonstrate that our method not only achieves better novel-view synthesis results than the current best methods, but also requires training only a single model for use across multiple scenes, thus solving the one-model-for-one-scene problem in HumanNeRF.
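As a rough illustration of why a point cloud obtained directly from a single depth map is incomplete, the sketch below applies the standard pinhole back-projection to a tiny depth map. It is not the depth-guided module proposed in the thesis; the function name `backproject_depth` and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions introduced only for this example.

```python
def backproject_depth(depth, fx, fy, cx, cy):
    """Back-project a depth map into camera-space 3D points (pinhole model).

    depth: list of rows, each a list of depth values along the camera z-axis.
           Values <= 0 are treated as missing and skipped.
    fx, fy: focal lengths in pixels (hypothetical values in the demo below).
    cx, cy: principal point in pixels (hypothetical values in the demo below).
    """
    points = []
    for v, row in enumerate(depth):        # v indexes image rows
        for u, z in enumerate(row):        # u indexes image columns
            if z <= 0:
                continue                   # no depth observed at this pixel
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# A 2x2 toy depth map with one missing pixel (top-left).
depth = [
    [0.0, 2.0],
    [2.0, 4.0],
]
points = backproject_depth(depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

Only surfaces visible from the input viewpoint produce points, and pixels without a valid depth produce none, which is consistent with the missing and inaccurate regions the abstract describes.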
Abstract
1 Introduction
1.1 Research Background
1.2 Motivation
1.3 Contributions
2 Related Work
2.1 Static-scene Generalizable NeRF Approach
2.2 Dynamic-scene Generalizable NeRF Approach
2.3 Generalizable Human NeRF Approach
3 Proposed Method
3.1 Overview
3.2 Depth-guided Module
3.3 Pixel-aligned Feature
3.4 Weight Function
3.5 Implementation
3.5.1 Training Stage
3.5.2 Loss Function
3.5.3 Inference Stage
4 Experiments Result
4.1 Database and Baseline
4.2 Experiment Settings
4.2.1 Comparison Method
4.2.2 Training Setting
4.2.3 Inference Setting
4.2.4 Evaluation Metrics
4.3 Experiments on Quantitative Quality
4.4 Experiments on Visual Quality
4.5 Ablation Experiments
5 Conclusion
References
