
Detailed Record

Author (Chinese): 謝沛錫
Author (English): Xie, Pei-Xi
Title (Chinese): MVRNet:基於單視角的著衣人體重建之多模態體積表示網絡
Title (English): MVRNet: Multimodal Volumetric Representation Network for Monocular Clothed Human Reconstruction
Advisor (Chinese): 李祈均
Advisor (English): Lee, Chi-Chun
Committee members (Chinese): 胡敏君、林奕成、黃敬群
Committee members (English): Hu, Min-Chun; Lin, I-Chen; Huang, Ching-Chun
Degree: Master's
Institution: National Tsing Hua University (國立清華大學)
Department: Department of Electrical Engineering (電機工程學系)
Student ID: 110061636
Publication year (ROC calendar): 112 (2023)
Graduation academic year: 112
Language: English
Number of pages: 33
Keywords (Chinese): 單視角著衣人體重建、三維電腦視覺、多模態視覺、參數化身體模型
Keywords (English): Monocular Clothed Human Reconstruction; 3D Computer Vision; Multi-modal Vision; Parametric Body Models
Reconstructing 3D human models from single images is a challenging task with significant implications for human-computer interaction, particularly in the realm of virtual avatars. State-of-the-art methods often rely on cascading models to achieve higher-resolution 2D representations, or leverage the Skinned Multi-Person Linear (SMPL) model to infer 3D relationships. However, these approaches can be time-consuming, and heavy reliance on the SMPL model can lead to suboptimal results, especially for loose-fitting clothing. To overcome these limitations, we introduce the Multimodal Volumetric Representation Network (MVRNet), a novel deep neural network that employs weak parametric-model conditioning and multimodal feature fusion to improve the 3D representation obtained from a single 2D image. Crucially, our model is trained solely on publicly available datasets, and we conduct extensive evaluations on diverse datasets and in-the-wild scenarios, demonstrating superior accuracy, robustness, and generalization ability compared with state-of-the-art methods. Our research contributes to the advancement of robust 3D clothed human reconstruction, particularly in challenging poses, with potential applications in AR/VR, animation and film production, and the entertainment industry.
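The abstract describes conditioning an implicit surface representation on both image features and a parametric body model, fusing the modalities before predicting occupancy. As an illustration only, the general pixel-aligned implicit-function idea (in the PIFu style cited by the thesis) can be sketched as follows; the layer sizes, nearest-neighbour projection, and concatenation-based fusion here are assumptions for the sketch, not MVRNet's actual architecture:

```python
import numpy as np

def query_occupancy(points, feat_map, W_hidden, b_hidden, W_out, b_out):
    """Toy pixel-aligned implicit-function query (PIFu-style sketch).

    points:   (N, 3) query points, with x/y already in pixel coordinates.
    feat_map: (C, H, W) 2D image feature map from an encoder.
    The remaining arguments are weights of a toy 2-layer MLP whose
    sigmoid output is the occupancy probability of each point.
    """
    C, H, W = feat_map.shape
    # Nearest-neighbour "projection": sample the feature under each point's pixel.
    px = np.clip(points[:, 0].astype(int), 0, W - 1)
    py = np.clip(points[:, 1].astype(int), 0, H - 1)
    pixel_feat = feat_map[:, py, px].T          # (N, C) pixel-aligned features
    z = points[:, 2:3]                          # (N, 1) depth as the 3D cue
    # Feature fusion by simple concatenation of the two modalities.
    fused = np.concatenate([pixel_feat, z], axis=1)
    h = np.maximum(fused @ W_hidden + b_hidden, 0.0)   # ReLU hidden layer
    logits = h @ W_out + b_out
    return 1.0 / (1.0 + np.exp(-logits))        # occupancy probabilities in (0, 1)

# Tiny random instantiation to exercise the query.
rng = np.random.default_rng(0)
feat_map = rng.normal(size=(8, 16, 16))        # C=8 feature map
points = rng.uniform(0.0, 16.0, size=(5, 3))   # 5 query points
W1, b1 = rng.normal(size=(9, 4)), np.zeros(4)  # input dim = C + 1 (depth)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
occ = query_occupancy(points, feat_map, W1, b1, W2, b2)  # shape (5, 1)
```

In full systems the surface is then extracted by thresholding the occupancy field, and the conditioning feature from the parametric body model would be concatenated into `fused` alongside the image feature and depth.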
Table of Contents
誌謝 (Acknowledgements)
摘要 (Chinese Abstract)
Abstract
1 Introduction
2 Related Works
2.1 Single-view Human Reconstruction
2.2 Feature Encoder of Implicit Function
3 Surface Representation
3.1 Implicit Function
3.2 Parametric Body Model
4 Method
4.1 Clothed Normal Estimation
4.2 MVRNet
5 Experiments
5.1 Datasets
5.2 Network Architecture
5.3 Training Details
5.4 Evaluations
6 Discussion
6.1 Ablation Study
6.2 Inference Time Consumption
6.3 Limitation
7 Conclusion
A Supplementary
A.1 Implicit Function
A.2 Training Details
A.3 SAIL-VOS 3D
References

Related Theses

1. An Automated Scoring System for Couple Interaction Behavior Scales in Marital Therapy Based on Stacked Sparse Autoencoders Using Speech Features
2. Stroke Prediction Based on National Health Insurance Data Using Hadoop as a Fast Feature-Extraction Tool
3. A New Framework for Full-Time Emotion Recognition Models Built on Human Thin-Slice Emotion Perception
4. Construction of an Automated Scoring System for Reserve Principals' Speeches Using Multi-task and Multimodal Fusion Techniques
5. Building an Automated Scoring System for Reserve Principal Evaluation via Multimodal Active Learning Analysis of the Relations Between Samples and Labels
6. Improving Speech Emotion Recognition by Incorporating fMRI BOLD Signals
7. Improving Speech Emotion Recognition Using Multi-level Convolutional Neural Network Features Combined with fMRI
8. A Behavior-Measurement-Based Assessment System for Children with Autism Using an Embodied Conversational Interface
9. A Multimodal Continuous Emotion Recognition System and Its Application to Global Affect Recognition
10. Integrating Multi-level Text Representations and Speech-Attribute Embeddings for Robust Automated Scoring of Reserve Principals' Speeches
11. Using Joint Factor Analysis of Temporal Effects in Brain MRI to Improve Emotion Recognition
12. An LSTM-Based Assessment System for Identifying Children with Autism from Autism Diagnostic Observation Schedule Interviews
13. Automated Pain-Level Detection for Emergency Patients Using a Multimodal Model Mixing CNN and LSTM Audio-Visual Features
14. Improving an Automated Behavior Scoring System for Marital Therapy with a Bidirectional LSTM Mixing Multi-granularity Text Modalities
15. Improving Emotion Recognition on a Chinese Theater Performance Corpus Using Interaction Features from Performance Transcripts