Detailed Record
Author (Chinese): 張言鴻
Author (English): ZHANG, YAN-HONG
Title (Chinese): 三維人體骨架預測之領域特徵對齊與自我學習之方法
Title (English): Domain feature alignment method with self-training for 3D human pose estimation
Advisor (Chinese): 朱宏國
Advisor (English): Chu, Hung-Kuo
Committee Members (Chinese): 胡敏君, 姚智遠
Committee Members (English): Hu, Min-Chun; Yao, Chih-Yuan
Degree: Master
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 108062557
Publication Year (ROC calendar): 110 (2021)
Graduation Academic Year: 109
Language: Chinese
Number of Pages: 24
Keywords (Chinese): 領域自適應, 三維人體骨架預測, 自我學習
Keywords (English): Domain Adaptation, 3D Human Pose Estimation, Self-Training
Abstract (Chinese):
This thesis addresses the problem that collecting annotated 3D human pose data costs a great deal of time and money. The 3D human pose estimation task usually requires accurate annotations to train a model, yet capturing accurate 3D skeleton data typically demands multiple high-speed cameras, monocular cameras, and a controllable capture environment, and the captured skeletons still need a series of post-processing steps before they are usable, which is both time-consuming and labor-intensive.

To cope with the difficulty of obtaining real annotated data, a common practice in recent years is to use computer graphics techniques to generate large amounts of synthetic annotated data for training. This, however, introduces another problem: because synthetic images differ in appearance from real images, a model trained on a synthetic dataset usually performs poorly on real data. A domain adaptation method therefore has to be designed to reduce the gap between the synthetic and real datasets.

In this thesis, we design an unsupervised domain adaptation method for 3D human pose estimation that reduces the gap between synthetic and real datasets without using any real annotations, so that knowledge learned from synthetic data transfers to real data. Specifically, we draw on self-training: we generate pseudo-labels for the real images and apply confidence-based filtering so that the model first learns from the higher-confidence predicted skeletons, reducing the chance of learning from erroneous samples. We also design a feature alignment method that effectively aligns synthetic-domain features with real-domain features and outperforms the conventional discriminator-based alignment approach.
Finally, we validate the effectiveness of our method on the public synthetic dataset Surreal, the real 3D dataset Human3.6m, and the real 2D dataset MPII, and compare against and discuss current state-of-the-art methods.
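The self-training with confidence filtering described in the abstract can be sketched as follows. This is only an illustrative PyTorch-style training step under assumed interfaces, not the thesis implementation: the pose model is assumed to return per-joint confidences alongside its 3D joint predictions, and every name here (`model`, `conf_threshold`, and so on) is hypothetical.

```python
import torch
import torch.nn.functional as F

def self_training_step(model, optimizer, real_images, conf_threshold=0.8):
    """One sketched self-training update on unlabeled real-domain images."""
    # Generate pseudo-labels with the current model; the model is assumed
    # (hypothetically) to return pred_3d of shape (B, J, 3) and per-joint
    # confidences of shape (B, J).
    model.eval()
    with torch.no_grad():
        pseudo_3d, joint_conf = model(real_images)

    # Confidence filtering: keep only samples whose mean joint confidence
    # is high, so training starts from the more reliable predicted skeletons.
    keep = joint_conf.mean(dim=1) > conf_threshold
    if keep.sum() == 0:
        return None  # nothing confident enough in this batch

    # Train on the retained (image, pseudo-label) pairs.
    model.train()
    pred_3d, _ = model(real_images[keep])
    loss = F.mse_loss(pred_3d, pseudo_3d[keep])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```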
Abstract (English):
This thesis aims to solve the problem that real-world 3D human pose annotations are hard to obtain. In the 3D human pose estimation task, accurate annotations play an important role in training a high-performance model.
However, collecting 3D human pose data is not easy: it often requires multiple high-speed cameras, monocular cameras, and a controllable environment to capture accurate skeleton data, which is time-consuming and labor-intensive.
To solve this problem, a common practice in recent years is to use computer graphics technology to generate a large amount of synthetic annotated data to train the model, but this raises another issue. Since the appearance of synthetic data usually differs from that of real-world data, a model trained on synthetic data often performs poorly when tested on real-world data. It is therefore necessary to design a domain adaptation method to reduce the gap between synthetic and real-world data.
In this thesis, we design a feature alignment method that aligns synthetic-domain and real-world-domain features by freezing the decoder when computing the target-domain loss, and we further leverage a self-training algorithm to mitigate the feature mismatch problem.
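As a rough sketch of this freeze-decoder idea (an illustration under assumed encoder/decoder names, not the thesis code), the target-domain loss can be computed with the decoder's parameters frozen, so that gradients only update the encoder and real-domain features are pushed toward what the synthetic-trained decoder already expects:

```python
import torch

def target_domain_update(encoder, decoder, optimizer, real_images,
                         pseudo_labels, criterion):
    # Freeze the decoder while the target-domain (real data) loss is computed.
    for p in decoder.parameters():
        p.requires_grad_(False)

    feats = encoder(real_images)        # real-domain features
    preds = decoder(feats)              # decoded by the frozen decoder
    loss = criterion(preds, pseudo_labels)

    optimizer.zero_grad()
    loss.backward()                     # gradients reach the encoder only
    optimizer.step()

    # Unfreeze the decoder for subsequent source-domain (synthetic) steps.
    for p in decoder.parameters():
        p.requires_grad_(True)
    return loss.item()
```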
We validate the efficacy of our approach on the Human3.6m dataset and show improved MPJPE and PA-MPJPE results over baseline methods and most state-of-the-art methods.
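MPJPE is the mean per-joint Euclidean error between predicted and ground-truth 3D joints, and PA-MPJPE is the same error after a Procrustes (similarity) alignment of the prediction to the ground truth. Below is a self-contained NumPy sketch of both metrics; the function and variable names are ours, not the thesis code.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error; pred and gt are (J, 3) arrays."""
    return np.linalg.norm(pred - gt, axis=1).mean()

def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes (similarity-transform) alignment of pred to gt."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # Optimal rotation and scale from the SVD of the cross-covariance matrix.
    U, S, Vt = np.linalg.svd(p.T @ g)
    R = (U @ Vt).T
    if np.linalg.det(R) < 0:            # guard against reflections
        Vt[-1] *= -1
        S[-1] *= -1
        R = (U @ Vt).T
    scale = S.sum() / (p ** 2).sum()
    aligned = scale * (p @ R.T) + mu_g
    return mpjpe(aligned, gt)
```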
Table of Contents:
i. Abstract (Chinese)
ii. Abstract (English)
iii. Table of Contents
1. Introduction
2. Related Work
  2.1 3D Human Pose Estimation from a Single Image
  2.2 Unsupervised Domain Adaptation
  2.3 Self-Training
3. Method
  3.1 Self-Training Scheme
  3.2 Domain Feature Alignment
  3.3 Pseudo-Label Generation
    3.3.1 Style Transformation Consistency
    3.3.2 Rigid Transformation Consistency
    3.3.3 Self-Ensembling
  3.4 Pseudo-Label Confidence Filtering
  3.5 Algorithm Pseudocode
  3.6 Pseudo-Label Generation Results
4. Experimental Results
  4.1 Evaluation Datasets
    4.1.1 Surreal
    4.1.2 Human3.6m
    4.1.3 MPII
  4.2 Evaluation Metrics
  4.3 Implementation Details
    4.3.1 Pre-training Stage
    4.3.2 Adaptation Stage
  4.4 Results
    4.4.1 Without the MPII Dataset
    4.4.2 With the MPII Dataset
    4.4.3 Effectiveness of the Domain Feature Alignment Method
    4.4.4 Effect of the Confidence Selection Mechanism
  4.5 Summary
5. Conclusion and Future Work
6. Appendix
  6.1 Skeleton Definition Used
  6.2 Additional Experimental Results
(The full text of this thesis has not been authorized for public release.)