
Detailed Record

Author (Chinese): 游秉中
Author (English): Yu, Ping-Chung
Title (Chinese): 基於 2D 模型的知識轉換減少對 3D 標註需求之訓練策略
Title (English): Data Efficient 3D Learner via Knowledge Transferred from 2D Model
Advisor (Chinese): 孫民
Advisor (English): Sun, Min
Committee Members (Chinese): 林彥宇, 邱維辰
Committee Members (English): Lin, Yen-Yu; Chiu, Wei-Chen
Degree: Master's
University: National Tsing Hua University
Department: Department of Electrical Engineering
Student ID: 109061615
Year of Publication (ROC): 111 (2022)
Graduation Academic Year: 110
Language: English
Number of Pages: 28
Keywords (Chinese): 語意分割 (semantic segmentation), 小數據學習 (low-shot learning), 場景分析 (scene analysis), 深度學習 (deep learning)
Keywords (English): segmentation; low-shot learning; scene analysis and understanding; deep learning
Abstract (Chinese, translated): When collecting 3D point cloud data, registering and annotating it is very time-consuming and expensive, so 3D data resources fall far short of 2D images in quantity. In this work, we use existing trained 2D models to extract usable information from RGB-D images, tackling the challenge of training a deep learning model in the 3D domain under data scarcity. We take the predictions of a previously trained, high-performing 2D image semantic segmentation model as pseudo-labels for RGB-D images; these pseudo-labeled data then let us pre-train a 3D model, and finally we fine-tune the pre-trained model using only a fraction of the 3D training labels. With this simple training strategy, our results stand out from methods that adapt the model architecture specifically for the data-scarce setting. We further show that semi-supervised learning can also transfer usable knowledge from RGB-D images during the pre-training stage and further improve prediction accuracy. We implement two well-known 3D models and three 3D training tasks to validate the effectiveness of this strategy; under the official ScanNet evaluation setting, ours is the best-performing method on the data-efficient leaderboard.
Abstract (English): Collecting and labeling registered 3D point clouds is costly. As a result, 3D training resources are typically limited in quantity compared to their 2D image counterparts. In this work, we address the data-scarcity challenge of 3D tasks by transferring knowledge from strong 2D models via RGB-D images. Specifically, we use a strong, well-trained semantic segmentation model for 2D images to augment RGB-D images with pseudo-labels. The augmented dataset can then be used to pre-train 3D models. Finally, by simply fine-tuning on a few labeled 3D instances, our method already outperforms existing state-of-the-art methods tailored for 3D label efficiency. We also show that the results of mean-teacher and entropy-minimization methods are improved by our pre-training, suggesting that the transferred knowledge is helpful in the semi-supervised setting. We verify the effectiveness of our approach on two popular 3D models and three different tasks. On the official ScanNet evaluation, we establish new state-of-the-art semantic segmentation results on the data-efficient track.
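To make the strategy concrete, the sketch below illustrates the two core steps the abstracts describe: lifting a pseudo-labeled RGB-D frame into a 3D point cloud, and pre-training a 3D network on the 2D parser's soft predictions. This is a minimal PyTorch sketch under stated assumptions, not the thesis's actual code: model_2d, model_3d, backproject, and the pinhole-intrinsics layout are all hypothetical placeholders.

import torch
import torch.nn.functional as F

def backproject(depth, intrinsics):
    # Lift a depth map (H, W) to camera-space 3D points (H*W, 3) using a
    # pinhole camera model; intrinsics = (fx, fy, cx, cy) is an assumption.
    h, w = depth.shape
    fx, fy, cx, cy = intrinsics
    v, u = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    z = depth.flatten()
    x = (u.flatten() - cx) * z / fx
    y = (v.flatten() - cy) * z / fy
    return torch.stack([x, y, z], dim=1)

@torch.no_grad()
def make_pseudo_labeled_cloud(rgb, depth, intrinsics, model_2d):
    # Run the frozen 2D scene parser on the color image and attach its
    # per-pixel soft predictions to every lifted 3D point (soft pseudo-labels).
    logits = model_2d(rgb.unsqueeze(0)).squeeze(0)     # (C, H, W)
    soft_labels = logits.softmax(dim=0).flatten(1).T   # (H*W, C)
    points = backproject(depth, intrinsics)            # (H*W, 3)
    valid = depth.flatten() > 0                        # drop pixels with no depth
    return points[valid], soft_labels[valid]

def pretrain_step(model_3d, points, soft_labels, optimizer):
    # One pre-training step: cross-entropy against the 2D model's soft
    # targets, i.e. distilling 2D knowledge into the 3D model.
    log_probs = F.log_softmax(model_3d(points), dim=1)  # assumes (N, C) logits
    loss = -(soft_labels * log_probs).sum(dim=1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

After this pre-training stage, the 3D model would be fine-tuned with ordinary hard-label cross-entropy on the small labeled subset; the semi-supervised variants mentioned above (mean-teacher, entropy minimization) add consistency or entropy terms on unlabeled scenes during the same stage.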
Table of Contents
Acknowledgements
摘要 (Chinese abstract) i
Abstract ii
1 Introduction 1
2 Related work 5
  2.1 Deep 3D models for point cloud understanding 5
  2.2 Data-efficient 3D 6
  2.3 Knowledge transferred from 2D 6
3 Approach 7
  3.1 3D model 8
    3.1.1 Classification 8
    3.1.2 Point-level semantic segmentation 8
  3.2 Knowledge transferred from 2D 9
    3.2.1 Lifting perspective images 9
    3.2.2 Lifting panoramic images 10
    3.2.3 Learning from 2D scene parser via soft pseudo-label 10
    3.2.4 Downstream tasks 11
  3.3 Semi-supervised learning 11
4 Experiments 13
  4.1 Implementation details 13
    4.1.1 Image scene parser 14
    4.1.2 3D models 14
    4.1.3 Data augmentation 15
  4.2 Data-efficient scene semantic segmentation 15
    4.2.1 Limited annotations 15
    4.2.2 Limited reconstruction 15
  4.3 Pre-training on different 3D models 17
  4.4 Shape analysis under a limited-data scenario 17
    4.4.1 Object classification 18
    4.4.2 Shape part segmentation 19
  4.5 Ablation study 19
    4.5.1 Image modality 19
    4.5.2 Pre-training by 2D annotations and pseudo-labels 20
    4.5.3 Combining pre-training with semi-supervised learning 21
5 Conclusion 23
References 25