
Detailed Record

Author (Chinese): 常海陽
Author (English): Chang, Hai-Yang
Title (Chinese): 以二維稠密卷積網路完成三維點雲的語義分割
Title (English): Semantic Segmentation for 3D Point Clouds with 2D Dense Convolutional Networks
Advisor (Chinese): 賴尚宏
Advisor (English): Lai, Shang-Hong
Committee Members (Chinese): 許秋婷、邱瀞德
Committee Members (English): Hsu, Chiou-Ting; Chiu, Ching-Te
Degree: Master's
Institution: National Tsing Hua University
Department: Institute of Information Systems and Applications
Student ID: 105061469
Publication Year (ROC): 107 (2018)
Graduation Academic Year: 107
Language: English
Number of Pages: 37
Keywords (Chinese): 稠密卷積網路、點雲、語義分割、三維、深度神經網路、深度學習
Keywords (English): Dense Convolutional Networks; Point Clouds; Semantic Segmentation; 3D; deep neural networks; deep learning
Abstract (Chinese):
In recent years, with the development of deep neural networks, several deep learning methods have emerged to address semantic segmentation of 3D point clouds. However, existing methods either do not explore the spatial layout of objects in the scene, or convert the point cloud into an ordered structure at the highest feasible resolution, so that the model size grows with the size of the input scene.
In this thesis, we propose a new deep model that overcomes these shortcomings. It first transforms the raw point cloud into a regular grid with a fixed number of cells, regardless of the scene's actual size, and then applies extended 2D dense convolutional networks to learn the multi-scale spatial layout of objects in the point cloud scene. In addition, we use a weighted loss function to handle datasets with imbalanced class distributions. Our experiments verify that the proposed model achieves competitive performance on public datasets, and that it is flexible and robust to input point cloud scenes of different sizes. On the practical side, we also show that our model is efficient, with a faster inference speed than previous methods.
Abstract (English):
With the rapid development of deep neural networks in recent years, a number of methods have been proposed to tackle the semantic segmentation problem for 3D point clouds. However, previous approaches either did not explore the spatial layout of objects in the scene, or transformed the point cloud into an ordered structure at the highest feasible resolution, resulting in a model size proportional to the size of the input scene.
In this thesis, we overcome the above shortcomings by introducing a new framework that first converts raw point clouds into regular grids with a fixed number of cells, regardless of their actual size, and then employs extended 2D Dense Convolutional Networks (DenseNets) to capture the multi-scale spatial layout of objects within a point cloud scene. Moreover, we use a weighted loss function to deal with the data imbalance problem. The experimental results demonstrate the competitive performance of the proposed model on public datasets, as well as its flexibility and robustness to input point cloud scenes of different sizes. In addition, we show the efficiency of our model, which achieves a higher inference speed than previous methods.
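To make the pipeline described in the abstract concrete, the following is a minimal sketch, assuming NumPy/PyTorch, of (1) mapping a point cloud onto a grid with a fixed number of cells and (2) an inverse-frequency weighted cross-entropy loss. The grid resolution, the use of a plain occupancy feature, and the weighting scheme are illustrative assumptions, not the settings actually used in the thesis.

```python
# Minimal sketch (not the thesis's code): fixed-size gridding of a point
# cloud plus a weighted loss for class-imbalanced data. All hyperparameters
# here are placeholders.
import numpy as np
import torch
import torch.nn.functional as F

def points_to_fixed_grid(xyz: np.ndarray, grid_size: int = 64) -> np.ndarray:
    """Project an (N, 3) point cloud onto a (grid_size, grid_size) occupancy
    grid over the x-y plane; normalizing by the scene's bounding box keeps
    the cell count constant no matter how large the scene is."""
    mins = xyz.min(axis=0)
    extent = xyz.max(axis=0) - mins + 1e-9            # avoid division by zero
    norm = (xyz - mins) / extent                      # scale scene into [0, 1]
    ij = np.clip((norm[:, :2] * grid_size).astype(int), 0, grid_size - 1)
    grid = np.zeros((grid_size, grid_size), dtype=np.float32)
    grid[ij[:, 0], ij[:, 1]] = 1.0                    # mark occupied cells
    return grid

def class_weights(labels: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Inverse-frequency weights so that rare classes contribute more."""
    counts = torch.bincount(labels.flatten(), minlength=num_classes).float()
    w = 1.0 / (counts + 1.0)                          # +1 guards empty classes
    return w * num_classes / w.sum()                  # normalize around 1.0

# Usage with logits of shape (B, C, H, W) and integer targets of shape (B, H, W):
# loss = F.cross_entropy(logits, targets, weight=class_weights(targets, num_classes))
```

The abstract's "extended 2D Dense Convolutional Networks" builds on the DenseNet connectivity pattern of Huang et al.; the sketch below shows only that standard 2D dense block, with placeholder growth rate and depth, not the thesis's extended architecture.

```python
# Minimal sketch of a standard DenseNet-style 2D dense block.
import torch
import torch.nn as nn

class DenseBlock2D(nn.Module):
    def __init__(self, in_channels: int, growth_rate: int = 16, num_layers: int = 4):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for _ in range(num_layers):
            # standard composite function: BN -> ReLU -> 3x3 convolution
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, growth_rate, kernel_size=3, padding=1, bias=False),
            ))
            channels += growth_rate   # each layer sees all earlier feature maps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]
        for layer in self.layers:
            # dense connectivity: concatenate every earlier output as input
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)
```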
Contents
CHAPTER 1. INTRODUCTION ································································································· 1
1.1 MOTIVATION··························································································································· 1
1.2 PROBLEM DESCRIPTION ········································································································· 2
1.3 MAIN CONTRIBUTIONS ··········································································································· 3
1.4 THESIS ORGANIZATION ··································································································· 4
CHAPTER 2. RELATED WORK ······························································ 5
2.1 3D CNN APPLICATIONS······························································· 5
2.2 POINTNET APPLICATIONS ························································································ 7
2.3 OTHER 3D SEMANTIC SEGMENTATION METHODS ······················································ 8
2.4 DENSENET APPLICATIONS ··································································································· 9
CHAPTER 3. PROPOSED METHOD ······················································································ 11
3.1 RAW POINT CLOUD FEATURE PROCESSING············································································· 12
3.2 GLOBAL FEATURE EXTRACTION ···························································································· 12
3.3 MODEL TRAINING ··································································································· 17
CHAPTER 4. EXPERIMENTAL EVALUATION ····································································· 19
4.1 DATASETS ··························································································································· 19
4.2 SEGMENTATION RESULTS ······································································································· 20
4.3 DISCUSSION ··································································································· 25
CHAPTER 5. CONCLUSION ··································································································· 32
REFERENCES ····························································································································· 33
 
 
 
 