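Fully Convolutional Networks with Cross-layer Concatenation and Multi-Scale Prediction for Semantic Segmentation - National Tsing Hua University Theses and Dissertations Full-Text System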


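Author (Chinese):	史敦槐
Author (English):	Shih, Tun-Huai
Thesis title (Chinese):	透過跨層併列與多尺度預測的完全卷積網路之語意分割
Thesis title (English):	Fully Convolutional Networks with Cross-layer Concatenation and Multi-Scale Prediction for Semantic Segmentation
Advisor (Chinese):	許秋婷
Advisor (English):	Hsu, Chiou-Ting
Oral defense committee (Chinese):	劉庭祿、陳煥宗
Degree:	Master's
Institution:	National Tsing Hua University
Department:	Department of Computer Science
Student ID:	103062584
Year of publication (ROC calendar):	105 (2016)
Academic year of graduation (ROC calendar):	104
Language:	English
Number of pages:	30
Keywords (Chinese):	語意分割、深度學習、卷積網絡
Keywords (English):	Semantic segmentation, Deep learning, Convolutional neural network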


Semantic image segmentation aims to assign a semantic label to each pixel in an image. Recent state-of-the-art approaches are mainly based on Convolutional Neural Networks (CNNs). Although these approaches achieve outstanding performance, they adopt increasingly complex CNN models. As a result, they usually require larger training datasets and take more time in both the training and inference stages. In contrast to these complex CNN-based approaches, we propose to simplify an existing CNN architecture, VGG-16, without compromising segmentation performance. First, we propose a basic model that replaces the original fully-connected layers with several convolutional and pooling layers in order to extract hierarchical features. We then use the extracted hierarchical features to generate multi-scale predictions, and aggregate all predictions to derive one dense prediction result. Furthermore, we extend the basic model with cross-layer feature concatenation to jointly exploit the information from lower- and higher-level layers. Experimental results show that, with only one-fourth of the parameters of the original VGG-16 and with no post-processing or Conditional Random Field refinement, the proposed model achieves competitive results on three popular datasets: SIFT Flow, Pascal VOC 2012, and Pascal Context.
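
The aggregation step described in the abstract can be illustrated with a toy sketch. This is not the thesis implementation: the function names, the nearest-neighbour upsampling, and the plain summation of scores are all assumptions made for illustration, and the actual model may upsample and combine its predictions differently (for example with learned layers). The sketch only shows the general idea of bringing per-class score maps from several scales to one resolution, combining them, and taking the best class at each pixel.

```python
# Toy sketch of multi-scale prediction aggregation (illustrative only).
# Assumptions not taken from the thesis: nearest-neighbour upsampling,
# plain summation of scores, and square score maps whose side divides out_size.

def upsample_nearest(score_map, factor):
    """Repeat every cell of a 2-D grid `factor` times along both axes."""
    out = []
    for row in score_map:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def aggregate(predictions, out_size):
    """Sum multi-scale predictions and return a dense map of class indices.

    predictions: list of score volumes; each volume holds one square 2-D
                 score map per class.
    """
    num_classes = len(predictions[0])
    total = [[[0.0] * out_size for _ in range(out_size)]
             for _ in range(num_classes)]
    for volume in predictions:
        for c, score_map in enumerate(volume):
            up = upsample_nearest(score_map, out_size // len(score_map))
            for y in range(out_size):
                for x in range(out_size):
                    total[c][y][x] += up[y][x]
    # Per-pixel argmax over classes (ties go to the lowest class index).
    return [[max(range(num_classes), key=lambda c: total[c][y][x])
             for x in range(out_size)]
            for y in range(out_size)]


# Two classes. The coarse 1x1 prediction favours class 0 everywhere;
# the fine 2x2 prediction strongly favours class 1 at the top-right pixel.
coarse = [[[1.0]], [[0.0]]]
fine = [[[0.0, 0.0], [0.0, 0.0]],
        [[0.0, 2.0], [0.0, 0.5]]]
labels = aggregate([coarse, fine], out_size=2)
print(labels)
```

In this toy input the coarse prediction supplies context (class 0 overall), while the fine prediction overrides it only where its evidence is strong enough, which is the kind of complementary behaviour that combining scales is meant to capture.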

Abstract (Chinese)
Abstract
1. Introduction
2. Related Work
2.1 Two-Stage CNN approaches
2.2 Fully Convolutional Network
2.3 Deconvolution Network
2.4 Convolutional Network with Conditional Random Field
3. Proposed Method
3.1 Motivation
3.2 FCN with Multi-scale Prediction
3.2.1 Hierarchical feature extraction
3.2.2 Multi-scale prediction and aggregation
3.2.3 Feature response normalization
3.3 FCN with Cross-layer Concatenation
3.3.1 Deconvolution on predictions
3.3.2 Cross-layer concatenation
3.4 Learning the network
4. Experimental Results
4.1 Experimental settings
4.1.1 Implementation details
4.1.2 Data augmentation
4.1.3 Evaluation criteria
4.2 Results on SIFT Flow
4.3 Results on Pascal VOC 2012
4.4 Results on Pascal Context
4.5 Discussion and Limitation
5. Conclusions
6. References

