
Detailed Record

Author (Chinese): 李冠億
Author (English): Lee, Kuan-Yi
Title (Chinese): 非監督式對比學習卷積神經網路用於多類別共同物體分割
Title (English): Unsupervised Multi-class Object Co-segmentation with Contrastively Learned CNNs
Advisors (Chinese): 林嘉文, 林彥宇
Advisors (English): Lin, Chia-Wen; Lin, Yen-Yu
Committee Members (Chinese): 莊永裕, 彭彥璁
Committee Members (English): Chuang, Yung-Yu; Peng, Yan-Tsung
Degree: Master's
Institution: National Tsing Hua University
Department: Electrical Engineering
Student ID: 108061520
Publication Year (ROC): 110 (2021)
Graduation Academic Year: 109
Language: English
Pages: 47
Keywords (Chinese): 多類別共同物體分割; 卷積神經網路; 對比學習
Keywords (English): Object co-segmentation; Convolutional neural networks; Contrastive learning
Abstract (Chinese):
In recent years, with the development of deep learning, supervised image segmentation trained on fully annotated data has reached mature performance. However, the annotations required for training are also its drawback, since labeling images demands substantial human effort. Given this high annotation cost, a research direction has emerged that trains deep models without labeled data of any form. Because the model then has no access to the object categories, existing unsupervised segmentation methods resort to hand-crafted features, which cannot effectively represent the high intra-class variation, object-scale changes, object overlap, and complex backgrounds found in images, making the task highly challenging. In this thesis, we tackle this challenging segmentation problem by proposing an unsupervised CNN-based algorithm for multi-class object co-segmentation. Our network consists of three components: a generator that produces a co-attention map for each common class, an attention module that refines features and improves the accuracy of the co-attention maps, and a feature extractor that projects the masked images into a feature space. The whole network is optimized with our proposed co-attention loss, which minimizes the distances between features of the same class while maximizing the distances between features of different classes. Compared with previous conventional methods, our deep-learning-based algorithm automatically learns high-level semantic features from data and thus handles the complex conditions above well; experimental results also show that our method outperforms prior conventional approaches.
Abstract (English):
Unsupervised multi-class object co-segmentation aims to co-segment objects of multiple common categories present in a set of images, without supervision. The task is challenging because no annotated training data for the object categories are available, and because there exist high intra-class variations and inter-class similarities among the object categories and the background. We address this problem by proposing the first unsupervised, end-to-end trainable CNN-based framework, consisting of three collaborative modules: the attention module, the co-attention map generator, and the feature extractor. The attention module employs two types of attention: self-attention, which captures long-range object dependencies, and mutual attention, which explores co-occurrence patterns across images. With the attention module, the co-attention map generator learns to capture the objects of the same categories. Afterward, the feature extractor estimates discriminative cues to separate the common-class objects from the background. Finally, we optimize the network in an unsupervised fashion via the proposed co-attention loss, which reduces the intra-class discrepancy of the extracted features while enlarging their inter-class margins. Experimental results show that the proposed approach performs favorably against existing algorithms.
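The attention types and the co-attention loss are described here only at a high level; Chapter 3 of the thesis gives the actual formulations. Purely as illustration, the following PyTorch sketches show one common way such components are realized. First, mutual attention between a pair of images can be written as non-local-style cross attention; the function name, shape conventions, scaling, and residual fusion below are assumptions of the sketch, not the thesis's design.

```python
import torch
import torch.nn.functional as F

def mutual_attention(feat_a, feat_b):
    """Cross-image (mutual) attention between two CNN feature maps.

    feat_a, feat_b: (B, C, H, W) features of two images from a shared
    backbone. Returns feat_a augmented with the regions of feat_b that
    respond to it, highlighting patterns co-occurring in both images.
    """
    b, c, h, w = feat_a.shape
    q = feat_a.flatten(2).transpose(1, 2)       # (B, HW, C): queries from image A
    k = feat_b.flatten(2)                       # (B, C, HW): keys from image B
    v = feat_b.flatten(2).transpose(1, 2)       # (B, HW, C): values from image B
    attn = F.softmax(q @ k / c ** 0.5, dim=-1)  # (B, HW, HW) affinity, A -> B
    out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
    return feat_a + out                         # residual fusion
```

Self-attention is then just `mutual_attention(feat_a, feat_a)`, operating within a single image to capture long-range dependencies. Second, the co-attention loss as described above (shrink intra-class feature distances, enlarge inter-class margins) matches a generic hinge-based contrastive objective; the margin value, squared L2 distances, and pseudo-label input are likewise assumptions, not the loss defined in Section 3.3.

```python
import torch

def co_attention_loss(feats, labels, margin=1.0):
    """Contrastive objective over per-object embeddings.

    feats:  (N, D) feature vectors of masked object regions.
    labels: (N,)   class ids (pseudo-labels in the unsupervised setting).
    Same-class pairs are pulled together; different-class pairs are
    pushed at least `margin` apart.
    """
    dist = torch.cdist(feats, feats)                   # (N, N) pairwise L2
    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # (N, N) same-class mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=feats.device)
    intra = dist[same & ~eye]                          # same-class distances
    inter = dist[~same]                                # cross-class distances
    loss = feats.new_zeros(())
    if intra.numel():
        loss = loss + intra.pow(2).mean()              # minimize intra-class distance
    if inter.numel():
        loss = loss + (margin - inter).clamp(min=0).pow(2).mean()  # enforce margin
    return loss
```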
Table of Contents
Abstract (Chinese) ii
Abstract (English) iii
Contents iv
Chapter 1 Introduction 6
Chapter 2 Related Work 10
2.1 Object Co-segmentation 10
2.2 Supervised Multi-class Co-segmentation 11
2.3 Unsupervised Multi-class Co-segmentation 12
2.4 Weakly Supervised Semantic Segmentation 13
2.5 Attention Mechanisms 14
Chapter 3 Proposed Method 15
3.1 Problem Formulation 15
3.2 Attention Module 17
3.3 Co-attention Loss 21
3.4 Training and Implementation Details 22
Chapter 4 Experiments and Discussion 24
4.1 Datasets and Evaluation Metrics 24
4.2 Comparisons with the State of the Art 25
4.3 Effectiveness of the Attention Module 30
4.4 Run-time Complexity 36
4.5 Performance of Post-processing 37
4.6 Failure Cases 39
Chapter 5 Conclusion 41
References 42