Detailed Record

Author (Chinese): 張偲宇
Author (English): Chang, Ssu-Yu
Title (Chinese): 應用於場景分類之語意感知的關注模組
Title (English): SAAM: Semantic-Aware Attention Module for Scene Classification
Advisor (Chinese): 許秋婷
Advisor (English): Hsu, Chiou-Ting
Committee members (Chinese): 彭文孝、王聖智
Committee members (English): Peng, Wen-Hsiao; Wang, Sheng-Jyh
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 106062555
Year of publication (ROC calendar): 108 (2019)
Academic year of graduation: 107
Language: English
Number of pages: 32
Keywords (Chinese): 場景分類、關注模組、共同關注
Keywords (English): Scene Classification, Attention Module, Co-attention
Predicting the scene category of an image is an important problem in computer vision and is of great benefit to applications that automatically understand and analyze scene content. In this thesis, in order to recognize and annotate content accurately without being restricted to specific scenes, we focus on combining the strengths of semantic image analysis and transferring pixel-level priors about scene objects into the scene classification model. In addition, to handle the similarity between different scene categories and the diversity within the same category, we leverage the attention mechanism to focus on the discriminative features of a scene. We propose a semantic-aware attention module consisting of a branch that attends over the channel dimension and a branch that attends over the spatial dimension. With this module, we can effectively attend to the key objects and regions in an image and combine them with the features of the scene classification model to support prediction. The channel branch captures the global relationship between semantic categories and features by attending to the proportion of each semantic category in the image, whereas the spatial branch targets the local relationship between semantic categories and features by exploiting the fact that features of the same semantic category at different locations share similar relationships. Furthermore, to make the attention module attend to similar regions for scenes of the same category, we propose a co-attention mechanism that further focuses on regions that are consistent within a category. Experimental results on Broden+ and Places365 show that the proposed attention modules add only a small number of parameters (1.3 MB on average) and, using only the additional predictions from semantic segmentation, achieve the best or comparable results on the Broden+ and Places365 datasets.
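
To make the two branches above concrete, here is a minimal PyTorch sketch of how such a semantic-aware attention module could be wired up, assuming pixel-level semantic scores are available at the same resolution as the backbone feature map. The class name SemanticAwareAttention, the layer sizes, the reduction ratio, and the fusion by element-wise re-weighting are illustrative assumptions, not the design reported in the thesis.

    import torch
    import torch.nn as nn

    class SemanticAwareAttention(nn.Module):
        # Sketch of a channel + spatial attention module driven by semantic priors.
        # All hyper-parameters below are assumptions for illustration only.
        def __init__(self, feat_channels, sem_classes, reduction=16):
            super().__init__()
            # Channel branch: pool the semantic scores globally (class proportions)
            # and predict one weight per feature channel.
            self.channel_branch = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Flatten(),
                nn.Linear(sem_classes, feat_channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(feat_channels // reduction, feat_channels),
                nn.Sigmoid(),
            )
            # Spatial branch: 1x1 convolutions over the semantic scores produce
            # a single attention map that highlights discriminative locations.
            self.spatial_branch = nn.Sequential(
                nn.Conv2d(sem_classes, feat_channels // reduction, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(feat_channels // reduction, 1, kernel_size=1),
                nn.Sigmoid(),
            )

        def forward(self, feat, sem_prior):
            # feat:      backbone feature map, shape (B, C, H, W)
            # sem_prior: pixel-level semantic scores, shape (B, K, H, W)
            ca = self.channel_branch(sem_prior).view(feat.size(0), -1, 1, 1)
            sa = self.spatial_branch(sem_prior)
            # Re-weight the backbone features with both attention terms
            # (one possible way to fuse them with the classification features).
            return feat * ca * sa

    # Toy usage: 512-channel backbone features, 150 semantic classes.
    saam = SemanticAwareAttention(feat_channels=512, sem_classes=150)
    out = saam(torch.randn(2, 512, 14, 14), torch.randn(2, 150, 14, 14))
    print(out.shape)  # torch.Size([2, 512, 14, 14])
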
Predicting scene classes is essential to many applications in automatic scene understanding. In this thesis, in order to classify scenes precisely, we take advantage of a semantic segmentation method to transfer pixel-level semantic priors into the scene classification model. In addition, to deal with inter-class similarity and intra-class diversity, we exploit the attention mechanism to focus on class-discriminative features. We propose a Semantic-Aware Attention Module (SAAM), which includes a channel attention branch and a spatial attention branch, to integrate the auxiliary semantic information and the backbone features into the prediction. The channel attention branch aims to learn global features and inter-channel relationships, while the spatial attention branch aims to learn local features and inter-spatial relationships. Furthermore, to capture the intra-class coherency of attention, we propose a co-attention mechanism that measures, with the feature extractor, the similarity between a pair of images masked by their attention maps. Experimental results on the Broden+ and Places365 datasets show that the proposed method requires only a small number of additional parameters for the attention modules (1.3 MB) and achieves state-of-the-art or comparable performance when using only the auxiliary semantic information.
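
The co-attention idea can also be pictured with a short sketch: two images are masked by their own attention maps, passed through the shared feature extractor, and compared. The function name co_attention_losses, the cosine-similarity comparison, and the margin hinge for different-class pairs are assumed illustrations of the mechanism described above, not the losses actually used in the thesis.

    import torch
    import torch.nn.functional as F

    def co_attention_losses(feat_extractor, img_a, img_b, att_a, att_b, same_class, margin=0.5):
        # feat_extractor: shared backbone returning a feature map (B, C, h, w)
        # img_a, img_b:   image pair, shape (B, 3, H, W)
        # att_a, att_b:   spatial attention maps, shape (B, 1, h, w)
        # same_class:     tensor of shape (B,), 1 for same-class pairs, 0 otherwise
        att_a = F.interpolate(att_a, size=img_a.shape[-2:], mode="bilinear", align_corners=False)
        att_b = F.interpolate(att_b, size=img_b.shape[-2:], mode="bilinear", align_corners=False)
        masked_a, masked_b = img_a * att_a, img_b * att_b

        # Shared feature extractor + global average pooling -> one vector per image.
        fa = feat_extractor(masked_a).mean(dim=(-2, -1))
        fb = feat_extractor(masked_b).mean(dim=(-2, -1))
        sim = F.cosine_similarity(fa, fb, dim=1)

        # Intra-class term: attended regions of same-class pairs should look alike.
        intra = ((1.0 - sim) * same_class.float()).mean()
        # Inter-class term: attended regions of different-class pairs should differ.
        inter = (F.relu(sim - margin) * (1.0 - same_class.float())).mean()
        return intra, inter
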
Chinese Abstract I
Abstract II
1. Introduction 1
2. Related Work 5
2.1 Without Auxiliary Information 5
2.2 With Auxiliary Information 6
3. Proposed Method 8
3.1 Motivation 8
3.2 Overview of Semantic-Aware Attention 8
3.3 Channel Attention 9
3.4 Spatial Attention 10
3.5 Integration of SAAM and Scene Classification Backbone Network 11
3.6 Co-attention 12
3.7 Intra-class Co-attention Loss 15
3.8 Inter-class Co-attention Loss 16
3.9 Alternative Training Procedure 17
4. Experiments 19
4.1 Datasets 19
4.2 Evaluation Scheme 20
4.3 Implementation Details 20
4.4 Results 21
5. Conclusion 30
References 31