
Detailed Record

Author (Chinese): 張君豪
Author (English): Chang, Chun-Hao
Title (Chinese): 基於區域通道注意力之多專家網路架構以應用於高度不均衡影像分類
Title (English): ReCAME-Net: A Regional Channel Attention-based Multi-Expert Net for Highly Imbalanced Image Classification
Advisors (Chinese): 林嘉文; 邵皓強
Advisors (English): Lin, Chia-Wen; Shao, Hao-Chiang
Committee members (Chinese): 陳聿廣; 方劭云
Committee members (English): Chen, Yu-Guang; Fang, Shao-Yun
Degree: Master's
University: National Tsing Hua University
Department: Department of Electrical Engineering
Student ID: 110061529
Year of publication: 113 (ROC calendar; 2024 CE)
Graduation academic year: 112
Language: English
Number of pages: 31
Keywords (Chinese): 高度不均衡影像分類; 長尾視覺識別; 關注模組; 度量學習; 多專家架構
Keywords (English): Highly imbalanced image classification; Long-tailed visual recognition; Attention module; Metric learning; Multi-expert framework
Highly imbalanced image classification aims to perform image recognition well even when class sizes differ drastically. Among existing methods and datasets, most researchers view this problem from the perspective of long-tailed distributions. However, these long-tailed datasets are artificially sampled, which means certain data characteristics can be filtered out; using them to simulate the data imbalance found in industry, biology, or medicine is therefore unrealistic. This thesis analyzes data from United Microelectronics Corporation (UMC) and observes that unprocessed, real-world highly imbalanced data exhibits the following properties: 1) multi-cluster head classes, 2) large intra-class diversity, and 3) high inter-class similarity. These properties were never considered in, and do not appear in, conventional long-tailed datasets, which causes previous long-tailed methods to fail.

Inspired by the characteristics of the UMC data, we propose a regional channel attention-based multi-expert network architecture to address the properties and challenges posed by real-world highly imbalanced data. The network is equipped with regional channel attention and metric learning losses. Experiments show that our method performs well on the UMC dataset and also achieves good results on conventional long-tailed datasets (ImageNet-LT and iNaturalist 2018); further experiments confirm the importance of each module and loss function in our model.
Highly imbalanced visual recognition aims to effectively perform image recognition tasks even when there is a large imbalance ratio (n_max/n_min) across different classes. In existing methods and datasets, most researchers address this problem from the perspective of long-tailed distributions. However, these long-tailed datasets are artificially sampled, meaning that certain data characteristics can be filtered out intentionally. Therefore, using them to simulate imbalanced datasets in industry, biology, or medicine is unrealistic. This thesis analyzes the UMC (United Microelectronics Corporation) dataset and observes that unprocessed real-world highly imbalanced data typically exhibits the following properties: 1) multi-cluster head classes, 2) large intra-class diversity, and 3) high inter-class similarity. These characteristics have not been considered and are not present in conventional long-tailed datasets, leading to the failure of existing long-tailed methods.
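The imbalance ratio n_max/n_min mentioned above can be computed directly from per-class sample counts; a minimal sketch (the function name is ours, not from the thesis):

```python
from collections import Counter

def imbalance_ratio(labels):
    """Imbalance ratio of a labeled dataset: n_max / n_min, where
    n_max and n_min are the sizes of the largest and smallest classes."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

# A toy long-tailed label set: 8 samples of class 0, 4 of class 1, 1 of class 2.
labels = [0] * 8 + [1] * 4 + [2]
print(imbalance_ratio(labels))  # → 8.0
```

Conventional long-tailed benchmarks such as ImageNet-LT fix this ratio by subsampling a balanced dataset, which is exactly the artificial step the thesis argues can erase real-world data characteristics.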

Inspired by the characteristics of the UMC data, we propose a regional channel attention-based multi-expert network (ReCAME-Net) to address the properties and challenges posed by real-world highly imbalanced datasets. This network is equipped with regional channel attention and metric learning losses. Experiments show that our approach achieves excellent results on the UMC dataset, and it also performs well on traditional long-tailed datasets such as ImageNet-LT and iNaturalist 2018. Extensive experiments further confirm the significance of the modules and loss functions within our model.
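The thesis's regional channel attention module is not spelled out in this record; as a rough illustration of the underlying idea, a squeeze-and-excitation-style channel attention gate pools a feature map spatially, passes the pooled vector through a small bottleneck, and reweights each channel. This is a sketch of the generic mechanism only (shapes and weight names are our assumptions), not the ReCAME-Net implementation, whose "regional" variant presumably applies such gating within local spatial regions:

```python
import numpy as np

def channel_attention(x, w1, w2):
    """Squeeze-and-excitation-style channel attention sketch.
    x: feature map of shape (C, H, W); w1: (C//r, C); w2: (C, C//r),
    where r is the bottleneck reduction ratio."""
    # Squeeze: global average pooling over the spatial dimensions.
    z = x.mean(axis=(1, 2))                      # (C,)
    # Excitation: bottleneck MLP, ReLU then a sigmoid gate in (0, 1).
    h = np.maximum(w1 @ z, 0.0)                  # (C//r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))          # (C,)
    # Rescale each channel of the feature map by its learned gate.
    return x * s[:, None, None]

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
x = rng.normal(size=(C, H, W))
w1 = rng.normal(size=(C // r, C))
w2 = rng.normal(size=(C, C // r))
y = channel_attention(x, w1, w2)
print(y.shape)  # (8, 4, 4)
```

In a multi-expert framework, each expert branch can carry its own attention gates, so different experts learn to emphasize different channels for the head and tail classes.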
Abstract (Chinese) i
Abstract ii
1 Introduction 1
2 Related Work 5
2.1 Existing Long-tailed Visual Recognition . . . . . . . . . . . . . . . . . . . . . 5
2.2 Attention mechanism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Knowledge distillation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3 Proposed Method 9
3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.2 Parallel independent learning (PIL) . . . . . . . . . . . . . . . . . . . . . . . . 10
3.2.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.2.2 Feature extractor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.2.3 Attraction-Repulsion-Balanced (ARB) Loss . . . . . . . . . . . . . . 13
3.2.4 Hard Category Mining (HCM) Loss . . . . . . . . . . . . . . . . . . . 14
3.2.5 Contrastive Loss . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2.6 Center Loss . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.3 Knowledge Distillation (KD) Loss . . . . . . . . . . . . . . . . . . . . . . . . 16
3.4 Total Loss . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4 Experiments 18
4.1 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.2 Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.3.1 Open Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.3.2 Highly imbalanced dataset (UMC dataset) . . . . . . . . . . . . . . . . 21
4.3.3 Ablation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
5 Conclusion 27
References 28