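Author (Chinese): 李浩榮
Author (English): Lee, Ho-Weng
Thesis title (Chinese): 基於文本對齊的異常檢測骨幹模型用於工業檢測任務
Thesis title (English): TAB: Text Align - Anomaly Backbone Model for Industrial Inspection Tasks
Advisor (Chinese): 賴尚宏
Advisor (English): Lai, Shang-Hong
Committee members (Chinese): 邱瀞德
黃敬群
Committee members (English): Chiu, Ching-Te
Huang, Ching-Chun
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 110062401
Year of publication (ROC calendar): 113 (2024)
Academic year of graduation: 112
Language: English
Number of pages: 48
Keywords (Chinese, translated): Image Processing; Defect Detection; Anomaly Detection
Keywords (English): Computer Vision; Anomaly Detection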
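In the industrial domain today, methods for anomaly detection and anomaly segmentation have achieved excellent, even near-perfect, performance on many industrial datasets. However, the core ideas and designs of these methods mostly rely on features produced by backbone models pre-trained on ImageNet, followed by additional models that are fine-tuned on additional normal industrial samples. The purpose is to remove the gap between the original feature space (ImageNet pre-training) and the target feature space (industrial-domain data). If features that discriminate between normal and abnormal samples could be obtained up front, the downstream model could be made much smaller and could be fine-tuned with fewer training samples. In this thesis, we propose a new pre-training approach that uses the text features of the multimodal model CLIP as guidance and re-projects visual features into a new feature space. With the help of text features, the approach aims to match and align normal and abnormal visual features with their corresponding normal and abnormal text features, so that the visual features produced by our model weights carry textual information and can be used to discriminate normal from abnormal features in the industrial domain. We then train on a range of well-known industrial datasets. The experimental results show that our pre-trained weights let current state-of-the-art methods gain further accuracy without adding any model components or data, and that these highly generalizable weights also bring significant benefits in few-shot training, contributing to industrial anomaly detection research.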
In recent years, there has been a growing emphasis on anomaly detection and localization in industrial inspection tasks. While numerous studies have achieved impressive results, they often require a large number of training samples or robust features from extractors pre-trained on a multi-domain dataset such as ImageNet. In this work, we propose a novel framework that uses the visual-linguistic CLIP model to efficiently pre-train a backbone model adapted to the manufacturing domain by jointly considering the visual and text-aligned embedding spaces for normal and abnormal terms. Our pre-trained backbone significantly enhances the performance of previous works on industrial downstream tasks such as anomaly detection and localization. The accuracy improvement is demonstrated through experiments on several datasets, including MVTecAD, BTAD, KSDD2, and MixedWM38, without requiring additional training data or increasing model complexity. Furthermore, using our pre-trained backbone weights allows previous works to achieve superior performance in few-shot scenarios with less training data. The proposed anomaly backbone provides a foundation model for more efficient and effective anomaly detection and localization.
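The abstract describes aligning normal and abnormal visual features with their corresponding text features. As a rough illustration only, the idea can be sketched as a cross-entropy loss over temperature-scaled cosine similarities between visual embeddings and two text embeddings (one "normal", one "abnormal"). This is not the thesis's actual training objective: the function names, the two-prompt setup, the temperature value, and the random stand-in embeddings are all assumptions made for this sketch, and no real CLIP model is loaded.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)

def text_alignment_loss(visual, text, labels, temperature=0.07):
    """Hypothetical alignment loss (assumption, not the thesis's formulation).

    visual: (N, D) visual embeddings from the backbone being pre-trained.
    text:   (2, D) text embeddings, row 0 = "normal", row 1 = "abnormal".
    labels: (N,) integer array, 0 for normal samples, 1 for abnormal samples.
    Returns the mean cross-entropy and the (N, 2) similarity logits.
    """
    v = l2_normalize(visual)
    t = l2_normalize(text)
    logits = (v @ t.T) / temperature                      # scaled cosine similarities
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    loss = -log_probs[np.arange(len(labels)), labels].mean()
    return loss, logits

# Stand-in embeddings (random placeholders, not real CLIP outputs).
rng = np.random.default_rng(0)
text = rng.normal(size=(2, 8))
labels = np.array([0, 1, 0, 1])
aligned = text[labels] + 0.01 * rng.normal(size=(4, 8))         # near the matching prompt
misaligned = text[1 - labels] + 0.01 * rng.normal(size=(4, 8))  # near the wrong prompt

loss_aligned, logits = text_alignment_loss(aligned, text, labels)
loss_misaligned, _ = text_alignment_loss(misaligned, text, labels)
print(loss_aligned < loss_misaligned)
```

In this sketch, visual features that sit close to their matching text embedding yield a lower loss than features that sit close to the opposite one, which is the behavior a text-alignment objective is meant to encourage.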
1. Introduction
1.1 Problem Statement
1.2 Motivation
1.3 Contributions
1.4 Thesis Organization
2. Related Work
2.1 Anomaly Detection Methods
2.2 Pre-trained Models on ImageNet
2.3 Visual-Linguistic Foundation Model - CLIP
3. Proposed Method
3.1 Synthetic Anomaly Samples (SAS)
3.2 Industrial Domain Prompt Association (IDPA)
3.3 Prompt Samples
3.3.1 Prompt Normal
3.3.2 Prompt Abnormal
3.3.3 Industrial Domain Prompt Template
3.4 Anomaly-Text-Aware Pre-training Strategy
3.5 Similarity Matrix
4. Experiments
4.1 Datasets
4.2 Evaluation Metrics
4.3 Experiment Settings
4.4 Implementation Details
4.5 Performance in Anomaly Detection
4.6 Performance in Anomaly Localization
4.7 Performance in Defect Classification
4.8 Performance in Few-Shot Settings
4.9 Performance in Cross-dataset
4.10 Visualization and Qualitative Results
5. Ablation Study
5.1 Effectiveness of Pre-trained Strategy
5.2 Effectiveness of Synthetic Methods
5.3 Effectiveness of Aligning Text Features
5.4 Prompt Design
6. Conclusions
7. References