Detailed Record

Author (Chinese): 賴清元
Author (English): Lai, Ching-Yuan
Title (Chinese): 以兩種輔助學習方法來改善場景文字檢測
Title (English): Improving Scene Text Detection by Two Auxiliary Learning Methods
Advisor (Chinese): 林嘉文
Advisor (English): Lin, Chia-Wen
Committee members (Chinese): 林彥宇, 許志仲, 陳駿丞
Committee members (English): Lin, Yen-Yu; Hsu, Chih-Chung; Chen, Jun-Cheng
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Electrical Engineering
Student ID: 109061527
Year of publication (ROC calendar): 112 (2023)
Academic year of graduation: 111
Language: English
Pages: 28
Keywords (Chinese): 場景文字檢測、多任務學習、相似性學習、困難樣本挖掘
Keywords (English): Scene Text Detection, Multi-task Learning, Similarity Learning, Hard Mining
Scene text detection aims to locate and label text instances in complex natural images. Among existing methods, some researchers argue that introducing an auxiliary character-detection task can improve text-detection accuracy, and their experimental results confirm the effectiveness of this approach. However, such methods require higher annotation costs and longer inference times. Inspired by multi-task learning, we propose a text-similarity learning method with a relatively low labeling cost. Unlike other methods, we improve text-location prediction using auxiliary information about text writing style.
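The text-similarity idea above can be pictured as pairwise training: text instances from the same image are labeled "same writing style" or "different writing style", and the model is pushed to score same-style pairs higher. The sketch below is a hypothetical illustration of such a loss, not the thesis implementation (the actual module and labels are defined in Section 3.3); the feature vectors and margin value are assumptions.

```python
def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def similarity_loss(pairs, margin=0.5):
    """Contrastive-style loss over (feat_a, feat_b, same_style) pairs:
    pull same-style pairs together, push different-style pairs apart."""
    total = 0.0
    for feat_a, feat_b, same_style in pairs:
        s = cosine(feat_a, feat_b)
        if same_style:
            total += 1.0 - s               # want s close to 1
        else:
            total += max(0.0, s - margin)  # want s below the margin
    return total / len(pairs)
```

A loss of this shape would simply be added, with some weight, to the detector's main loss during training.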

Meanwhile, we also observe that most segmentation-based methods predict text positions from pixel-level information, so they often misidentify patterns with text-like features as target text. To alleviate this problem, we propose a non-text recognition learning method that eliminates such incorrect predictions. Benefiting from the two proposed auxiliary tasks, our method outperforms our baseline, DBNet, by 1.4% and 1.0% in accuracy on the Total-Text and CTW1500 test sets, respectively. Furthermore, both auxiliary tasks can be removed at test time, so introducing them does not reduce the original detection efficiency.
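Why the auxiliary tasks add no test-time cost can be sketched as follows: both modules branch off the shared features only during training, and the test-time forward pass is exactly the baseline detector's. The toy backbone and head functions below are placeholders, not the thesis code.

```python
class TextDetector:
    def backbone(self, image):
        # placeholder for shared feature extraction (e.g., DBNet's backbone)
        return [0.5 * px for px in image]

    def detection_head(self, feats):
        # placeholder for the detector's text/non-text decision per position
        return [f > 0.4 for f in feats]

    def forward_train(self, image):
        feats = self.backbone(image)
        det = self.detection_head(feats)
        # auxiliary branches: text similarity (TSM) and non-text
        # recognition (NRM); their outputs feed auxiliary losses only
        aux_similarity = sum(feats) / len(feats)
        aux_nontext = max(feats)
        return det, aux_similarity, aux_nontext

    def forward_test(self, image):
        # auxiliary branches removed: identical cost to the baseline
        return self.detection_head(self.backbone(image))
```

Because `forward_test` never touches the auxiliary branches, the deployed model's latency matches the plain detector's regardless of how many auxiliary tasks were used in training.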
1 Introduction 1
1.1 Research Background . . . . . . . . . . . . . . . . . . . . 1
1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Contribution . . . . . . . . . . . . . . . . . . . . . . . . 3
2 Related Work 4
2.1 Local Modeling Methods . . . . . . . . . . . . . . . . . . . 4
2.2 Global Modeling Methods . . . . . . . . . . . . . . . . . . 5
2.3 Multi-task Learning Methods . . . . . . . . . . . . . . . . 5
2.4 Text-like Region Problem . . . . . . . . . . . . . . . . . 6
3 Proposed Method 7
3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.2 Text Detection Module (TDM) . . . . . . . . . . . . . . . . 8
3.3 Text Similarity Module (TSM) . . . . . . . . . . . . . . . 9
3.3.1 Annotation of Writing Style Label . . . . . . . . . . . 10
3.3.2 Extraction of Text Features . . . . . . . . . . . . . . 10
3.3.3 Label Setting for TSM . . . . . . . . . . . . . . . . . 11
3.3.4 Architecture of TSM . . . . . . . . . . . . . . . . . . 12
3.4 Non-text Recognition Module (NRM) . . . . . . . . . . . . 12
3.5 Loss Functions . . . . . . . . . . . . . . . . . . . . . . 14
4 Experiments 16
4.1 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . 16
4.2 Implementation Details . . . . . . . . . . . . . . . . . . 17
4.3 Comparison with Related Methods . . . . . . . . . . . . . 17
4.4 Ablation Study . . . . . . . . . . . . . . . . . . . . . . 19
4.5 Hyperparameter Tuning . . . . . . . . . . . . . . . . . . 19
4.6 Different Training Strategies . . . . . . . . . . . . . . 20
4.7 Visualization . . . . . . . . . . . . . . . . . . . . . . 21
4.8 Failure Analysis . . . . . . . . . . . . . . . . . . . . 22
4.9 Robustness Analysis . . . . . . . . . . . . . . . . . . . 23
4.9.1 Extraction of Text Features . . . . . . . . . . . . . . 23
4.9.2 Quantitative Results on Total-Text Dataset . . . . . . . 24
4.9.3 Qualitative Results on Total-Text Dataset . . . . . . . . 24
4.10 Generalization Analysis . . . . . . . . . . . . . . . . . . 25
5 Conclusion 26
References 27