Author: Chuang, Chu-Chun
Thesis Title: Multi-Task Framework for Generalized Face Anti-Spoofing with One-Side Meta Triplet Loss
Advisor: Lai, Shang-Hong
Oral Defense Committee: Lin, Chia-Wen; Hsu, Chiu-Ting; Huang, Szu-Hao
Keywords: Computer vision; Deep learning; Face anti-spoofing; Multi-task; Meta learning; Domain generalization
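... a novel multi-task framework applied to generalized face anti-spoofing. The method comprises three tasks: depth ...
... framework consists of a feature extractor, a depth estimator, a U-net based face parsing
module, and a meta learner responsible for meta learning and classification. The U-net based ... proposed in this paper ...
... the baseline method, the AUC improves by over 6%, and the HTER also improves considerably over previous methods ...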
As presentation attacks grow more varied, model generalization has become
an essential challenge for face anti-spoofing, and many previous methods
generalize poorly. This paper improves the generalization ability of face
anti-spoofing in two respects. First, using face parsing information encourages
the network to focus on face regions and to capture the distributions of
different face parts. Second, a one-side triplet loss is incorporated into the network to work
together with the meta learning process. This paper proposes a novel multi-task face
anti-spoofing framework that contains three tasks: depth estimation, face parsing,
and live/spoof classification. With the pixel-wise supervision from the face parsing
and depth estimation tasks, the regularized features can better distinguish spoof
faces. While simulating domain shift with meta learning techniques, the proposed
one-side triplet loss can further improve the generalization capability by a two-stage
margin setting. Our framework consists of a feature extractor, a depth estimator, a
U-net based face parsing module, and a meta learner for conducting meta learning
and classification. The proposed U-net based face parsing module contains a U-net
for predicting a semantic face image and an attention-based skip connection for aggregating
the semantic information of different channels. Extensive experiments on
four public datasets demonstrate that the proposed framework and training strategies
are more effective than previous works for model generalization to unseen domains.
In some experiments on the domain generalization benchmark for face anti-spoofing,
the AUC improves by over 6% compared to the baseline, and the HTER also
improves significantly over previous methods.
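
The two metrics reported above can be illustrated with a minimal sketch. It uses the standard definitions only: HTER is the mean of the false acceptance rate and the false rejection rate at a given threshold, and AUC is the probability that a live sample scores higher than a spoof sample. The function names, the label convention (1 = live, 0 = spoof), the "higher score means live" convention, and the toy scores are assumptions for illustration and are not taken from the thesis; how the threshold is chosen is left to the caller.

```python
def hter(scores, labels, threshold):
    """Half Total Error Rate at a fixed threshold.

    Assumed conventions: labels are 1 for live and 0 for spoof,
    and a score >= threshold is predicted as live.
    """
    spoof_scores = [s for s, y in zip(scores, labels) if y == 0]
    live_scores = [s for s, y in zip(scores, labels) if y == 1]
    # False acceptance rate: spoof samples accepted as live.
    far = sum(s >= threshold for s in spoof_scores) / len(spoof_scores)
    # False rejection rate: live samples rejected as spoof.
    frr = sum(s < threshold for s in live_scores) / len(live_scores)
    return 0.5 * (far + frr)


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (live, spoof) pairs in which the live
    sample scores higher; ties count as half a win.
    """
    spoof_scores = [s for s, y in zip(scores, labels) if y == 0]
    live_scores = [s for s, y in zip(scores, labels) if y == 1]
    wins = 0.0
    for live in live_scores:
        for spoof in spoof_scores:
            if live > spoof:
                wins += 1.0
            elif live == spoof:
                wins += 0.5
    return wins / (len(live_scores) * len(spoof_scores))


# Toy scores, invented for illustration only.
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
print(hter(toy_scores, toy_labels, threshold=0.5))
print(auc(toy_scores, toy_labels))
```

A lower HTER and a higher AUC both indicate better separation of live and spoof samples. The pairwise AUC is quadratic in the number of samples, which is fine for a sketch but not for large test sets.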
1 Introduction
1.1 Motivation
1.2 Problem Statement
1.3 Contributions
1.4 Thesis Organization
2 Related Work
2.1 Face Anti-spoofing
2.1.1 Temporal-based Methods
2.1.2 Appearance-based Methods
2.2 Domain Generalization
2.2.1 Meta Learning for Domain Generalization
3 Proposed Method
3.1 Overview
3.2 Multi-Task Meta Learning
3.3 U-net Based Face Parsing Module
3.3.1 Face Parsing U-net
3.3.2 Attention-Based Skip Connection
3.4 One-Side Triplet Loss
3.5 Objective Function
3.5.1 Classification Loss
3.5.2 One-Side Triplet Loss
3.5.3 Segmentation Loss
3.5.4 Depth Loss
3.5.5 Overall Loss
3.6 Network Structure
4 Experiments
4.1 Experimental Setting
4.1.1 Datasets
4.1.2 Implementation Details
4.1.3 Evaluation Metrics
4.2 Experimental Comparisons
4.3 Face Parsing Results
4.4 Ablation Study
4.4.1 U-net Based Face Parsing Module
4.4.2 One-Side Triplet Loss with Meta Learning
4.5 Visualization
4.5.1 Grad-CAM Visualization
4.5.2 t-SNE Visualization
4.5.3 Effect of Attention-Based Skip Connection for Face Parsing
5 Conclusions
References
