Author (Chinese): 游思澤
Author (English): Yu, Shih-Ze
Title (Chinese): 基於人臉關鍵點及時空圖卷積網絡之深偽偵測
Title (English): Spatial-Temporal Graph Convolutional Network through Facial Landmarks for DeepFake Detection
Advisor (Chinese): 林嘉文
Advisor (English): Lin, Chia-Wen
Committee (Chinese): 林彥宇、許志仲、陳駿丞
Committee (English): Lin, Yen-Yu; Hsu, Chih-Chung; Chen, Jun-Cheng
Degree: Master
University: National Tsing Hua University
Department: Department of Electrical Engineering
Student ID: 109061541
Publication Year (ROC era): 112 (2023)
Graduation Academic Year (ROC era): 111
Language: English
Pages: 35
Keywords (Chinese): 深偽偵測、時空圖卷積網絡、人臉關鍵點
Keywords (English): DeepFake Detection; Spatial-Temporal Graph Convolutional Network; Landmarks
Abstract:
DeepFake is a forgery technique that uses deep learning to transplant a source face onto a target face in a video. These forgeries have caused widespread harm, including copyright infringement, disinformation that incites public panic, and the production of illegal pornographic videos. Effective DeepFake detection has therefore become an urgent public concern. Recently, a novel method was proposed that, in contrast to previous pixel-based methods, uses facial landmarks as input; its results demonstrate that landmarks can match the performance and potential of pixel-based features for DeepFake detection. Inspired by this, and in order to further mine the cues hidden in the spatial and temporal relationships among facial landmarks, we first use Delaunay triangulation to connect the landmarks and construct a spatial-temporal face graph sequence. To make the model more powerful and flexible, we modify the original Spatial-Temporal Graph Convolutional Network (ST-GCN) by adding an attention mechanism and a learnable adjacency matrix, and we design a new weight partition strategy suited to the DeepFake detection task.
Our method achieves state-of-the-art results among landmark-based methods. On the Celeb-DF, DFD, and DFDC datasets, it improves AUC by 29.2%, 33.5%, and 23%, respectively, over the previous landmark-based method. It also retains most of the advantages of landmark-based methods, including lower training cost and higher robustness to video compression.
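To make the graph-construction step concrete, below is a minimal sketch of building a per-frame face graph by Delaunay-triangulating 2D landmarks, as the abstract describes. The 68-point landmark layout, the SciPy usage, and all names are illustrative assumptions, not the thesis implementation; one such adjacency per frame would form the spatial-temporal graph sequence.

```python
# Minimal sketch: per-frame face graph via Delaunay triangulation of landmarks.
# Assumes dlib-style 68-point 2D landmarks; names here are illustrative.
import numpy as np
from scipy.spatial import Delaunay

def build_face_adjacency(landmarks: np.ndarray) -> np.ndarray:
    """Return a symmetric [N, N] adjacency matrix for N 2D landmarks."""
    n = landmarks.shape[0]
    adj = np.eye(n, dtype=np.float32)       # self-loop on every node
    tri = Delaunay(landmarks)               # triangulate the landmark cloud
    for simplex in tri.simplices:           # each simplex is a triangle (i, j, k)
        for a in range(3):
            i, j = simplex[a], simplex[(a + 1) % 3]
            adj[i, j] = adj[j, i] = 1.0     # connect every triangle edge
    return adj

# Example: one adjacency per frame yields the face graph sequence.
frame_landmarks = np.random.rand(68, 2)     # stand-in for tracked landmarks
A = build_face_adjacency(frame_landmarks)
```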
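Similarly, the stated ST-GCN modifications (a learnable adjacency matrix and an attention mechanism) can be sketched as a single graph-convolution block. This PyTorch snippet is a hedged illustration of the general idea under assumed names; it omits the temporal convolution and the weight partition strategy of a full ST-GCN.

```python
# Hedged sketch of a graph-convolution block with a learnable adjacency
# residual and a per-edge attention mask; not the thesis implementation.
import torch
import torch.nn as nn

class GraphConvBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, adj: torch.Tensor):
        super().__init__()
        self.register_buffer("adj", adj)                        # fixed Delaunay links
        self.learned_adj = nn.Parameter(torch.zeros_like(adj))  # learnable extra links
        self.attn = nn.Parameter(torch.ones_like(adj))          # per-edge attention mask
        self.proj = nn.Linear(in_ch, out_ch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, nodes, channels]
        a = (self.adj + self.learned_adj) * self.attn   # combine fixed + learned links
        deg = a.sum(dim=-1, keepdim=True).clamp(min=1e-6)
        a = a / deg                                     # row-normalize the graph
        return torch.relu(self.proj(a @ x))             # aggregate, then project

x = torch.randn(8, 68, 2)            # 8 samples, 68 landmarks, (x, y) coordinates
layer = GraphConvBlock(2, 64, torch.eye(68))
out = layer(x)                       # -> [8, 68, 64]
```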
Abstract (Chinese) i
Abstract ii
1 Introduction 1
2 Related Work 6
2.1 DeepFake Detection . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.1 Pixel-Based Methods . . . . . . . . . . . . . . . . . . 6
2.1.2 Landmark Sequences for DeepFake Detection . . . . . 10
2.2 Graph Convolutional Networks . . . . . . . . . . . . . . . . . 11
3 Methodology 12
3.1 Pipeline Overview . . . . . . . . . . . . . . . . . . . . . . . 12
3.2 Face Graph Construction . . . . . . . . . . . . . . . . . . . . 13
3.3 Graph Convolution . . . . . . . . . . . . . . . . . . . . . . . 15
3.4 Implementation of ST-GCN . . . . . . . . . . . . . . . . . . . 15
3.5 Partition Strategy for Face . . . . . . . . . . . . . . . . . . . 16
3.6 Learnable Adjacency Matrix and Attention Mechanisms . . . 17
3.7 Implementation Details . . . . . . . . . . . . . . . . . . . . . 19
4 Experiments 22
4.1 Datasets & Evaluation Metrics . . . . . . . . . . . . . . . . . 22
4.2 General Evaluation . . . . . . . . . . . . . . . . . . . . . . . 24
4.3 Robustness to Video Compression . . . . . . . . . . . . . . . 26
4.4 Training Cost . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.5 Ablation Study . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.5.1 Learnable Adjacency Matrix and Attention Mechanisms 28
4.5.2 Different Linking Schemes & Partition Modes . . . . 29
4.5.3 Case Study . . . . . . . . . . . . . . . . . . . . . . . 30
5 Conclusion 32
References 33