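Author (Chinese): 林家瑜
Author (English): Lin, Jia Yu
Thesis title (Chinese): 透過人臉區域校正與階梯式深度網路的人臉特徵點偵測
Thesis title (English): Facial Landmark Detection with Face Region Rectification and Deep Network Cascade
Advisor (Chinese): 許秋婷
Advisor (English): Hsu, Chiou Ting
Committee members (Chinese): 孫民, 簡仁宗
Committee members (English): Sun, Min; Chien, Jen Tzung
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 102062544
Year of publication (ROC calendar): 104 (2015)
Academic year of graduation: 103
Language: English, Chinese
Number of pages: 35
Keywords (Chinese): 人臉特徵點偵測, 深度學習, 卷積類神經網路, 階梯式
Keywords (English): facial landmark detection, deep learning, convolutional neural network, cascade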
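Facial landmark detection has been widely studied in recent years and achieves good results in controlled environments. However, images captured under real-world conditions are easily affected by external factors (such as illumination changes, partial occlusion, insufficient resolution, expression changes, or pose deviation), and detection accuracy then drops sharply. Related methods often need to predict the location of the face region before detecting landmarks, so they are also sensitive to the accuracy of face detection. The goal of this thesis is to detect landmarks under such environmental influences without depending on the accuracy of face detection. We therefore propose a two-stage deep network that performs progressive facial landmark detection: the first step obtains rough landmark locations, and the second step refines them based on local patches. We also add multi-task learning to each level of the network so that additional facial information can improve landmark detection accuracy. To handle inaccurate face detection, we further propose a convolutional neural network that rectifies the face region. Experiments show that our method achieves better results on the AFLW and AFW datasets while using fewer models.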
Facial landmark detection has been widely studied in recent years and achieves good performance in controlled environments. However, performance decreases significantly when face images are taken in the wild (e.g., under varying illumination, occlusion, and resolution, and with different expressions and poses). Moreover, many methods need to determine the face region before landmark detection, so their performance depends on the accuracy of the face detector. The purpose of this work is to reduce the influence of environmental variations and to maintain detection accuracy even with unstable face detectors. To this end, we propose a two-level deep network that implements coarse-to-fine estimation: the first level predicts rough landmark locations, and the second level refines the results locally. We also incorporate multi-task learning into each level to exploit more information from the face. Furthermore, we propose a CNN model to rectify inaccurate face regions. Experimental results show that our approach uses fewer models and obtains more accurate results on the AFLW and AFW datasets.
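The pipeline described in the abstract (rectify the detected face region, predict rough landmark locations, then refine each landmark locally) can be illustrated with a minimal sketch. This is not the thesis implementation: the three networks are stood in for by plain callables, and the names `rectify_box`, `to_image_coords`, `detect_landmarks`, the `(dx, dy, dw, dh)` box parameterization, and the `patch_ratio` parameter are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height) in image pixels
Point = Tuple[float, float]              # (x, y) in image pixels


def rectify_box(box: Box, delta: Tuple[float, float, float, float]) -> Box:
    """Apply a predicted correction (dx, dy, dw, dh) to a face box.

    The correction is expressed relative to the box size; this
    parameterization is an assumption of the sketch.
    """
    x, y, w, h = box
    dx, dy, dw, dh = delta
    return (x + dx * w, y + dy * h, w * (1.0 + dw), h * (1.0 + dh))


def to_image_coords(box: Box, normalized: List[Point]) -> List[Point]:
    """Map landmarks given in [0, 1] box coordinates to image pixels."""
    x, y, w, h = box
    return [(x + u * w, y + v * h) for u, v in normalized]


def detect_landmarks(
    image,
    detected_box: Box,
    rectifier: Callable,   # (image, box) -> (dx, dy, dw, dh); hypothetical model
    coarse_net: Callable,  # (image, box) -> list of (u, v) in [0, 1]; hypothetical
    refine_net: Callable,  # (image, index, point, patch_size) -> (ou, ov); hypothetical
    patch_ratio: float = 0.25,
) -> Tuple[Box, List[Point]]:
    """Coarse-to-fine landmark estimation on a rectified face region."""
    # Step 0: correct the possibly inaccurate face box from the detector.
    box = rectify_box(detected_box, rectifier(image, detected_box))

    # Level 1: rough landmark locations over the whole rectified face region.
    rough = to_image_coords(box, coarse_net(image, box))

    # Level 2: refine each landmark from a local patch centred on its rough
    # location; offsets are expressed relative to the patch size.
    patch = patch_ratio * box[2]
    refined = []
    for index, (px, py) in enumerate(rough):
        ou, ov = refine_net(image, index, (px, py), patch)
        refined.append((px + ou * patch, py + ov * patch))
    return box, refined
```

In practice each callable would wrap a trained CNN; here any function with the stated signature works, which makes the data flow between the three stages easy to inspect in isolation.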
Chinese Abstract ..... 1
Abstract ..... 2
1. Introduction ..... 4
2. Related Work ..... 6
2.1 Convolutional Neural Network ..... 6
2.2 CNN-based Approaches ..... 7
2.3 Discussion ..... 9
3. Proposed Method ..... 10
3.1 Overview ..... 10
3.2 CNN for Face Region Rectification ..... 11
3.2.1 Network Structure ..... 12
3.3 Two-level Cascade for Facial Landmark Detection ..... 12
3.3.1 Multi-Task CNN for Rough Prediction ..... 13
3.3.2 Multi-Task CNN for Refinement ..... 14
3.3.3 Testing Strategy ..... 16
3.4 Implementation Details ..... 16
3.4.1 Layer-wise Structure ..... 16
3.4.2 Training ..... 19
4. Experimental Results ..... 22
4.1 Database and Setting ..... 22
4.2 Evaluation of Face Region Rectification ..... 23
4.3 Evaluation of Refinement Strategy ..... 24
4.4 Comparison with State-of-the-art Methods ..... 26
4.5 Extension to Dense Landmark Annotations ..... 29
5. Discussion ..... 30
6. Conclusions ..... 32
7. References ..... 33
[1] P. N. Belhumeur, D. W. Jacobs, D. J. Kriegman, and N. Kumar, “Localizing parts of faces using a consensus of exemplars,” In Proc. CVPR, 2011.
[2] L. Gu and T. Kanade, “A generative shape regularization model for robust face alignment,” In Proc. ECCV, 2008.
[3] X. Zhu and D. Ramanan, “Face detection, pose estimation, and landmark localization in the wild,” In Proc. CVPR, 2012, pp. 2879-2886.
[4] L. Liang, R. Xiao, F. Wen, and J. Sun, “Face alignment via component-based discriminative search,” In Proc. ECCV, 2008.
[5] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, “Active shape models - their training and application,” Computer Vision and Image Understanding, vol. 61, no. 1, pp. 38-59, Jan. 1995.
[6] T. F. Cootes, G. J. Edwards, and C. J. Taylor, “Active appearance models,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 681-685, Jun. 2001.
[7] G. Tzimiropoulos and M. Pantic, “Optimization problems for fast AAM fitting in-the-wild,” In Proc. ICCV, 2013.
[8] X. Xiong and F. De la Torre, “Supervised descent method and its applications to face alignment,” In Proc. CVPR, 2013.
[9] X. Yu, J. Huang, S. Zhang, W. Yan, and D. N. Metaxas, “Pose-free facial landmark fitting via optimized part mixtures and cascaded deformable shape model,” In Proc. ICCV, 2013.
[10] X. Cao, Y. Wei, F. Wen, and J. Sun, “Face alignment by explicit shape regression,” In Proc. CVPR, 2012.
[11] X. P. Burgos-Artizzu, P. Perona, and P. Dollar, “Robust face landmark estimation under occlusion,” In Proc. ICCV, 2013.
[12] S. Ren, X. Cao, Y. Wei, and J. Sun, “Face alignment at 3000 fps via regressing local binary features,” In Proc. CVPR, 2014.
[13] Y. Sun, X. Wang, and X. Tang, “Deep Convolutional Network Cascade for Facial Point Detection,” In Proc. CVPR, 2013.
[14] L. Wan, M. Zeiler, S. Zhang, Y. LeCun, and R. Fergus, “Regularization of Neural Networks using DropConnect,” In Proc. ICML, 2013.
[15] C. Y. Lee, S. Xie, P. W. Gallagher, Z. Zhang, and Z. Tu, “Deeply-Supervised Nets,” In arXiv, 2014.
[16] A. Krizhevsky, I. Sutskever, and G. Hinton, “Imagenet classification with deep convolutional neural networks,” In Proc. NIPS, 2012.
[17] Y. Bengio, “Learning Deep Architectures for AI,” Foundations and Trends in Machine Learning, 2009.
[18] X. L. Zhang and J. Wu, “Denoising deep neural networks based voice activity detection,” In Proc. ICASSP, 2013.
[19] J. Zhang, S. Shan, M. Kan, and X. Chen, “Coarse-to-Fine Auto-encoder Networks (CFAN) for Real-time Face Alignment,” In Proc. ECCV, 2014.
[20] Z. Zhang, P. Luo, C. C. Loy, and X. Tang, “Facial Landmark Detection by Deep Multi-task Learning,” In Proc. ECCV, 2014.
[21] P. Viola and M. Jones, “Rapid Object Detection using a Boosted Cascade of Simple Features,” In Proc. CVPR, 2001, pp. 511-518.
[22] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman, “Return of the Devil in the Details: Delving Deep into Convolutional Nets,” In Proc. BMVC, 2014.
[23] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting,” Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929-1958, Jan. 2014.
[24] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional Architecture for Fast Feature Embedding,” arXiv preprint arXiv:1408.5093, 2014.
[25] Y. LeCun, L. Bottou, G. Orr, and K. Muller, “Efficient backprop,” In Neural Networks: Tricks of the Trade, Springer, 1998.
[26] M. Koestinger, P. Wohlhart, P. M. Roth, and H. Bischof, “Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization,” In Proc. ICCVW, 2011.
[27] Luxand Incorporated, Luxand face SDK, http://www.luxand.com/
[28] V. Le, J. Brandt, Z. Lin, L. Bourdev, and T. S. Huang, “Interactive facial feature localization,” In Proc. ECCV, 2012.
 
 
 
 