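Author (Chinese): 李元正
Author (English): Lee, Yuan-Cheng
Thesis title (Chinese): 以深度學習完成基於彩色及深度影像的人臉辨識
Thesis title (English): Accurate and robust face recognition from RGB-D images with a deep learning approach
Advisor (Chinese): 賴尚宏
Advisor (English): Lai, Shang-Hong
Committee members (Chinese): 劉庭祿; 陳煥宗; 莊永裕
Committee members (English): Liu, Tyng-Lu; Chen, Huan-Tsong; Chuang, Yong-Yu
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 103062518
Year of publication (ROC calendar): 105 (2016)
Graduation academic year: 104
Language: English, Chinese
Number of pages: 57
Keywords (Chinese): 臉部辨識; 深度學習; 學習遷移; 3-D 電腦圖學; 影像處理
Keywords (English): Face Recognition; Deep Learning; Transfer Learning; 3-D Computer Graphics; Image Processing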
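In face recognition, depth images and color images are two complementary kinds of visual data that capture different aspects of a face, and using them together yields more accurate recognition. In this thesis, we propose a deep-learning-based face recognition system that performs face identification and verification on color and depth images captured by a consumer RGB-D camera.
To draw on both color and depth images for recognition, our system has three main parts: depth image recovery; deep learning, used for feature extraction; and joint classification, used to integrate color and depth information.
So that depth images reach recognition performance close to that of color images, we propose a series of methods based on image processing and computer graphics that use multiple consecutive depth frames to recover and enhance the depth image and then reconstruct a high-quality face model. With multi-view resampling, we can simulate the depth images of a single face model as captured from various angles.
To reduce the risk that limited RGB-D data poses for deep learning, we introduce transfer learning. Our deep network incorporates components and architectures that have become popular in recent years; it is first trained on color and grayscale face images and then fine-tuned on depth face images. The network is mainly used to extract discriminative deep features from color and depth face images. Beyond these features, we also compute statistics on the relationship between each image and the other images in the database when designing the final classifier, which gives higher accuracy and better robustness.
Our experiments show that this approach achieves very accurate recognition rates on public datasets and tolerates head rotation and illumination changes well.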
Face recognition from RGB-D images uses two complementary types of image data, color and depth images, to achieve more accurate recognition. In this thesis, we propose a face recognition system based on deep learning that can verify and identify a subject from the color and depth face images captured with a consumer-level RGB-D camera (e.g., Microsoft Kinect).
To recognize faces with both color and depth information, our system contains three parts: depth image recovery, deep learning for feature extraction, and joint classification. To bring the recognition performance of depth face images closer to that of color images, we propose a series of image processing techniques that recover and enhance a depth image from its neighboring depth frames and then reconstruct a precise 3D facial model. With multi-view resampling, we can compute the depth images corresponding to various viewing angles of a single 3D face model.
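The multi-view resampling idea can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes the reconstructed face model is available as a plain list of 3D points, uses an orthographic projection with a simple z-buffer, and treats smaller z as nearer to the camera. The function name `render_depth` and its parameters are hypothetical.

```python
import math

def render_depth(points, yaw_deg, size=9, scale=1.0):
    """Rotate 3D points about the vertical (y) axis by yaw_deg and project
    them orthographically onto a size x size depth map.

    Assumptions (illustrative only): x and y lie roughly in [-scale, scale],
    smaller z means nearer to the camera, and empty pixels are None.
    """
    yaw = math.radians(yaw_deg)
    c, s = math.cos(yaw), math.sin(yaw)
    depth = [[None] * size for _ in range(size)]
    for x, y, z in points:
        # Rotation about the y axis.
        xr = c * x + s * z
        zr = -s * x + c * z
        # Map normalized coordinates to the nearest pixel (row 0 is the top).
        col = math.floor((xr / scale + 1.0) * 0.5 * (size - 1) + 0.5)
        row = math.floor((1.0 - (y / scale + 1.0) * 0.5) * (size - 1) + 0.5)
        if 0 <= row < size and 0 <= col < size:
            # z-buffer: keep only the nearest surface point per pixel.
            if depth[row][col] is None or zr < depth[row][col]:
                depth[row][col] = zr
    return depth

# Hypothetical usage: resample one model at several yaw angles.
model = [(0.0, 0.0, 1.0), (0.2, 0.1, 1.1), (-0.2, 0.1, 1.1)]
views = {yaw: render_depth(model, yaw) for yaw in (-30.0, 0.0, 30.0)}
```

A real pipeline would more likely render a triangle mesh with a perspective camera model; the point-based version above only shows how one model can produce depth images for several viewing angles.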
To alleviate the problem of the limited amount of RGB-D data available for deep learning, we apply transfer learning. Our deep network architecture contains recently popular components. We first train the deep network on a color face dataset and then fine-tune it on depth images. The deep networks are used to extract discriminative features (deep representations) from color and depth images. In addition to these deep representations, we analyze the relation between each image and the other images in the database when designing our classifier, in order to reach higher recognition accuracy and better robustness.
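As a rough illustration of how color and depth representations might be combined at classification time, the sketch below uses a plain weighted sum of cosine similarities. This is a swapped-in late-fusion baseline, not the classifier described in the thesis, which also takes the relation between each image and the other database images into account. The names `cosine_similarity`, `identify` and `w_rgb`, and the gallery layout, are assumptions made for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def identify(probe_rgb, probe_depth, gallery, w_rgb=0.5):
    """Return (subject_id, fused_score) for the best-matching gallery entry.

    gallery maps a subject id to a (rgb_feature, depth_feature) pair.
    w_rgb is a hypothetical weight on the color score; 1 - w_rgb goes to depth.
    """
    best_id, best_score = None, -float("inf")
    for subject, (g_rgb, g_depth) in gallery.items():
        score = (w_rgb * cosine_similarity(probe_rgb, g_rgb)
                 + (1.0 - w_rgb) * cosine_similarity(probe_depth, g_depth))
        if score > best_score:
            best_id, best_score = subject, score
    return best_id, best_score

# Hypothetical usage with toy 2-D features.
gallery = {
    "subject_a": ([1.0, 0.0], [1.0, 0.0]),
    "subject_b": ([0.0, 1.0], [0.0, 1.0]),
}
match, score = identify([0.9, 0.1], [0.8, 0.2], gallery)
```

Verification could reuse the same fused score by comparing it against a threshold chosen on validation data.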
Our experiments show that the proposed face recognition system provides very accurate face recognition results on public datasets, and it is robust against variations in head pose and illumination.
Contents
Chapter 1. Introduction
1.1 Motivation
1.2 Problem Description
1.3 Main Contribution
Chapter 2. Related Work
2.1 Conventional RGB-D face recognition
2.2 3-D face reconstruction and recognition
2.3 CNN-based Face Recognition
2.4 Inspiration from Previous Works
Chapter 3. Proposed Method
3.1 System Overview
3.2 Depth Face Image Recovery and Enhancement
3.3 Learning Deep Representation
3.4 Confidence Estimation
Chapter 4. Experiments
4.1 Training and Validation Data Preparation
4.2 Training Environment Settings
4.3 Network Architecture and Component Evaluation
4.4 Distinguish Power of Deep Representation
4.5 Experiments on EKFD
4.6 Experiments on SuperFace
4.7 Experiments on Our Dataset
Chapter 5. Conclusion
References

 
 
 
 