Detailed Record

Author (Chinese): 陳維駿
Author (English): Chen, Wei-Chun
Title (Chinese): 透過生成式對抗網路進行模型知識萃取
Title (English): Knowledge Distillation via Generative Adversarial Networks
Advisor (Chinese): 李哲榮
Advisor (English): Lee, Che-Rung
Committee (Chinese): 陳煥宗、孫民
Committee (English): Chen, Hwann-Tzong; Sun, Min
Degree: Master's
University: National Tsing Hua University
Department: Computer Science
Student ID: 106062558
Year of Publication (ROC): 107 (2018)
Academic Year of Graduation: 106
Language: English
Pages: 23
Keywords (Chinese): 模型壓縮、知識萃取、生成式對抗網路、模仿學習
Keywords (English): Model Compression, Knowledge Distillation, Generative Adversarial Networks, Mimic Learning
Abstract: The model reduction problem, which eases the computational cost and latency of complex deep learning architectures, has received increasing attention owing to its importance in model deployment. One promising method is knowledge distillation (KD), which creates a fast-to-execute student model that mimics a large teacher network. In this thesis, we propose KDGAN (Knowledge Distillation via Generative Adversarial Networks), a method that improves the effectiveness of KD by learning the feature maps of the teacher network. The two main techniques used in KDGAN are a shared classifier and a generative adversarial network. Experimental results show that KDGAN can use a four-layer CNN to mimic DenseNet-40 and MobileNet to mimic DenseNet-100. On the CIFAR-100 dataset, both student networks lose less than 1% accuracy relative to their teacher models, run two to six times faster at inference, and the MobileNet student is less than half the size of DenseNet-100.
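The abstract only names the ingredients, so the following is a minimal PyTorch-style sketch of what adversarial feature-map distillation with a shared classifier could look like. It is an illustration under stated assumptions, not the thesis's actual implementation: every name here (FeatureDiscriminator, kdgan_step, the adv_weight of 0.1) is hypothetical, and the real KDGAN losses and architecture are described in Chapter 3 of the thesis.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDiscriminator(nn.Module):
    # Judges whether a feature map came from the teacher ("real")
    # or the student ("fake"). Architecture is illustrative only.
    def __init__(self, channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, 3, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 3, stride=2, padding=1),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )

    def forward(self, feat):
        return self.net(feat)  # one real/fake logit per sample

def kdgan_step(student, teacher, shared_classifier, disc,
               opt_student, opt_disc, images, labels, adv_weight=0.1):
    # One training step. Assumes `student` and `teacher` map images to
    # feature maps of the same shape, and `shared_classifier` is the
    # classification head reused by both networks. `opt_student` should
    # hold the student's (and, if trained, the classifier's) parameters.
    with torch.no_grad():
        t_feat = teacher(images)  # teacher features, kept frozen

    s_feat = student(images)

    # Discriminator update: teacher features labeled 1, student features 0.
    t_logit = disc(t_feat)
    s_logit = disc(s_feat.detach())
    d_loss = (F.binary_cross_entropy_with_logits(t_logit, torch.ones_like(t_logit)) +
              F.binary_cross_entropy_with_logits(s_logit, torch.zeros_like(s_logit)))
    opt_disc.zero_grad()
    d_loss.backward()
    opt_disc.step()

    # Student update: classify correctly through the shared head while
    # producing features the discriminator mistakes for the teacher's.
    adv_logit = disc(s_feat)
    adv_loss = F.binary_cross_entropy_with_logits(adv_logit, torch.ones_like(adv_logit))
    cls_loss = F.cross_entropy(shared_classifier(s_feat), labels)
    total = cls_loss + adv_weight * adv_loss  # 0.1 is an arbitrary illustrative weight
    opt_student.zero_grad()
    total.backward()
    opt_student.step()
    return d_loss.item(), total.item()

In this sketch, detaching the student features during the discriminator update and mixing a classification loss with the adversarial loss during the student update follow standard GAN training practice; the thesis's exact loss weighting and how the shared classifier is trained may differ.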
Table of Contents:
Chinese Abstract
Abstract
Contents
List of Figures
List of Tables
List of Algorithms
1. Introduction
2. Related Work
3. The Design of KDGAN
4. Experiments
5. Conclusion and Future Work