
Detailed Record

Author (Chinese): 李品逸
Author (English): Li, Pin-Yi
Title (Chinese): 通過批量歸一化研究深度神經網路的量化問題
Title (English): Studying the Quantization of Deep Neural Networks through Batch Normalization
Advisor (Chinese): 鄭桂忠
Advisor (English): Tang, Kea-Tiong
Committee members (Chinese): 林嘉文、黃朝宗
Committee members (English): Lin, Chia-Wen; Huang, Chao-Tsung
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Electrical Engineering
Student ID: 104061466
Publication year (ROC calendar): 107 (2018)
Graduation academic year: 106
Language: Chinese
Number of pages: 43
Keywords (Chinese): 深度神經網路、批量歸一化、餘弦相似度、教師學生網路、量化
Keywords (English): Deep neural networks; Batch normalization; Cosine similarity; Teacher-student network; Quantization
Abstract (Chinese):
Deep neural networks require large amounts of memory and computational resources, which poses a severe challenge to deploying them on resource-limited hardware. As a result, more and more researchers have devoted themselves to reducing the storage and computational overhead of network models for efficient inference. Network quantization is one family of model compression algorithms; because it can greatly reduce memory requirements while also simplifying computation, it has attracted considerable attention.
In this study, the quantization of deep neural networks is investigated through batch normalization. First, the shortcomings of previous quantization studies are pointed out based on the properties of batch normalization. Second, the quantization scheme for activations is adjusted according to the batch normalization coefficients. In addition, a method is proposed that sets the quantized weights by maximizing cosine similarity, and, to mitigate the gradient mismatch problem, the full-precision network is used as a teacher network to optimize the output feature maps of the quantized network layer by layer.
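As a rough illustration of how batch-normalization coefficients can steer activation quantization, the following minimal Python/NumPy sketch derives a layer clipping value from the BN scale and shift (gamma, beta) instead of from observed activation statistics, and then quantizes the ReLU output uniformly within that range. The helper names and the factor k = 3 are illustrative assumptions, not the exact procedure of this thesis.

    import numpy as np

    def bn_based_clip(gamma, beta, k=3.0):
        # After batch normalization, pre-activations are roughly N(beta, gamma^2)
        # per channel, so |beta| + k*|gamma| covers most of the distribution.
        # Taking the maximum over channels gives one layer-wise clip value
        # (k is a hypothetical hyperparameter chosen for illustration).
        return float(np.max(np.abs(beta) + k * np.abs(gamma)))

    def quantize_activation(x, clip_val, bits=4):
        # Uniform quantization of ReLU outputs to `bits` bits within [0, clip_val].
        levels = 2 ** bits - 1
        step = clip_val / levels
        x = np.clip(x, 0.0, clip_val)      # ReLU plus clipping
        return np.round(x / step) * step   # snap to the uniform grid

    # Toy usage with the BN parameters of one 3-channel layer.
    gamma = np.array([0.8, 1.2, 0.5])
    beta = np.array([0.1, -0.3, 0.2])
    x = np.random.randn(16, 3) * gamma + beta   # simulated post-BN pre-activations
    xq = quantize_activation(x, bn_based_clip(gamma, beta), bits=4)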
The proposed methods are validated on ImageNet image classification with AlexNet and ResNet-18. The results show that when weights and activations are quantized to 4 bits, the Top-1 accuracy on AlexNet drops by only 0.4%. When weights and activations are further quantized to 2 bits, AlexNet still retains 54.9% Top-1 accuracy, outperforming the state-of-the-art method by 3.2%.
Abstract (English):
Deep neural networks are notoriously demanding in computation and memory, posing serious challenges for deployment on hardware with limited resources. This has driven growing interest in reducing the storage and computation overhead of network models for efficient inference. Network quantization is one branch of model compression approaches, with promising prospects for memory saving and computational simplification.
In this thesis, we study the quantization of deep neural networks through batch normalization. First, we point out the deficiencies of previous work based on the properties of batch normalization. Then, we modify the activation quantization scheme according to the batch normalization coefficients. Furthermore, for weight quantization, we propose a method that initializes the quantized weights by maximizing cosine similarity. To alleviate the gradient mismatch introduced by discrete weights, we also adjust the quantized weights layer by layer by learning the output feature maps generated by the original full-precision network.
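To make the weight-initialization idea concrete, here is a minimal Python/NumPy sketch of choosing, for one layer, the quantization step whose quantized weights have the highest cosine similarity with the full-precision weights. The symmetric uniform quantizer and the simple grid search over step sizes are assumptions of this sketch; it illustrates the general idea rather than the exact formulation used in the thesis.

    import numpy as np

    def quantize_symmetric(w, step, bits):
        # Symmetric uniform quantization with integer levels in [-qmax, qmax].
        qmax = 2 ** (bits - 1) - 1
        return step * np.clip(np.round(w / step), -qmax, qmax)

    def cosine_similarity(a, b):
        a, b = a.ravel(), b.ravel()
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def init_quantized_weights(w, bits=2, num_candidates=200):
        # Grid-search the step size whose quantized weights best align,
        # in cosine similarity, with the full-precision weights.
        max_abs = float(np.max(np.abs(w)))
        best_step, best_sim = max_abs, -1.0
        for step in np.linspace(max_abs / num_candidates, max_abs, num_candidates):
            sim = cosine_similarity(w, quantize_symmetric(w, step, bits))
            if sim > best_sim:
                best_sim, best_step = sim, step
        return quantize_symmetric(w, best_step, bits), best_step

    # Toy usage on a random 3x3 convolution kernel (16 output, 16 input channels).
    w = np.random.randn(16, 16, 3, 3).astype(np.float32)
    wq, step = init_quantized_weights(w, bits=2)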
We evaluated the proposed quantization methods on the ImageNet classification task with AlexNet and ResNet-18. The results show a Top-1 accuracy drop of only 0.4% on AlexNet when weights and activations are quantized to 4 bits, compared with the full-precision network. When weights and activations are aggressively quantized to 2 bits, AlexNet still achieves 54.9% Top-1 accuracy, a 3.2% improvement over the state-of-the-art method.
Table of Contents:
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
1.1 Research Background
1.2 Research Motivation and Objectives
1.3 Thesis Organization
Chapter 2: Literature Review
2.1 Model Compression Algorithms for Deep Neural Networks
2.2 Weight Quantization
2.2.1 Linear Quantization
2.2.2 Logarithmic Quantization
2.2.3 Optimization-Based Quantization
2.3 Activation Quantization
2.3.1 Linear Quantization
2.3.2 Logarithmic Quantization
2.3.3 Distribution-Based Quantization
Chapter 3: Quantization Based on Batch Normalization
3.1 Batch Normalization
3.2 Activation Quantization
3.2.1 The ReLU Activation Function
3.2.2 Feed-Forward Approximation
3.2.3 Back-Propagation Approximation
3.2.4 Compression Ratio
3.3 Weight Quantization
3.3.1 Quantized Weight Initialization
3.3.2 Layer-Wise Quantized Weight Adjustment with a Teacher-Student Network
Chapter 4: Experimental Results
4.1 Experimental Setup
4.1.1 Dataset and Preprocessing
4.1.2 Network Architectures and Hyperparameter Settings
4.1.3 Hardware and Software Environment
4.2 Activation Quantization
4.3 Weight Quantization
4.3.1 Maximizing Cosine Similarity
4.3.2 Layer-Wise Quantization with a Teacher-Student Network
4.4 Comparison with the State of the Art
4.4.1 Comparison of Activation Quantization Results
4.4.2 Comparison of Weight Quantization Results
4.4.3 Comparison of Full-Network Quantization Results
Chapter 5: Conclusions and Future Work
References
(The full text of this thesis has not been authorized for public release.)