Author: Li, Pin-Yi
Thesis title: Studying the Quantization of Deep Neural Networks through Batch Normalization
Advisor: Tang, Kea-Tiong
Oral examination committee: Lin, Chia-Wen; Huang, Chao-Tsung
Keywords: Deep neural networks; Batch normalization; Cosine similarity; Teacher-student network; Quantization
Deep neural networks are notoriously intensive in computation and memory, which poses serious challenges for deployment on resource-constrained hardware. As a result, a growing number of researchers are working to reduce the storage and computation overhead of network models for efficient inference. Network quantization is one family of model compression methods; it has drawn considerable attention because it can greatly reduce memory requirements while also simplifying computation.
In this thesis, we study the quantization of deep neural networks through batch normalization. First, we point out deficiencies of previous works. Then, we modify the activation quantization scheme based on batch normalization coefficients. For weight quantization, we propose initializing the quantized weights by maximizing cosine similarity. To alleviate the gradient mismatch introduced by discrete weights in deep neural networks, we also propose a method that adjusts the quantized weights layer by layer by learning the output feature maps generated by the original full-precision network.
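The idea of initializing quantized weights by maximizing cosine similarity can be illustrated with a minimal sketch. The abstract does not specify the quantizer or the search procedure, so everything below is an assumption made for illustration: a symmetric uniform quantizer with 2^bits - 1 levels, a plain grid search over the step size, and the hypothetical helper names `quantize_uniform`, `cosine_similarity`, and `init_step_by_cosine`. It is not the thesis's actual implementation.

```python
import numpy as np


def quantize_uniform(w, step, bits):
    """Round w to a symmetric uniform grid with 2**bits - 1 levels (assumed quantizer, bits >= 2)."""
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(w / step), -qmax, qmax) * step


def cosine_similarity(a, b):
    """Cosine of the angle between two tensors, flattened to vectors."""
    a, b = a.ravel(), b.ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def init_step_by_cosine(w, bits, num_candidates=100):
    """Grid-search the step size that maximizes cos(w, Q(w)) (hypothetical helper)."""
    max_abs = float(np.abs(w).max())
    if max_abs == 0.0:
        raise ValueError("all-zero weights have no direction to match")
    qmax = 2 ** (bits - 1) - 1
    # Candidate steps range from heavy clipping up to no clipping at all.
    candidates = np.linspace(0.05, 1.0, num_candidates) * max_abs / qmax
    sims = [cosine_similarity(w, quantize_uniform(w, s, bits)) for s in candidates]
    best = int(np.argmax(sims))
    return float(candidates[best]), sims[best]


rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=(64, 64))  # stand-in for one layer's weights
for bits in (2, 4):
    step, sim = init_step_by_cosine(w, bits)
    print(f"{bits}-bit: step={step:.4f}, cosine similarity={sim:.4f}")
```

Cosine similarity ignores the overall scale of the weights, so the search effectively selects the clipping and rounding pattern whose direction best matches the full-precision weights. In a network with batch normalization after each layer, a per-layer scale error of this kind could plausibly be absorbed by the normalization, which is one reason a direction-based criterion may be a natural fit here.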
We evaluated the proposed quantization methods on the ImageNet classification task using AlexNet and ResNet-18. When weights and activations are quantized to 4 bits, Top-1 accuracy drops by only 0.4% compared with the full-precision network. With weights and activations aggressively quantized to 2 bits, AlexNet achieves 54.9% Top-1 accuracy, a 3.2% improvement in the Top-1 accuracy gap over the state-of-the-art method.
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation and Objectives
1.3 Chapter Overview
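Chapter 2 Literature Review
2.1 Model Compression Algorithms for Deep Neural Networks
2.2 Weight Quantization
2.2.1 Linear Quantization
2.2.2 Logarithmic Quantization
2.2.3 Optimization-Based Quantization
2.3 Activation Quantization
2.3.1 Linear Quantization
2.3.2 Logarithmic Quantization
2.3.3 Distribution-Based Quantization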
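Chapter 3 Quantization Based on Batch Normalization
3.1 Batch Normalization
3.2 Activation Quantization
3.2.1 ReLU Activation Function
3.2.2 Feed-Forward Approximation
3.2.2 Backpropagation Approximation
3.2.3 Compression Ratio
3.3 Weight Quantization
3.3.1 Quantized Weight Initialization
3.3.2 Layer-Wise Quantized Weight Adjustment Based on a Teacher-Student Network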
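Chapter 4 Experimental Results
4.1 Experimental Setup
4.1.1 Dataset and Preprocessing
4.1.2 Network Architectures and Hyperparameter Settings
4.1.3 Software and Hardware Environment
4.2 Activation Quantization
4.3 Weight Quantization
4.3.1 Maximizing Cosine Similarity
4.3.2 Layer-Wise Quantization with a Teacher-Student Network
4.4 Comparison with the State of the Art
4.4.1 Comparison of Activation Quantization Results
4.4.2 Comparison of Weight Quantization Results
4.4.3 Comparison of Full-Network Quantization Results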
Chapter 5 Conclusion and Future Work
References
