Author (English): Wei, Wei-Chen
Thesis title (English): IMCAQ: A Deep Neural Network Quantization Training Method Based on Reparameterization Corresponding to Hardware Limitation of In-Memory Computing
Advisor (English): Tang, Kea-Tiong
Oral defense committee (English): Huang, Chao-Tsung
Liu, Ren-Shuo
Keywords (English): quantization; compression; in-memory computing; deep learning
In-memory computing (IMC) exhibits excellent potential for AI accelerators involving massively parallel computations and for achieving high energy efficiency. IMC is especially suitable for convolutional neural networks (CNNs), which must perform large numbers of matrix-vector multiplications (MVMs).
In this work, we propose a quantization method, "IMCQ", that accounts for the hardware limitations of nonvolatile IMC (nvIMC) in order to implement compact CNNs. We simulate nvIMC for parallel computation of multilevel MVMs by considering the constraints of the sense amplifier in nvIMC; specifically, the method must manage the resolution of activations, weights, and MVM values, as well as the number of opened word lines (WLs) in the memory array. Furthermore, we introduce a Concrete-distribution-based quantization method into the MVM value quantizer of IMCQ; this method mitigates the small read margin problem caused by variations in nvIMC. We call the advanced method "IMCAQ".
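The constraints listed above can be illustrated with a minimal NumPy sketch. It is not the thesis's implementation: the function names `quantize_uniform` and `imc_mvm`, the uniform quantizer, and the readout range `mvm_range` are assumptions made for illustration. The sketch splits an MVM into groups of simultaneously opened word lines and quantizes each partial sum to a limited resolution, as a stand-in for a sense amplifier readout.

```python
import numpy as np

def quantize_uniform(x, bits, lo, hi):
    """Clip x to [lo, hi] and round it onto 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    step = (hi - lo) / levels
    x = np.clip(x, lo, hi)
    return np.round((x - lo) / step) * step + lo

def imc_mvm(weights, activations, open_wls=9, mvm_bits=4, mvm_range=None):
    """Simulate an MVM in which only `open_wls` word lines are open at once.

    weights:     (rows, cols) weight matrix, assumed already quantized
    activations: (rows,) input vector, assumed already quantized
    mvm_range:   (lo, hi) readout range of the hypothetical sense amplifier;
                 None disables partial-sum quantization (full precision).
    """
    rows, cols = weights.shape
    out = np.zeros(cols)
    for start in range(0, rows, open_wls):
        w = weights[start:start + open_wls]
        a = activations[start:start + open_wls]
        partial = a @ w  # analog accumulation over one word-line group
        if mvm_range is not None:
            partial = quantize_uniform(partial, mvm_bits, *mvm_range)
        out += partial  # digital accumulation across groups
    return out

# 2-bit weights and activations (integer codes 0..3), 27 rows -> 3 groups of 9
rng = np.random.default_rng(0)
W = rng.integers(0, 4, size=(27, 5)).astype(float)
a = rng.integers(0, 4, size=27).astype(float)
exact = imc_mvm(W, a, mvm_range=None)
approx = imc_mvm(W, a, open_wls=9, mvm_bits=4, mvm_range=(0.0, 81.0))
```

In this sketch, each partial sum of 9 products of 2-bit codes lies in [0, 81], so a 4-bit readout over that range has a step of 81/15 = 5.4, and the rounding error per group is at most half a step.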
We evaluated the proposed quantization methods on the MNIST and CIFAR-10 image classification tasks using LeNet and VGG-Net, respectively. With weights and activations quantized to 2 bits, MVM values quantized to 4 bits, and 9 opened WLs, the proposed method improved CIFAR-10 accuracy by 2.92% compared with the traditional straight-through estimator (STE) based quantization method. Compared with full-precision MVM values, quantizing MVM values to 4 bits reduced accuracy by 0.05% on MNIST and 1.31% on CIFAR-10. The experimental results indicate that the proposed method is practical and useful for fabricating real chips intended for nvIMC platforms.
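To make the contrast between a hard (rounding) quantizer and a Concrete-distribution-based quantizer more concrete, the following NumPy sketch shows one possible relaxation. It is an illustration under stated assumptions, not the formulation used in the thesis: the Gaussian-shaped logits, the noise scale `sigma`, the `temperature`, and the function names `hard_quantize` and `concrete_quantize` are all hypothetical choices.

```python
import numpy as np

def hard_quantize(x, levels):
    """Forward pass of a rounding quantizer: snap x to the nearest level."""
    levels = np.asarray(levels, dtype=float)
    x = np.asarray(x, dtype=float)[..., None]
    return levels[np.abs(x - levels).argmin(axis=-1)]

def concrete_quantize(x, levels, sigma=0.5, temperature=0.5, rng=None):
    """Sample a soft quantized value from a Concrete (Gumbel-softmax) relaxation.

    Assumption: the logit of each level is the Gaussian log-likelihood of x
    given that level, with standard deviation `sigma` standing in for readout
    noise. The result is a convex combination of the levels, so it stays
    differentiable with respect to x in an autodiff framework.
    """
    rng = np.random.default_rng() if rng is None else rng
    levels = np.asarray(levels, dtype=float)
    x = np.asarray(x, dtype=float)[..., None]
    logits = -((x - levels) ** 2) / (2.0 * sigma ** 2)
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))              # Gumbel(0, 1) noise
    z = (logits + gumbel) / temperature
    z = z - z.max(axis=-1, keepdims=True)     # stabilize the softmax
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return (probs * levels).sum(axis=-1)

levels = np.arange(4.0)                       # four readout levels, for brevity
hard = hard_quantize([0.2, 2.6], levels)
soft = concrete_quantize([0.2, 2.6], levels, rng=np.random.default_rng(0))
```

Lowering `temperature` pushes the samples toward one-hot selections of a single level, while a higher temperature gives smoother outputs; training schemes of this kind typically use the soft sample during training and the hard quantizer at inference time.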
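Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation and Objectives
1.3 Chapter Overview
Chapter 2 Literature Review
2.1 Model Compression Algorithms
2.2 Quantized Neural Networks
2.2.1 Quantization Methods
2.2.2 Quantization of Network Components
2.3 Nonvolatile In-Memory Computing
2.4 The Small Read Margin Problem
2.5 Research Motivation
Chapter 3 IMCQ: Quantization Criteria Based on In-Memory Computing
3.1 Weight Quantizer
3.2 Activation Function Quantizer
3.3 ReRAM Convolution
3.4 STE-Based MVM Quantizer
Chapter 4 IMCAQ: Noise-Adaptive MVM Quantizer
4.1 Sense Amplifier Noise Model
4.2 Reparameterization Trick
4.3 Concrete-Distribution-Based MVM Quantizer
Chapter 5 Experimental Results
5.1 Experimental Setup
5.1.1 Datasets and Preprocessing
5.1.2 Network Architectures and Hyperparameter Settings
5.1.3 Software and Hardware Environment
5.2 Weight and Activation Function Quantization
5.3 MVM Signal Quantization
5.4 Effect of ReRAM Convolution
5.5 Comparison of IMCAQ and IMCQ
5.6 Comparison with SOTA
Chapter 6 Conclusions and Future Work
References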
