Detailed Record

Author (Chinese): 郭皓文
Author (English): Kuo, Hao-Wen
Title (Chinese): 基於記憶體內運算架構硬體限制的兩階段神經網路訓練框架
Title (English): A Two-stage Training Framework for Hardware Constraints of Computing-in-Memory Architecture
Advisor (Chinese): 鄭桂忠
Advisor (English): Tang, Kea-Tiong
Committee Members (Chinese): 盧峙丞, 張孟凡, 呂仁碩
Committee Members (English): Lu, Chih-Cheng; Chang, Meng-Fan; Liu, Ren-Shuo
Degree: Master's
University: National Tsing Hua University
Department: Department of Electrical Engineering
Student ID: 108061535
Publication Year (ROC calendar): 111 (2022)
Academic Year of Graduation: 110
Language: Chinese
Number of Pages: 55
Keywords (Chinese): 量化, 模型壓縮, 記憶體內運算, 非理想性效應, 深度學習
Keywords (English): Quantization, Model Compression, Computing In-memory, Deep Learning
Usage statistics:
  • Recommendations: 0
  • Views: 436
  • Rating: *****
  • Downloads: 0
  • Bookmarks: 0
Analog computing-in-memory (CIM) circuits combine high-density interleaved (crossbar) memory arrays with low power consumption, which alleviates the von Neumann bottleneck of digital logic circuits. They are especially promising for neural-network workloads that involve massive parallel computation, where they enable highly energy-efficient applications.
This study proposes a two-stage training framework for neural networks built on the computing-in-memory architecture. Besides addressing the challenges of hardware implementation, it evaluates how the non-ideal effects of CIM devices affect network performance. The contributions are as follows. First, a CIM convolution function applicable to a wide range of network models is designed; by splitting the convolution kernels and compensating the partial results, models are no longer prevented from running when their accumulation counts exceed what the CIM crossbar structure can support. Second, the proposed training framework not only quantizes weights and activations to the target bit widths but also injects noise during training to strengthen the robustness of the whole network at inference time; validated on CIFAR-10 and CIFAR-100, ResNet accuracy improves by 2.26% and 8.95%, respectively. Finally, because the bit width of the analog-to-digital converter (ADC) is limited, the matrix-vector multiplication (MVM) outputs must be quantized; an MVM quantizer with flexible quantization intervals adapts to the output distribution of each layer, retaining the important information during quantization and improving quantization quality. Experiments on ResNet and VGG show accuracy improvements of 4.48% and 5.46%, respectively. The experimental results indicate that the proposed method is effective and useful for designing and fabricating chip systems for computing-in-memory.
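To make the kernel-splitting idea above concrete, here is a minimal PyTorch-style sketch (not the thesis's actual implementation). It assumes a hypothetical crossbar accumulation limit MAX_ACC and an optional weight_noise_std for noise injection; the convolution is computed as a sum of partial convolutions over input-channel groups, each small enough to fit the accumulation limit, and the partial sums are then accumulated (compensated) digitally.

import torch
import torch.nn.functional as F

MAX_ACC = 64  # assumed maximum number of accumulations per CIM crossbar column (illustrative)

def cim_conv2d(x, weight, bias=None, stride=1, padding=0, weight_noise_std=0.0):
    # Split the input channels into groups whose kernel_h * kernel_w products
    # fit within MAX_ACC accumulations, then sum the partial convolutions.
    in_ch = weight.shape[1]
    k = weight.shape[2] * weight.shape[3]
    group = max(1, MAX_ACC // k)
    out = None
    for start in range(0, in_ch, group):
        w = weight[:, start:start + group]
        if weight_noise_std > 0:  # optional noise injection mimicking device variation
            w = w + torch.randn_like(w) * weight_noise_std * w.abs().mean()
        partial = F.conv2d(x[:, start:start + group], w, None, stride, padding)
        out = partial if out is None else out + partial  # digital accumulation of partial sums
    if bias is not None:
        out = out + bias.view(1, -1, 1, 1)
    return out

# Example: a 3x3 convolution with 128 input channels is split into ceil(128 / (64 // 9)) = 19
# partial convolutions, each within the assumed accumulation limit.
y = cim_conv2d(torch.randn(1, 128, 32, 32), torch.randn(64, 128, 3, 3), padding=1)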
Analog computing-in-memory (CIM) provides high-density interleaved memory arrays that benefit deep neural networks involving massive parallel computation. It alleviates the von Neumann bottleneck of digital logic circuits and shows considerable potential for achieving high energy efficiency in artificial intelligence (AI) accelerators. We developed a two-stage training framework that not only considers hardware architecture constraints but also analyzes the non-idealities that occur in CIM devices. First, we designed a CIM convolution function that can be used with various neural networks: when their weight kernels are split and compensated, convolution layers are no longer limited by the number of accumulations supported by the CIM architecture. In addition, the training framework not only quantizes weights and activations to their target bit widths but also injects noise during training to improve the inference robustness of the network. Our results show that the framework improves the accuracy of a residual network (ResNet) by 2.26% and 8.95% on CIFAR-10 and CIFAR-100, respectively. Moreover, the limited bit width of the analog-to-digital converter (ADC) necessitates quantizing the output of the matrix-vector multiplication (MVM). An MVM quantizer with flexible quantization intervals adapts to the output distribution of each layer so that crucial information is retained and accuracy can be enhanced after the quantization process. Experimental verification shows that the accuracy of the ResNet and visual geometry group (VGG) models increases by 4.48% and 5.46% on CIFAR-10, respectively. These results demonstrate that the proposed framework is effective and useful for the fabrication and design of CIM chip systems.
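The flexible-interval MVM quantization can likewise be sketched in a few lines. The exact interval-selection rule is not given in this record, so the example below assumes percentile-based clipping calibrated from each layer's own output distribution; adc_bits and clip_percentile are illustrative parameters, not values from the thesis.

import torch

def mvm_quantize(mvm_out, adc_bits=4, clip_percentile=99.5):
    # Quantize MVM (pre-activation) values to the ADC bit width, using a clipping
    # range derived from the layer's own output distribution.
    levels = 2 ** adc_bits - 1
    clip = float(torch.quantile(mvm_out.abs().flatten(), clip_percentile / 100.0))
    clip = max(clip, 1e-6)                         # guard against an all-zero layer
    clipped = mvm_out.clamp(-clip, clip)
    scale = (2 * clip) / levels                    # uniform step inside the flexible interval
    codes = torch.round((clipped + clip) / scale)  # integer ADC codes in [0, levels]
    return codes * scale - clip                    # dequantized values passed to later layers

Because the clipping range follows each layer's distribution rather than a fixed global range, a few outliers no longer force a coarse step size, which is one way to retain the crucial information mentioned above.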
Abstract (Chinese) i
Abstract (English) ii
Table of Contents iii
List of Figures v
List of Tables vii
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Motivation and Objectives 4
1.3 Thesis Organization 7
Chapter 2 Literature Review 8
2.1 Model Compression Algorithms 8
2.2 Quantized Neural Networks 9
2.2.1 Quantization Methods 10
2.2.2 Quantization of Network Units 14
2.3 Non-volatile Computing In-Memory 16
2.4 MVM Quantizer 18
2.5 Non-ideal Effects of Computing Units 21
Chapter 3 Two-stage Neural Network Training Framework for Computing In-Memory 23
3.1 Convolution Algorithm Simulating Computing In-Memory 23
3.2 Two-stage Training Framework Based on Noise Injection 25
3.3 MVM Quantizer with Flexible Quantization Intervals 28
Chapter 4 Experimental Results 33
4.1 Experimental Setup 33
4.1.1 Datasets and Preprocessing 33
4.1.2 Network Architectures and Hyperparameter Settings 34
4.1.3 Software and Hardware Environment 39
4.2 Comparison of Quantization Training and CIM Training Algorithms 39
4.3 MVM Signal Quantization 45
4.4 Comparison of Different Methods 47
4.4.1 CIM Non-ideal Effects 47
4.4.2 Comparison of ADCs with Different Bit Widths 49
Chapter 5 Conclusion and Future Work 51
References 52
 
 
 
 