
Detailed Record

Author (Chinese): 劉大維
Author (English): Liu, Ta-Wei
Thesis Title (Chinese): 應用於非揮發記憶體內運算之高阻態單元偏好神經網路權重量化演算法
Thesis Title (English): High-Resistance-State-Favored Quantization Algorithm Based on Non-Volatile Computing In Memory for Neural Network
Advisor (Chinese): 張孟凡
Advisor (English): Chang, Meng-Fan
Committee Members (Chinese): 呂仁碩, 邱瀝毅
Committee Members (English): Liu, Ren-Shuo; Chiou, Lih-Yih
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Electrical Engineering
Student ID: 107061559
Year of Publication (ROC era): 109 (2020)
Academic Year of Graduation: 109
Language: English
Number of Pages: 43
Keywords (Chinese): 非揮發記憶體內運算, 神經網路, 權重量化, 人工智能, 友善量化, 降低功耗
Keywords (English): Non-Volatile Computing In Memory, Neural Network, Quantization Algorithm, artificial intelligence, friendly quantization, low power
Abstract (Chinese): As the wave of artificial intelligence intensifies and deep learning algorithms mature, research on neural network (NN) hardware accelerators has sprung up. Computing in memory is one such memory-centric special-purpose hardware approach that breaks away from the traditional von Neumann architecture. Its advantage is that it provides both computation and storage: operations are performed on the memory side before the results are sent out, which reduces the amount of data moved and saves time and energy. This thesis takes a previously published non-volatile computing-in-memory (nvCIM) macro as its experimental platform. The macro includes a circuit that improves read yield (in-situ HRS-C), and this thesis exploits a side effect of that circuit to design a special neural network weight quantization flow. The flow can be effectively added to a network so that, when this hardware performs AI recognition, energy efficiency improves from 4.6 to 10.51 TOPS/W at the cost of a slight accuracy drop. A CIM behavior model is also incorporated into the training process; this model simulates the behavior of the sense amplifier, so the trained network can tolerate the analog computation error in the CIM and further achieve accuracy close to that of pure software.
Abstract (English): With the rapid development of artificial intelligence and the maturing of deep learning algorithms, research on DNN accelerators has sprung up. In-memory computing is one such direction, aimed at overcoming the von Neumann bottleneck. Its advantage is that it combines computation and storage: operations are performed on the memory side and only the results are transmitted, reducing data movement and saving time and energy. This thesis takes a published non-volatile computing-in-memory (nvCIM) macro as its experimental platform. The macro contains a circuit (in-situ HRS-C) that improves read yield; this thesis exploits a side effect of that circuit to design a neural network weight quantization and training flow. The flow can be added to a neural network efficiently, and when the hardware is used for AI classification it achieves a reasonable tradeoff between energy efficiency and inference accuracy, improving energy efficiency from 4.6 to 10.51 TOPS/W with a negligible drop in inference accuracy. In addition, a CIM behavior model is applied during network training. This model simulates the behavior of the sense amplifier, so the trained network can tolerate the analog computation error of the CIM and achieve accuracy close to that of a pure-software implementation.
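The record itself contains no code, but the two mechanisms the abstract describes (an HRS-favored weight quantizer and a sense-amplifier behavior model applied during training) can be illustrated with a short sketch. The PyTorch snippet below is a minimal illustration under stated assumptions, not the thesis's actual HRS-FQ implementation: the signed uniform quantizer, the magnitude threshold `lam`, the straight-through estimator, and every function and parameter name are assumptions made for illustration only.

```python
# Minimal sketch (illustrative only, not the thesis's HRS-FQ code) of:
#  (1) a weight quantizer that collapses small weights to zero, since a
#      zero weight bit maps to a high-resistance-state (HRS) ReRAM cell
#      that draws little bitline current, lowering MAC energy; and
#  (2) a toy sense-amplifier/ADC behavior model that discretizes the
#      analog MAC result, so training can absorb the readout error.
import torch


class HRSFavoredQuant(torch.autograd.Function):
    """Signed uniform quantizer biased toward the all-HRS (zero) level."""

    @staticmethod
    def forward(ctx, w, n_bits=4, lam=0.2):
        levels = 2 ** (n_bits - 1) - 1                  # e.g. +/-7 for 4 bits
        scale = w.abs().max().clamp(min=1e-8) / levels
        q = torch.round(w / scale)                      # uniform quantization
        # HRS bias: weights below `lam` of the max magnitude collapse to 0.
        q = torch.where(w.abs() < lam * w.abs().max(), torch.zeros_like(q), q)
        return q * scale

    @staticmethod
    def backward(ctx, grad_out):
        # Straight-through estimator: pass gradients through the
        # non-differentiable rounding/thresholding unchanged.
        return grad_out, None, None


def sense_amp_model(mac, adc_bits=3):
    """Discretize an analog MAC value to 2**adc_bits readout levels.

    The detach() trick keeps the forward pass quantized while letting
    gradients flow through as if the readout were exact (again an STE).
    """
    step = mac.abs().max().clamp(min=1e-8) / (2 ** adc_bits - 1)
    return mac + (torch.round(mac / step) * step - mac).detach()
```

In a training loop, each layer's weights would pass through `HRSFavoredQuant.apply(w)` and its partial sums through `sense_amp_model`, so the network learns to tolerate both discretizations, which is the qualitative behavior the abstract attributes to the proposed flow.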
Chinese Abstract i
Abstract ii
Acknowledgements iii
1 Introduction 1
1.1 The Development of Artificial Intelligence 1
1.1.1 Artificial Intelligence (AI) 2
1.1.2 Machine Learning (ML) 2
1.1.3 Deep Learning (DL) 3
1.2 Training a Neural Network 6
1.3 Specialized Hardware - the Artificial Intelligence Accelerator 7
1.4 Another Approach to the AI Accelerator - Computing In Memory 8
2 Background 10
2.1 Introduction to Emerging NVM Devices 10
2.2 Introduction to ReRAM CIM 12
2.2.1 Structure of ReRAM CIM 12
2.2.2 CIM Operation of ReRAM CIM 13
2.2.3 ReRAM CIM Mapping for AI Application 15
2.3 NN Model Building for ReRAM CIM 17
3 Previous Work 19
3.1 Previous ReRAM CIM Work 19
3.1.1 ReRAM CIM Structure-1 19
3.1.2 ReRAM CIM Structure-2 21
3.2 Previous Hardware-Friendly Algorithms 23
3.2.1 Deep Compression 23
3.2.2 DoReFa Quantization Method 26
4 Proposed Algorithm and Analysis 29
4.1 Motivation of Proposed Algorithm 29
4.2 Concept of Proposed Algorithm 29
4.3 Details of Proposed Algorithm 31
4.3.1 Forward Propagation in HRS-FQ 33
4.3.2 Backward Propagation in HRS-FQ 35
4.4 Analysis and Comparison of the Proposed Algorithm 35
5 Conclusion and Future Work 38
References 39