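Author (Chinese): 林暐恩
Author (English): Lin, Wei-En
Thesis title (Chinese): 應用於二進制人工智慧之深層神經網路邊緣處理器以非揮發性電阻式記憶體為基礎進行內存乘法與累加運算
Thesis title (English): A Non-volatile ReRAM Based Macro with Computing-In-Memory Multiply-and-Accumulate for Binary DNN AI Edge Processors
Advisor (Chinese): 張孟凡
Advisor (English): Chang, Meng-Fan
Committee members (Chinese): 洪浩喬
呂仁碩
Committee members (English): Hong, Hao-Chiao
Liu, Ren-Shuo
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Electrical Engineering
Student ID: 105061615
Year of publication (ROC calendar): 107
Academic year of graduation: 107
Language: English
Number of pages: 80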
Keywords (translated from Chinese): non-volatile computing-in-memory; deep neural network; sense amplifier
Keywords (English): nv-CIM; Deep-Learn-Neuron-Network; Sensing-Amplifier
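With the arrival of the artificial intelligence era, deep neural networks (DNNs) have proven effective in applications such as image processing, video segmentation, and speech recognition. Running existing DNNs on current systems relies mainly on general-purpose processors, ASIC designs, or FPGA accelerators, all of which are constrained by data movement because on-chip memory and data-transfer bandwidth are limited. The “memory wall” of the conventional von Neumann architecture, that is, the limited bandwidth between processor and memory, has therefore become one of the most critical bottlenecks to improving system performance. The main design concepts of computing-in-memory are: keeping data in memory to speed up processing, reducing data volume through compression, reducing data movement by transferring only computed results rather than moving data to the processor, and computing inside the memory to raise processing efficiency.
This thesis proposes a computing-in-memory macro based on non-volatile resistive memory (ReRAM). The architecture receives input signals and stores weights, so that multiply-and-accumulate operations complete within one cycle, which reduces both the data traffic between the central processing unit (CPU) and the memory and the energy lost in transfer. The thesis examines the read challenges that arise after computation in non-volatile computing-in-memory (nv-CIM) and proposes two solutions, (1) an input-pattern-aware compensation generation scheme and (2) a voltage sense amplifier circuit, which address the following problems to improve read performance:
1. With a limited R-ratio, different inputs cause the current or voltage of the same multiply-and-accumulate value (MACV) to drift significantly, to the point that adjacent MACVs overlap and reads fail. The input-pattern-aware compensation generation scheme prevents MAC values from overlapping.
2. With a small resistance ratio, the sense amplifier uses dual reference voltages in place of the conventional midpoint reference voltage, which suppresses the process-variation-induced spread of the generated reference voltage and enlarges the otherwise small sensing margin.
A 0.5 Mb ReRAM-based macro with in-memory computing capability was implemented in a 55 nm logic process. Compared with prior work, the proposed architecture reduces energy consumption in computing-in-memory mode by nearly 3x. The measured read time is 9.7 ns in computing-in-memory mode and 2.5 ns in ReRAM mode. Compared with a conventional voltage sense amplifier, the proposed sense amplifier tolerates 7.4 times the input offset.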
With the advent of artificial intelligence, deep neural networks (DNNs) have demonstrated their effectiveness in applications such as image processing, video segmentation, and speech recognition. Running state-of-the-art DNNs on current systems mostly relies on general-purpose processors, ASIC designs, or FPGA accelerators, all of which suffer from data movement because of limited on-chip memory and data-transfer bandwidth. As a result, the “memory wall” of conventional von Neumann architectures, namely the limited bandwidth between processors and memories, has become one of the most critical bottlenecks to improving system performance. The main design principles of computing-in-memory are storing data in memory to speed up processing, reducing data volume through compression, reducing data movement by moving only the results of a computation rather than the data to be computed, and computing inside the memory to improve processing efficiency.
This thesis proposes a computing-in-memory macro based on non-volatile resistive memory (ReRAM). The architecture receives input signals and stores weights, enabling multiply-and-accumulate (MAC) operations in one cycle, which reduces both the amount of data transferred between the CPU and the memory and the energy consumed in the transfer. We discuss the challenges of reading in non-volatile computing-in-memory (nv-CIM) and propose two solutions: (1) an input-pattern-aware compensation generation scheme and (2) a voltage sense amplifier.
These circuits address the following problems to improve read performance:
1. With a limited R-ratio, different inputs cause significant shifts in the current or voltage that represents the same MAC value, to the point that adjacent MAC values overlap and the read fails. The input-pattern-aware compensation generation scheme prevents MAC values from overlapping.
2. With a small R-ratio, the voltage sense amplifier replaces the conventional midpoint reference voltage with dual reference voltages, which suppresses the process-variation-induced spread of the reference voltage and enlarges the otherwise small sensing margin.
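The overlap described in item 1 can be illustrated with a small numerical sketch. It is not the thesis's circuit or its compensation scheme: it assumes ideal ohmic cells, a hypothetical read voltage `v_read`, a hypothetical low-resistance value `r_lrs`, and a high-resistance state equal to `r_lrs * r_ratio`. All function names are illustrative. Under these assumptions the bitline current is the sum of the currents of the activated rows, and the sketch lists the MAC values whose current range overlaps the range of the next MAC value.

```python
from itertools import product


def mac_value(inputs, weights):
    """Binary MAC: count the rows where both input and weight are 1."""
    return sum(x & w for x, w in zip(inputs, weights))


def bitline_current(inputs, weights, v_read=0.2, r_lrs=10e3, r_ratio=5.0):
    """Sum the cell currents on one bitline (assumed ideal ohmic cells).

    weight 1 -> low-resistance state (r_lrs)
    weight 0 -> high-resistance state (r_lrs * r_ratio)
    Only rows whose input is 1 (activated word lines) conduct.
    All numeric defaults are hypothetical, not taken from the thesis.
    """
    r_hrs = r_lrs * r_ratio
    return sum(v_read / (r_lrs if w else r_hrs)
               for x, w in zip(inputs, weights) if x)


def current_ranges(n_rows, **cell):
    """Map each MAC value to the (min, max) bitline current over all patterns."""
    ranges = {}
    patterns = list(product((0, 1), repeat=n_rows))
    for inputs in patterns:
        for weights in patterns:
            m = mac_value(inputs, weights)
            i_bl = bitline_current(inputs, weights, **cell)
            lo, hi = ranges.get(m, (i_bl, i_bl))
            ranges[m] = (min(lo, i_bl), max(hi, i_bl))
    return ranges


def overlapping_mac_values(n_rows, **cell):
    """Return every MAC value m whose largest current exceeds the
    smallest current of MAC value m + 1."""
    ranges = current_ranges(n_rows, **cell)
    return [m for m in range(n_rows) if ranges[m][1] > ranges[m + 1][0]]
```

In this simplified model, a MAC value overlaps the next one as soon as more than `r_ratio` activated rows store weight 0, because their combined high-resistance current then exceeds the current of a single low-resistance cell. Calling `overlapping_mac_values(8, r_ratio=5.0)` and `overlapping_mac_values(8, r_ratio=100.0)` contrasts a small and a large R-ratio.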
A 0.5 Mb ReRAM-based macro with in-memory computing capability is implemented in a 55 nm logic process. Compared with previous work, the proposed macro reduces energy consumption in computing-in-memory mode by nearly 3x. The measured read time is 9.7 ns in nv-CIM mode and 2.5 ns in ReRAM mode. The proposed sense amplifier tolerates 7.4 times the input offset of a conventional voltage sense amplifier.
Abstract (Chinese)
Abstract (English)
List of Tables
Chapter 1 Introduction
1.1 The Role of Memory in SoC products
1.2 Memory Landscape
1.2.1 RAM
1.2.2 CAM
1.2.3 ROM
1.2.4 Programmable NVMs
1.3 Von Neumann bottleneck
1.4 Schematic of the Deep Neural Network (DNN)
1.5 Computing-In-Memory (CIM)
Chapter 2 Characteristic of Contact-ReRAM
2.1 Structure of Contact-ReRAM
2.2 Write Operation
2.3 Read Operation
2.4 Distribution of Contact-ReRAM
Chapter 3
3.1 Structure and Operations of nv-CIM Read
3.2 Design Challenges
3.3 Previous Arts
3.3.1 Input Aware Reference Based ReRAM CIM
3.3.2 Neural-Network(NN) based on Resistive Analog Neuro Device
3.3.3 Summary
Chapter 4
4.1.1 Motivation and Concept of Proposed ReRAM based CIM
4.1.2 Input-pattern aware compensate generation scheme
4.1.3 Operation
4.2 Analysis and Comparison for nv-CIM Scheme
4.2.1 Analysis Sensing margin of scheme
4.2.2 WLs on quantity versus to power consumption and Devolving time
4.2.3 Speed Comparison for nv-CIM scheme
4.2.4 Power Comparison for nv-CIM scheme
4.2.5 Energy Comparison for nv-CIM scheme
4.2.1 Analysis Read Yield of Macro
4.3 Structure and operations of proposed read I/O
4.4 Analysis and Comparison for nv-CIM Macro
4.4.1 Speed Comparison for nv-CIM Macro
4.4.2 Energy Comparison for nv-CIM Macro
Chapter 5
5.1 Design Challenges of Small Sensing Margin
5.1.1 Threshold Voltage in Process
5.1.2 Issues of Small R-ratio Devices
5.2 Structure and Operation of Conventional Sensing Schemes
Chapter 6 Proposed SA Circuits Schemes and Analysis
6.1 Concept of Proposed Sense Amplifier
6.2 Structure of Proposed Sense Amplifier
6.3 Operations of Proposed Sense Amplifier
6.4 Analysis and Comparison for SA
6.4.1 Efficiency of Margin Enhancement
Chapter 7 Measurement Results and Conclusion
7.1 ReRAM Macro
7.2 Design for Testchip
7.3 Conclusions and Future Work
Reference