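Author (Chinese): 高暉曜
Author (English): Kao, Hui-Yao
Thesis title (Chinese): 應用於非揮發性記憶體內運算架構之高速雙位元全電壓值域感測放大器
Thesis title (English): A High Speed Two-bit Full Voltage Range Sense Amplifier for Non-volatile Computing-In-Memory
Advisor (Chinese): 張孟凡
Advisor (English): Chang, Meng-Fan
Oral defense committee (Chinese): 呂仁碩; 邱瀝毅
Oral defense committee (English): Liu, Ren-Shuo; Chiou, Lih-Yih
Degree: Master's
Institution: National Tsing Hua University
Department: Institute of Electronics Engineering
Student ID: 107063537
Year of publication (ROC calendar): 109
Academic year of graduation: 109
Language: English
Number of pages: 53
Keywords (Chinese): 電壓感測放大器; 記憶體內運算; 電阻式記憶體
Keywords (English): Voltage Sense Amplifier; Computing-In-Memory; ReRAM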
In recent years, as mobile devices and the Internet of Things (IoT) have become more prevalent than ever, demand for non-volatile memory (NVM) has grown steadily. The current mainstream NVM is flash memory, which is widely used because of its low cost and large capacity. However, flash memory requires a high write voltage and has run into many problems in process scaling, creating a bottleneck. This has driven the development of next-generation NVMs (ReRAM, STT-MRAM, etc.). Compared with flash memory, next-generation NVMs offer a lower write voltage, faster read speed, smaller area, and compatibility with logic processes. These advantages make them more suitable than flash memory for embedded devices.
With the development of deep learning and the IoT, the amount of data to be computed grows with the complexity of neural networks. However, the conventional von Neumann architecture spends most of its time moving data between the processor and memory, and the limited bandwidth between the two becomes a bottleneck for computing speed. Computing-In-Memory (CIM) has therefore been proposed in recent years to address this problem: only computed results need to be transferred, which reduces data movement and improves processing efficiency. Combined with the characteristics of NVM, this makes non-volatile CIM (nvCIM) well suited to mobile devices and the IoT.
This master's thesis discusses the challenges faced by nvCIM and proposes a voltage sense amplifier (VSA) to address them. The two main challenges are as follows:
1. As network complexity increases, multi-bit inputs and weights become necessary to improve accuracy. However, as the number of output bits increases, the nvCIM architecture takes longer to complete an operation, so the operating speed decreases.
2. Under a limited supply voltage, a conventional VSA cannot operate normally for input voltages below the threshold voltage. The sensing margin between different accumulated values is therefore reduced, and the read yield drops with it.
To address these challenges, this thesis proposes a VSA that produces a two-bit output (00, 01, 10, or 11) sequentially within a single sensing period. The sensing time of the proposed VSA is 48% to 52% shorter than that of the conventional VSA, and the operation time of the CIM macro is 27% to 39% shorter than with the conventional VSA. The proposed VSA also supports full-voltage-range sensing, which enlarges the sensing margin available for nvCIM operations. In addition, it includes mechanisms that cancel process variation and amplify the sensing margin, achieving a higher yield even when the sensing margin is small. Whereas the input offset voltage that a conventional VSA can tolerate varies with the common-mode voltage, the proposed VSA tolerates 1.76x to 2.91x the input offset voltage, and its tolerance remains stable across common-mode voltages.
We implement the nvCIM scheme in a 4Mb ReRAM macro using the TSMC 22nm process. At the nominal operating voltage of 0.8V, the measured two-bit sensing time of the proposed two-bit full-voltage-range sense amplifier (2b-FVRSA) is 1.36ns, while the measured one-bit sensing time of the conventional VSA is 1.24ns. With 8-bit inputs and 8-bit weights, the CIM macro produces 8-bit outputs in 14.8ns.
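As a rough illustration of why full-voltage-range sensing matters, the following Python sketch compares the spacing between adjacent accumulated-value voltages when the whole 0 V to 0.8 V range is usable against the case where only voltages above the threshold voltage can be sensed. It is a behavioral toy model under stated assumptions, not the circuit described in the thesis: the evenly spaced levels, the 0.3 V threshold value, and the function names `sensing_margin` and `two_bit_code` are all hypothetical and chosen only for illustration. Only the 0.8 V supply and the four output codes come from the abstract.

```python
def sensing_margin(v_low: float, v_high: float, num_levels: int) -> float:
    """Spacing between adjacent levels, ASSUMING the accumulated values
    map to voltages spread evenly over [v_low, v_high]."""
    if num_levels < 2:
        raise ValueError("need at least two levels to define a margin")
    if v_high <= v_low:
        raise ValueError("v_high must be greater than v_low")
    return (v_high - v_low) / (num_levels - 1)


def two_bit_code(v: float, v_low: float = 0.0, v_high: float = 0.8) -> str:
    """Map a voltage to '00', '01', '10' or '11' by splitting the range
    into four equal bins (an illustrative assumption, not the thesis's
    actual reference scheme)."""
    if not v_low <= v <= v_high:
        raise ValueError("voltage outside the sensing range")
    step = (v_high - v_low) / 4
    index = min(int((v - v_low) / step), 3)  # clamp v == v_high into the top bin
    return format(index, "02b")


if __name__ == "__main__":
    VDD = 0.8          # nominal supply voltage stated in the abstract
    ASSUMED_VTH = 0.3  # hypothetical threshold voltage, for illustration only
    LEVELS = 4         # four distinguishable values for a two-bit output

    full = sensing_margin(0.0, VDD, LEVELS)
    limited = sensing_margin(ASSUMED_VTH, VDD, LEVELS)
    print(f"full-range margin:    {full * 1000:.0f} mV")
    print(f"limited-range margin: {limited * 1000:.0f} mV")
    print("code for 0.5 V:", two_bit_code(0.5))
```

Under these assumptions the full-range case leaves more voltage between adjacent levels than the limited-range case, which is the qualitative effect the abstract attributes to full-range sensing. The actual margins depend on the cell currents and circuit details that the abstract does not give.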
Abstract (in Chinese)
Abstract
Acknowledgements
List of Figures
List of Tables
Chapter 1 Introduction
1.1 The role of Memory in SoC products
1.2 Memory Landscape
1.2.1 RAM
1.2.2 CAM
1.2.3 ROM
1.2.4 Programmable NVMs
1.3 Challenges of Flash Memory
1.4 Emerging Non-Volatile Memories
1.5 Von Neumann bottleneck
1.6 Computing-In-Memory (CIM)
Chapter 2 Characteristic of ReRAM
2.1 Structure of ReRAM
2.2 Read Operation
2.3 Write Operation
Chapter 3 Design Challenges of nvCIM Read
3.1 Design Challenges
3.1.1 Threshold Voltage in Process
3.1.2 Small sensing margin of nvCIM Read
3.1.3 High precision of nvCIM
3.2 Conventional Sense Amplifier
3.3 Previous Arts
Chapter 4 Proposed Sensing Schemes and Analysis
4.1 Proposed Sense Amplifier
4.1.1 Motivation and Concept of Proposed Sense Amplifier
4.1.2 Structure of Proposed Sense Amplifier
4.1.3 Operation of Proposed Sense Amplifier
4.2 Analysis and Comparison
4.2.1 Capacitor analysis
4.2.2 Yield analysis
4.2.3 Access Time (TAC) Improvement
4.2.4 Different ways of sense amplifier to sense two bit
Chapter 5 Measurement Results and Conclusion
5.1 ReRAM Macro
5.2 Design for Test-chip
5.3 Measurement results
5.4 Conclusions and Future Work
Reference

(The full text will be open to external access after 2025-10-13.)