
Detailed Record

Author (Chinese): 盧姵容
Author (English): Lu, Pei-Jung
Title (Chinese): 應用於多位元卷積神經網路以基於高輸入精度計算單元6T靜態隨機存取記憶體之記憶體內運算架構
Title (English): A High Input Precision Computing Cell Based 6T SRAM Computing-in-Memory Scheme for Multi-bit Convolutional Neural Network
Advisor (Chinese): 張孟凡
Advisor (English): Chang, Meng-Fan
Committee (Chinese): 邱瀝毅、呂仁碩
Committee (English): Chiou, Lih-Yih; Liu, Ren-Shuo
Degree: Master
Institution: National Tsing Hua University
Department: Institute of Electronics Engineering
Student ID: 107063542
Publication Year (ROC): 109 (2020)
Graduation Academic Year: 109
Language: English
Pages: 48
Keywords (Chinese): 記憶體內運算、靜態隨機存取記憶體、記憶體、卷積神經網路
Keywords (English): Computing-in-Memory; SRAM; Memory; Convolutional Neural Network
Abstract (Chinese):
With the rapid development of artificial intelligence and convolutional neural networks, demand for the supporting hardware has grown accordingly. In the traditional Von Neumann architecture, large amounts of data are transferred between the memory and the computing unit, consuming considerable power; this is known as the Von Neumann bottleneck.
Computing-in-Memory (CIM) circuits are a promising option for overcoming this bottleneck: their design goal is to complete all computation inside the memory, thereby reducing the power spent on data movement.
A CIM circuit is a memory and a computing unit at the same time. Data are stored in the memory and computed on in place, which greatly reduces data movement and offers high parallelism. To this end, this work stores the neural-network feature map in the memory and activates it, so that the CIM circuit performs the multiply-and-accumulate (MAC) function.
This work proposes an SRAM-CIM macro that performs multi-bit MAC operations using 6T SRAM cells. The macro uses (1) a 6T High Input Precision Computing Cell, which processes 8-bit inputs with 1-bit weights in parallel and offers a compact area, and (2) a global bitline (GBL) combine circuit, which reduces the number of sensing circuits the macro requires and thereby improves its energy efficiency. The fabricated 28nm 384Kb SRAM-CIM macro realizes MAC operations with up to 8-bit inputs and 8-bit weights at 20-bit output precision, achieving a 3.8ns computing time and 14.97TOPS/W energy efficiency.
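The multiply-and-accumulate (MAC) function referred to above is the inner loop of a convolution layer: each output is a dot product of input activations and kernel weights, accumulated across channels. As a point of reference, here is a minimal software sketch of that operation (function and variable names are ours, purely illustrative):

```python
def mac(inputs, weights):
    """Multiply-and-accumulate (MAC): the dot product that a CIM macro
    evaluates inside the memory array instead of in a separate ALU."""
    assert len(inputs) == len(weights)
    return sum(x * w for x, w in zip(inputs, weights))

# A 16-element accumulation, matching the 16-channel accumulation of
# the fabricated macro (the values here are arbitrary illustration data).
activations = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]
kernel      = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0, 4, 5]
partial_sum = mac(activations, kernel)
```

In the proposed macro this dot product is evaluated inside the SRAM array itself, so activations and weights never travel to a separate computing unit.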
Abstract (English):
With the rapid development of artificial intelligence and convolutional neural networks, the demand for related hardware has increased. In the Von Neumann architecture, however, the huge amount of data movement between memory and the computing unit consumes considerable power; this is called the Von Neumann bottleneck.
Computing-in-Memory (CIM) is a promising way to address this bottleneck: the memory both stores data and computes on it. A CIM architecture therefore needs no separate computing unit and offers higher parallelism; data movement between memory and computing unit is greatly reduced, and power consumption drops accordingly.
To achieve this, the idea of this work is to feed the neural-network feature map into the memory array and activate it in parallel, so that the memory itself performs the multiply-and-accumulate (MAC) function.
This work proposes an SRAM-CIM macro based on a compact 6T SRAM cell to perform multi-bit MAC operations. The macro uses (1) a High Input Precision Computing Cell (HIPCC), which performs 8-bit-input, 1-bit-weight computation within a compact array area, and (2) a GBL combine method, which reduces the number of sensing circuits used and thus achieves better energy efficiency.
The fabricated 28nm 384Kb SRAM-CIM macro realizes MAC operations with up to 8-bit inputs, 7-bit weights, 16-channel accumulation, and 20-bit output precision, achieving a 3.8ns computing time and an energy efficiency of 14.97TOPS/W.
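A multi-bit MAC built from 1-bit-weight cells, as described above, can be understood as a set of per-bit-plane partial MACs recombined by shift-and-add: each weight bit gates the full-precision input, and the partial sums are weighted by powers of two. The sketch below illustrates this general bit-wise decomposition in software; it mirrors the style of such SRAM-CIM schemes, not the actual HIPCC/GBL circuit, and all names are illustrative.

```python
def mac_bitwise(inputs, weights, w_bits=8):
    """Compute sum(x * w) by splitting each weight into bit-planes:
    one 1-bit-weight partial MAC per bit position, recombined by
    shift-and-add. Inputs and weights are unsigned integers."""
    total = 0
    for b in range(w_bits):
        # Partial MAC with 1-bit weights: each position contributes
        # the full-precision input gated by one weight bit.
        partial = sum(x * ((w >> b) & 1) for x, w in zip(inputs, weights))
        total += partial << b  # binary weighting of this bit-plane
    return total

xs = [200, 17, 96, 255]   # 8-bit inputs (illustration data)
ws = [5, 130, 77, 1]      # 8-bit weights (illustration data)
assert mac_bitwise(xs, ws) == sum(x * w for x, w in zip(xs, ws))
```

Splitting the weights this way is what lets a 1-bit-weight computing cell serve multi-bit networks; the circuit-level question is where the shift-and-add recombination happens (digital periphery versus analog charge combination on the bitlines).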
Abstract (Chinese) i
Abstract ii
Acknowledgements iv
Contents v
List of Figures vii
List of Tables ix
Chapter 1 Introduction 1
1.1 Memory Landscape 1
1.1.1 RAM 3
1.1.2 CAM 4
1.1.3 ROM 4
1.1.4 Programmable NVMs 5
1.2 Von Neumann bottleneck 6
1.3 Computing-in-Memory (CIM) 7
Chapter 2 Introduction for SRAM 9
2.1 Introduction for Conventional 6T SRAM 9
2.1.1 Structure of SRAM 9
2.1.2 Write Operation and Write Margin 10
2.1.3 Read Operation 11
2.2 Introduction for Hierarchical Bitline 6T SRAM 12
2.2.1 Structure of Hierarchical Bitline 6T SRAM Array 12
2.2.2 Write Operation 14
2.2.3 Read Operation 14
Chapter 3 Previous Work 16
3.1 10T SRAM-CIM with Binary Weight Analog Computing 16
3.2 Twin-8T SRAM-CIM with Multi-bit Weight Word-wise Analog computing 18
3.3 6T SRAM-CIM with Local Computing Cell and Multi-bit Weight Bit-wise Analog Computing 21
3.4 8T SRAM-CIM with Multi-bit Input/Weight 23
Chapter 4 Proposed Schemes and Analysis 26
4.1 Proposed Computing-In-Memory Circuit Scheme 27
4.1.1 Proposed CIM Structure 27
4.1.2 Proposed High Input Precision Computing Cell (HIPCC) structure and Computing-in-Memory Operation 28
4.1.3 Proposed GBL Combine Method and Scheme 31
4.2 Analysis and Comparison 36
Chapter 5 Macro Implementation 39
5.1 Floor Plan of SRAM-CIM Macro 39
5.2 Design for Test Chip 40
Chapter 6 Experimental Results and Conclusion 42
6.1 Measured Performance 42
6.2 Comparison to Previous Work 43
6.3 Conclusions and Future Work 45
References 47

(Full text available for external access after 2025-10-13)
Electronic full text
Chinese and English abstracts

Related Theses

1. A SRAM-Based Computing-in-Memory Unit for Intelligent Data Processing
2. A SRAM-Based Computing-in-Memory Macro for Deep-Learning Data Processing
3. A Local-Computing-Cell-Based 6T SRAM Computing-in-Memory Macro for Deep Neural Network Data Processing
4. A Charge-Sharing-Based Segmented-Bitline 6T SRAM Computing-in-Memory Scheme for Deep Neural Network Data Processing
5. An Area-Efficient, Low-Power Multi-bit Readout Circuit for SRAM-Based Computing-in-Memory Macros
6. A Time-Domain Pulse-Edge-Based 6T SRAM Computing-in-Memory Scheme for Deep Neural Network Data Processing
7. A Pulse-Edge-Based Energy-Efficient Exponent Computing Scheme for 6T SRAM Computing-in-Memory Macros
8. A High-Efficiency SRAM Computing-in-Memory Intelligent Accelerator for Graph Neural Network Edge Devices
9. Energy-Efficient Computing-in-Memory Techniques for AI Edge Devices
10. A System-Circuit Co-Design Scheme for Energy-Efficient Computing-in-Memory AI Processors
11. A High-Performance Multi-bit Nonvolatile Computing-in-Memory Macro for Intelligent Edge Computing Processors
12. High-Accuracy, Energy-Efficient Computing-in-Memory Processor Designs for Edge AI Devices
13. A Hybrid Adder-Tree Architecture with Multiple Full-Adder Types Optimized for Computing-in-Memory Inputs
14. A Dual-Mode Static Weight-Write Scheme and the Corresponding Computing-in-Memory Flow for Neural Network Models
15. A Partial-Sum Quantization Algorithm for Convolutional Neural Networks Based on Computing-in-Memory Hardware Design