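Author (Chinese): 劉智勇
Author (English): Liu, Zhi Yong
Thesis title (Chinese): 低功耗低延遲的動態隨機存取記憶體控制器設計
Thesis title (English): Controller Design for a Low Power, Low Latency DRAM with Built-in Cache
Advisor (Chinese): 吳誠文
Advisor (English): Wu, Cheng Wen
Committee members (Chinese): 黃稚存; 李進福; 陳竹一
Committee members (English): Huang, Chih Tsun; Li, Jin Fu; Chen, Jwe E
Degree: Master's
Institution: National Tsing Hua University (國立清華大學)
Department: Department of Electrical Engineering
Student ID: 101061466
Year of publication (ROC calendar): 104 (2015)
Academic year of graduation: 104
Language: English
Number of pages: 42
Keywords (Chinese): DRAM 控制器; 3D DRAM; 內建快取記憶體; 記憶體層級; 分段式延遲DRAM
Keywords (English): DRAM controller; 3D DRAM; built-in cache DRAM (BC-DRAM); memory hierarchy; tiered latency DRAM (TL-DRAM)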
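Abstract (Chinese, translated): Because of the "memory wall", memory performance remains one of the major bottlenecks in today's computer systems. While reducing memory power consumption and latency, DRAM manufacturers also want to keep production cost low. New DRAM architectures have therefore been proposed in recent years to address these problems; DRAMs that use an asymmetric bitline structure are called tiered-latency DRAM. The memory used in our experiments is structurally similar to this kind of DRAM, but we use the small array inside the memory array as a cache and give the design a new name, built-in cache DRAM. To make the most of this memory architecture, we propose suitable algorithms.
In the proposed DRAM controller design, data is accessed through the smaller array as often as possible. If the small array does not hold the requested data, the data is moved from the larger array into the small array. When the small array is full and a new DRAM address must be accessed, a suitable algorithm has to choose which storage unit in the small array to evict. Three such eviction algorithms are included: first-in-first-out, least-used-first-out, and earliest-used-first-out.
I then used a modified version of DRAMSim2, a cycle-accurate memory system simulator, to verify the controller together with a DRAM model. With the baseline simulation parameters taken from the 3D Wide-IO DRAM standard, the results for the whole memory system show that built-in cache DRAM does achieve lower latency and lower power consumption than ordinary Wide-IO DRAM. We also studied different ratios of small-array to large-array size, different address mapping rules, and different cache associativities (numbers of ways), and compared the results.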
Abstract (English): Memory system performance remains a significant bottleneck in today's computer systems because of the memory wall. DRAM vendors want to reduce power and latency while keeping production cost low. To this end, a new type of DRAM with asymmetric bitlines, called Tiered-Latency DRAM (TL-DRAM), was proposed recently. Our target DRAM is similar to TL-DRAM, but we operate the small array as a cache, so we call it Built-in Cache DRAM (BC-DRAM). In this work we propose a controller design with appropriate algorithms to get the most out of BC-DRAM.
In the proposed DRAM controller design, data should be accessed from the small array as often as possible. If the small array does not contain the requested data, the data is migrated from the large array to the small array. When the address cache for the small array is full and a new row address needs to be stored in it, the controller has to choose a victim to replace. Three replacement policies are integrated into the controller design for this purpose: first-in-first-out (FIFO), least-used-first-out (LUFO), and earliest-used-first-out (EULO).
We modified DRAMSim2, a cycle-accurate memory system simulator, to test the controller's algorithms together with a DRAM model. Based on the Wide-IO 3D DRAM specifications, our experimental results show that BC-DRAM with the proposed controller consumes less power and achieves lower latency than a typical Wide-IO DRAM. Further experiments show the effects of different specifications, such as the sizes of the small and large arrays, address scrambling rules, and the number of ways of set associativity.
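The access cases and victim selection described in the abstract can be illustrated with a small sketch. This is not the thesis's controller: the class and field names are hypothetical, the cache is modelled as fully associative, and the abstract does not define LUFO or EULO precisely. The sketch assumes LUFO evicts the row with the fewest accesses since migration and EULO evicts the row whose most recent access is oldest.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Entry:
    inserted_at: int     # tick when the row was migrated into the small array
    last_used: int       # tick of the most recent access
    uses: int = 1        # accesses since migration
    dirty: bool = False  # True once the row has been written


class SmallArrayCache:
    """Toy model of an address cache tracking rows held in the small array."""

    def __init__(self, capacity: int, policy: str = "FIFO"):
        if policy not in ("FIFO", "LUFO", "EULO"):
            raise ValueError(f"unknown policy: {policy}")
        self.capacity = capacity
        self.policy = policy
        self.entries: Dict[int, Entry] = {}
        self.tick = 0

    def _pick_victim(self) -> int:
        e = self.entries
        if self.policy == "FIFO":
            # Oldest migration goes first.
            return min(e, key=lambda r: e[r].inserted_at)
        if self.policy == "LUFO":
            # ASSUMPTION: fewest accesses goes first, ties broken by age.
            return min(e, key=lambda r: (e[r].uses, e[r].inserted_at))
        # ASSUMPTION (EULO): oldest most-recent access goes first.
        return min(e, key=lambda r: e[r].last_used)

    def access(self, row: int, is_write: bool = False) -> str:
        """Classify one row access into one of the four cases."""
        self.tick += 1
        entry = self.entries.get(row)
        if entry is not None:
            # Case 1: row is already in the small array.
            entry.last_used = self.tick
            entry.uses += 1
            entry.dirty = entry.dirty or is_write
            return "hit"
        if len(self.entries) < self.capacity:
            # Case 2: miss, small array not full.
            outcome = "miss_fill"
        else:
            victim = self._pick_victim()
            # Case 3 (clean victim) or case 4 (dirty victim needs write-back).
            dirty = self.entries[victim].dirty
            outcome = "miss_evict_dirty" if dirty else "miss_evict_clean"
            del self.entries[victim]
        self.entries[row] = Entry(self.tick, self.tick, dirty=is_write)
        return outcome
```

Calling `SmallArrayCache(2, "FIFO").access(row)` repeatedly returns the case label for each request, which makes it easy to compare how the three policies pick different victims for the same access sequence.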
Chapter 1 Introduction
1.1 Motivation
1.2 Thesis Organization
Chapter 2 DRAM Architecture Basics
2.1 Organization
2.2 Conventional DRAM Operations
2.3 DRAM Timing Constraints
Chapter 3 Data Access of the Proposed Memory Controller
3.1 DRAM Controller
3.2 The 1st Case: in Small Array
3.3 The 2nd Case: Not in Small Array, Small Array Not Full
3.4 The 3rd Case: Not in Small Array, Small Array Full, Victim Clean
3.5 The 4th Case: Not in Small Array, Small Array Full, Victim Dirty
Chapter 4 Proposed BC-DRAM Controller Design
4.1 DRAM Controller Design
4.2 BC-DRAM Controller Architecture
4.3 The Proposed Migration Arbiter for DRAM with Built-in Cache
4.4 The Address Cache for Small Array
4.5 Victim Determination Policy
Chapter 5 Experimental Results
Chapter 6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
References