Detailed Record

Author (Chinese): 林佩藍
Author (English): LIN, PEI-LAN
Title (Chinese): 在高效能多核心系統平台的免重整之壓縮式快取記憶體架構
Title (English): Compaction-free Compressed Cache for High Performance Multi-core System
Advisor (Chinese): 黃婷婷
Advisor (English): Hwang, Ting Ting
Committee Members (Chinese): 金仲達、黃俊達
Committee Members (English): King, Chung-Ta; Huang, Juinn-Dar
Degree: Master
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 101062598
Publication Year (ROC calendar): 104 (2015)
Graduation Academic Year: 103
Language: English / Chinese
Number of Pages: 29
Keywords (Chinese): 壓縮式快取記憶體、高效能多核心系統、末級快取記憶體、快取記憶體重整
Keywords (English): Compressed Cache, High-Performance Multi-core System, Last-Level Cache, Cache Compaction
Usage statistics:
  • Recommendations: 0
  • Views: 91
  • Rating: *****
  • Downloads: 3
  • Bookmarks: 0
Abstract:
Compressed caches have been used in the shared last-level cache (LLC) to increase its effective capacity [1]. However, because compressed data blocks vary in size, storage fragmentation is unavoidable in such a cache design. When fragmentation occurs, a compaction process is usually invoked to create contiguous storage space. This compaction incurs extra cycle penalties and degrades the effectiveness of the compressed cache. In this thesis, we propose a compaction-free compressed cache architecture that completely eliminates the time spent on compaction. Based on this design, we demonstrate that, compared with a conventional cache, our architecture improves system performance by 16% and reduces energy consumption by 16%. Compared with the work of Alameldeen et al. [1], our design achieves 5% more performance improvement and 3% more energy reduction; compared with the work of Sardashti et al. [2], it achieves 3% more performance improvement and 2% more energy reduction.
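The abstract refers to fragmentation caused by variable compressed block sizes and to the compaction pass used to create contiguous space. The C sketch below is a minimal, hypothetical illustration of that generic mechanism only; it is not the compaction-free architecture proposed in this thesis, nor the designs of [1] or [2], and the set size, segment granularity, data structures, and function names are assumptions chosen purely for illustration. It shows how a set can hold enough free segments in total while no contiguous run is large enough, forcing a compaction pass that relocates blocks; in hardware, those extra data-array reads and writes are the cycle overhead a compaction-free design avoids.

#include <stdio.h>

#define SEGMENTS_PER_SET 16   /* assumption: one set's data array holds 16 segments of 8 B */
#define MAX_BLOCKS        8   /* assumption: at most 8 compressed blocks share a set        */

typedef struct {
    int valid;
    int start;   /* first segment occupied by this compressed block                    */
    int nsegs;   /* compressed size in segments (an uncompressed 64 B block needs 8)   */
} Block;

typedef struct {
    Block blocks[MAX_BLOCKS];
} Set;

/* Total number of free segments, wherever they happen to sit. */
static int free_segments(const Set *s) {
    int used = 0;
    for (int i = 0; i < MAX_BLOCKS; i++)
        if (s->blocks[i].valid)
            used += s->blocks[i].nsegs;
    return SEGMENTS_PER_SET - used;
}

/* Largest contiguous run of free segments: what an incoming block actually needs. */
static int largest_free_run(const Set *s) {
    int occupied[SEGMENTS_PER_SET] = {0};
    for (int i = 0; i < MAX_BLOCKS; i++)
        if (s->blocks[i].valid)
            for (int j = 0; j < s->blocks[i].nsegs; j++)
                occupied[s->blocks[i].start + j] = 1;
    int best = 0, run = 0;
    for (int k = 0; k < SEGMENTS_PER_SET; k++) {
        run = occupied[k] ? 0 : run + 1;
        if (run > best)
            best = run;
    }
    return best;
}

/* Compaction: slide every valid block toward segment 0 so all free space becomes
 * one contiguous run. Each moved block stands for data-array reads and writes,
 * i.e. the extra cycles that a compaction pass costs. */
static int compact(Set *s) {
    int cursor = 0, moved = 0;
    for (int i = 0; i < MAX_BLOCKS; i++) {
        if (!s->blocks[i].valid)
            continue;
        if (s->blocks[i].start != cursor) {
            s->blocks[i].start = cursor;
            moved++;
        }
        cursor += s->blocks[i].nsegs;
    }
    return moved;
}

int main(void) {
    /* Two resident blocks with a 4-segment hole between them: 6 segments are
     * free in total, but the largest contiguous run is only 4. */
    Set s = { .blocks = { {1, 0, 6}, {1, 10, 4} } };
    int need = 5;   /* the incoming block compresses to 5 segments */

    printf("free=%d largest_run=%d need=%d\n",
           free_segments(&s), largest_free_run(&s), need);

    if (largest_free_run(&s) < need && free_segments(&s) >= need) {
        int moved = compact(&s);   /* the overhead a compaction-free design removes */
        printf("after compaction: moved %d block(s), largest_run=%d\n",
               moved, largest_free_run(&s));
    }
    return 0;
}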
Table of Contents:
1 Introduction
2 Previous Work
  2.1 Data Compression Algorithms for Compressed Cache
  2.2 Compressed Cache Management
  2.3 Compressed LLC Architecture
3 Background and Motivation
  3.1 Review of Decoupled Variable-Segment Cache Architecture
  3.2 Compaction Overhead
4 Methodology
  4.1 Architecture
  4.2 Hardware Implementation
5 Experimental Results
  5.1 Experimental Environment
  5.2 Performance and Energy Results
  5.3 Area Overhead
  5.4 Analysis of Performance and Area Overhead with Different Compression Segment Size
6 Conclusions
References:
[1] A. R. Alameldeen and D. A. Wood, “Adaptive cache compression for high-performance processors,” Proc. the 31st Annual International Symposium on Computer Architecture (ISCA), pp. 212–223, 2004.
[2] S. Sardashti and D. A. Wood, “Decoupled compressed cache: Exploiting spatial locality for energy-optimized compressed caching,” Proc. the 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 62–73, 2013.
[3] A. R. Alameldeen and D. A. Wood, “Frequent pattern compression: A significance-based compression scheme for L2 caches,” Technical Report 1500, University of Wisconsin-Madison, Computer Sciences Department, Tech. Rep., 2004.
[4] E. Ahn, S.-M. Yoo, and S.-M. S. Kang, “Effective algorithms for cache-level compression,” Proc. the 11th Great Lakes Symposium on VLSI (GLSVLSI), pp. 89–92, 2001.
[5] F. Douglis, “The compression cache: Using on-line compression to extend physical memory,” Proc. 1993 Winter USENIX Conference, pp. 519–529, 1993.
[6] M. J. Freedman, “The compression cache: Virtual memory compression for handheld computers,” Parallel and Distributed Operating Systems Group, MIT Lab for Computer Science, Cambridge, Tech. Rep., 2000.
[7] X. Chen, L. Yang, R. Dick, L. Shang, and H. Lekatsas, “C-Pack: A high-performance microprocessor cache compression algorithm,” IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 18, no. 8, pp. 1196–1208, 2010.
[8] L. Villa, M. Zhang, and K. Asanovic, “Dynamic zero compression for cache energy reduction,” Proc. the 33rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 214–220, 2000.
[9] J. Yang, Y. Zhang, and R. Gupta, “Frequent value compression in data caches,” Proc. the 33rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 258–265, 2000.
[10] G. Pekhimenko, V. Seshadri, O. Mutlu, M. A. Kozuch, P. B. Gibbons, and T. C. Mowry, “Base-delta-immediate compression: Practical data compression for on-chip caches,” Proc. the 21st International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 377–388, 2012.
[11] J. Dusser, T. Piquet, and A. Seznec, “Zero-content augmented caches,” Proc. the 23rd International Conference on Supercomputing (ICS), pp. 46–55, 2009.
[12] E. Hallnor and S. Reinhardt, “A compressed memory hierarchy using an indirect index cache,” Proc. the 3rd Workshop on Memory Performance Issues (WMPI), in conjunction with the 31st International Symposium on Computer Architecture, pp. 9–15, 2004.
[13] E. Hallnor and S. Reinhardt, “A unified compressed memory hierarchy,” Proc. High-Performance Computer Architecture (HPCA), pp. 201–212, 2005.
[14] S. Kim, J. Lee, J. Kim, and S. Hong, “Residue cache: A low-energy low-area L2 cache architecture via compression and partial hits,” Proc. the 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 420–429, 2011.
[15] Y. Xie and G. Loh, “Thread-aware dynamic shared cache compression in multi-core processors,” Proc. International Conference on Computer Design (ICCD), pp. 135–141, 2011.
[16] S. Baek, H. G. Lee, C. Nicopoulos, J. Lee, and J. Kim, “ECM: Effective capacity maximizer for high-performance compressed caching,” Proc. High-Performance Computer Architecture (HPCA), pp. 131–142, 2013.
[17] J.-S. Lee, W.-K. Hong, and S.-D. Kim, “An on-chip cache compression technique to reduce decompression overhead and design complexity,” Journal of Systems Architecture (JSA), vol. 46, no. 15, pp. 1365–1382, 2000.
[18] D. Chen, E. Peserico, and L. Rudolph, “A dynamically partitionable compressed cache,” Proc. the Singapore-MIT Alliance Symposium, 2003.
[19] L. Benini, D. Bruni, B. Ricco, A. Macii, and E. Macii, “An adaptive data compression scheme for memory traffic minimization in processor-based systems,” IEEE International Symposium on Circuits and Systems (ISCAS), pp. 866–869, 2002.
[20] L. Benini, D. Bruni, A. Macii, and E. Macii, “Hardware-assisted data compression for energy minimization in systems with embedded processors,” Proc. the Design, Automation and Test in Europe Conference and Exhibition (DATE), pp. 449–453, 2002.
[21] J.-S. Lee, W.-K. Hong, and S.-D. Kim, “Design and evaluation of a selective compressed memory system,” Proc. International Conference on Computer Design (ICCD), pp. 184–191, 1999.
[22] A.-R. Adl-Tabatabai, A. M. Ghuloum, and S. O. Kanaujia, “Compression in cache design,” Proc. the 21st Annual International Conference on Supercomputing (ICS), pp. 190–201, 2007.
[23] A. Seznec, “Decoupled sectored caches: Conciliating low tag implementation cost,” Proc. the 21st Annual International Symposium on Computer Architecture (ISCA), pp. 384–393, 1994.
[24] “SPEC CPU2006 benchmarks.” [Online]. Available: http://www.specbench.org/osg/cpu2006/
[25] E. Rotenberg, S. Bennett, and J. E. Smith, “Trace cache: A low latency approach to high bandwidth instruction fetching,” Proc. the 29th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 24–34, 1996.
[26] T. M. Conte, K. N. Menezes, P. M. Mills, and B. A. Patel, “Optimization of instruction fetch mechanisms for high issue rates,” Proc. the 22nd Annual International Symposium on Computer Architecture (ISCA), pp. 333–344, 1995.
[27] P. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hallberg, J. Hogberg, F. Larsson, A. Moestedt, and B. Werner, “Simics: A full system simulation platform,” IEEE Computer, pp. 50–58, 2002.
[28] M. Martin, D. Sorin, B. Beckmann, M. Marty, M. Xu, A. Alameldeen, K. Moore, M. Hill, and D. Wood, “Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset,” Computer Architecture News, pp. 92–99, 2005.
[29] T. K. Prakash and L. Peng, “Performance characterization of SPEC CPU2006 benchmarks on Intel Core 2 Duo processor,” ISAST Transactions on Computers and Software Engineering, pp. 36–41, 2008.
[30] C. Zhang, F. Vahid, and W. Najjar, “A highly configurable cache for low energy embedded systems,” ACM Transactions on Embedded Computing Systems (TECS), pp. 363–387, 2005.
[31] HP Laboratories Palo Alto, “CACTI 6.5.” [Online]. Available: http://www.hpl.hp.com/

 
 
 
 