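Author (Chinese): 陳衍昊
Author (English): Chen, Yen-Hao
Thesis title (Chinese): 快取系統之可靠性、性能與穩定性研究
Thesis title (English): Research on Cache Reliability, Performance, and Stability
Advisor (Chinese): 黃婷婷
Advisor (English): Hwang, Ting-Ting
Oral defense committee (Chinese): 金仲達; 王廷基; 吳中浩; 陳添福; 劉一宇
Oral defense committee (English): King, Chung-Ta; Wang, Ting-Chi; Wu, Allen; Chen, Tien-Fu; Liu, Yi-Yu
Degree: Doctoral
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 103062806
Year of publication (ROC calendar): 110
Academic year of graduation: 109
Language: English
Number of pages: 117
Keywords (Chinese): 快取記憶體 (cache memory)
Keywords (English): Cache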
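Cache memory has become an essential component of today's computer chip systems. Before adopting a cache, designers must consider the reliability, performance, and stability of the cache mechanism in order to evaluate and choose among different cache configurations. This dissertation examines three important aspects of cache systems: reliability, performance, and stability. First, we propose using 7T/14T cache cells and develop an effective run-time control mechanism that keeps the whole system in a reliable state. Experimental results show that our method reduces the frequency of errors by a factor of one thousand, and that the proposed mechanism causes no loss in performance or energy. Second, we show that in cache architectures with non-uniform access times, certain access patterns suffer performance degradation, and that these patterns appear in many important applications, such as the recently popular artificial intelligence applications. For these access patterns, we propose an effective pattern detection mechanism and a cache control mechanism. Experiments show that the detection mechanism has a very small hardware cost and effectively detects the target access patterns, while the cache control mechanism, applied right after detection, effectively avoids the performance degradation and thus achieves high performance. Finally, we analyze cache control mechanisms from multiple perspectives in order to select the best one. Previous studies evaluated caches mainly by average hit rate, but we observe that stability is also an important consideration. We propose a stability metric and combine it with other evaluation measures for a complete assessment of cache control mechanisms. After thorough and comprehensive experimental analysis, the results show that the random replacement policy is the most suitable cache mechanism for general-purpose processors.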
A cache system is essential for high-performance computing in microprocessors. This dissertation targets three indices of the cache system: reliability, performance, and stability. Its scope covers the major components of microprocessor cache systems, namely the cache cell, the cache architecture, and the cache control mechanism.

First, we propose a novel cache-utilization-based dynamic voltage-frequency scaling mechanism for reliability enhancement. It combines a cache architecture built on 7T/14T static random-access memory (SRAM) [1] with a control mechanism. Unlike conventional dynamic voltage-frequency scaling (DVFS) methods, the control mechanism considers not only cycles-per-instruction (CPI) behavior but also cache utilization, for which we propose a novel metric. Experimental results show that, under ultra-low-voltage operation, the proposed method yields one thousand times fewer bit-error occurrences than conventional DVFS methods. Moreover, rather than incurring performance or energy overheads, it achieves on average a 2.10% performance improvement and a 6.66% energy reduction compared to conventional DVFS methods.

Second, we develop a dynamic link-latency aware replacement policy (DLRP). Multiprocessor systems-on-chip (MPSoCs) in modern devices have mostly adopted the non-uniform cache architecture (NUCA) [3], in which the physical distance from cores to data locations varies and, as a result, so does the access latency. Past research focused on minimizing the average access latency of the NUCA. We found that dynamic latency is also a critical performance index: if it is not taken into account, a cache access pattern with long dynamic latency causes significant cache performance degradation. We have also observed that a set of commonly used neural network kernels, including fully-connected and convolutional layers, contains many access patterns with long dynamic latency. This dissertation proposes a hardware-friendly dynamic latency identification mechanism to detect such patterns and the DLRP to improve cache performance on the NUCA. On average, the proposed DLRP outperforms the least recently used (LRU) policy by 53% with little hardware overhead. Moreover, normalized to LRU, our method achieves on average 45% and 24% more performance improvement than the not recently used (NRU) policy and the static re-reference interval prediction (SRRIP) policy, respectively.
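
To make the gap between average latency and the latency a particular access stream actually sees more concrete, the following is a minimal sketch and not the dissertation's model. It assumes a hypothetical 4x4 mesh of cache banks, a core at one corner, and made-up constants (`BANK_LATENCY`, `HOP_LATENCY`); latency is taken as bank access time plus a fixed delay per mesh hop. The names `access_latency` and `mean_latency` are illustrative only.

```python
from statistics import mean

# Assumed, illustrative parameters (not taken from the dissertation).
BANK_LATENCY = 4  # cycles to access a bank once the request arrives
HOP_LATENCY = 2   # cycles per mesh hop between core and bank
MESH = 4          # banks are laid out on a MESH x MESH grid

ALL_BANKS = [(x, y) for x in range(MESH) for y in range(MESH)]
NEAR_BANKS = [(0, 0), (0, 1), (1, 0)]  # banks next to a core at (0, 0)
FAR_BANKS = [(3, 3), (3, 2), (2, 3)]   # banks at the opposite corner


def access_latency(core, bank):
    """Latency in cycles: bank access time plus link delay over the Manhattan distance."""
    hops = abs(core[0] - bank[0]) + abs(core[1] - bank[1])
    return BANK_LATENCY + HOP_LATENCY * hops


def mean_latency(core, accessed_banks):
    """Average latency seen by a stream that touches the given banks equally often."""
    return mean(access_latency(core, bank) for bank in accessed_banks)


if __name__ == "__main__":
    core = (0, 0)
    print("uniform over all banks:", mean_latency(core, ALL_BANKS))
    print("stream hitting near banks:", mean_latency(core, NEAR_BANKS))
    print("stream hitting far banks:", mean_latency(core, FAR_BANKS))
```

Under these assumptions, a stream concentrated on distant banks sees a latency well above the architecture-wide average, which is the kind of effect an average-only metric hides.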

Finally, we analyze the stability of cache replacement policies for processor designs. A cache system with an effective cache replacement policy is essential for high-performance (micro)-processor designs. Over the years, many cache replacement algorithms have been proposed based on heuristic observations. These algorithms are usually very effective when targeted at a specific type of application, e.g., LRU-friendly applications or streaming applications. When developing a (micro)-processor, designers first face the decision of selecting a best-fit cache replacement algorithm for its implementation. If the (micro)-processor targets a specific application, an application-specific instruction processor (ASIP) [4] can be developed with an application-specific cache replacement algorithm to achieve the best overall area/power/speed performance. If, on the other hand, it is a general-purpose (micro)-processor, designers need to select an effective cache replacement algorithm that can handle various applications with different characteristics (mixed workloads). Traditionally, the average hit rate is the main index used to estimate the performance of cache replacement policies, and thus most cache replacement policies focus on improving the hit rate. However, we have observed that for mixed-workload applications, the average hit rate does not reflect the performance variance among different types of applications. In this dissertation, we propose a new performance variance index that estimates the stability of cache replacement policies. We also found that the random policy achieves very competitive and stable results compared to other policies. The experimental results demonstrate that the random policy achieves a 0.08% cache performance variation, while LRU and SRRIP show up to 0.16% and 0.54% variation, on the SPEC CPU2006 [5] and GAP [6] benchmark suites.
We have also demonstrated that the random policy achieves the most stable overall performance compared to previous policies under mixed workloads. Moreover, with the random policy, both the hardware cost and the power consumption are significantly lower than with previous policies. Consequently, the random policy is a good choice for general-purpose (micro)-processor designs targeting mixed workloads of various applications, and it is widely adopted in many of today's (micro)-processor designs.
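
The contrast between average hit rate and stability can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the dissertation's experimental setup or its stability index: it models a tiny fully associative cache, uses two synthetic traces (`small_loop` and `cyclic_scan`, both hypothetical), and takes the population variance of the hit rate across workloads as a stand-in stability measure, since this record does not define the actual index.

```python
import random
from collections import OrderedDict
from statistics import mean, pvariance

CAPACITY = 8  # number of cache blocks (assumed)

# Synthetic traces, invented for illustration only.
WORKLOADS = {
    "small_loop": [i % 4 for i in range(1000)],   # working set fits in the cache
    "cyclic_scan": [i % 9 for i in range(1000)],  # working set is one block too large
}


def hit_rate(trace, capacity, policy, seed=0):
    """Simulate a fully associative cache and return the fraction of hits."""
    rng = random.Random(seed)  # fixed seed keeps the random policy reproducible
    cache = OrderedDict()      # key order tracks recency, oldest first
    hits = 0
    for addr in trace:
        if addr in cache:
            hits += 1
            if policy == "lru":
                cache.move_to_end(addr)  # mark as most recently used
            continue
        if len(cache) >= capacity:
            if policy == "lru":
                cache.popitem(last=False)  # evict the least recently used block
            elif policy == "random":
                del cache[rng.choice(list(cache))]  # evict a uniformly random block
            else:
                raise ValueError(f"unknown policy: {policy}")
        cache[addr] = True
    return hits / len(trace)


def stability(policy, workloads, capacity):
    """Return (mean hit rate, variance of hit rate across workloads)."""
    rates = [hit_rate(trace, capacity, policy) for trace in workloads.values()]
    return mean(rates), pvariance(rates)


if __name__ == "__main__":
    for policy in ("lru", "random"):
        avg, var = stability(policy, WORKLOADS, CAPACITY)
        print(f"{policy}: mean hit rate {avg:.3f}, variance {var:.4f}")
```

In this toy setting LRU never hits on the cyclic scan, because it always evicts the block that is needed next, while random replacement keeps part of the working set resident. The resulting spread in hit rate across the two workloads is therefore smaller for the random policy, which is the sort of behavior a variance-style index is meant to expose.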

1 Introduction
1.1 Scope and Objectives
1.2 Overview of Dissertation
2 Previous Work
2.1 Reliability of Static Random-access Memory (SRAM) Cells
2.2 Performance on Non-uniform Cache Architectures (NUCAs)
2.3 Stability of Cache Replacement Policies
3 A Novel Cache-utilization-based Dynamic Voltage-frequency Scaling Mechanism for Reliability Enhancement
3.1 Dynamic Voltage-frequency Scaling (DVFS) Mechanisms
3.2 Motivation
3.3 System Architecture
3.3.1 The 7T/14T SRAMs
3.3.2 System Modes
3.4 Cache Utility-based Voltage-frequency Scaling Mechanism
3.4.1 Proposed Control Mechanism
3.4.2 Hardware Implementation
3.4.3 Tag Copying
3.4.4 Overhead Analysis
3.5 Architecture Evaluations
3.5.1 Selection of MPKI Threshold in Control Mechanism
3.5.2 Reliability Evaluation
3.5.3 Compared with Conventional 6T SRAM Cache on Performance and Energy
3.5.4 Compared with Other Reliable Caches
4 A Dynamic Link-latency Aware Cache Replacement Policy (DLRP)
4.1 Motivation
4.2 Distance-sensitive Accessing Pattern
4.2.1 Long Dynamic Latency
4.2.2 Cache System Examples
4.2.3 Application Examples
4.2.4 Identification Mechanisms
4.3 Cache Replacement Policy
4.3.1 Review of the Replacement Policies
4.3.2 Link-latency Aware Replacement Policy
4.3.3 Hardware Overheads
4.4 Experimental Results
5 Randomness: the Most Effective and Efficient Cache-placement Policy for Processor Designs Targeted to Mixed Applications
5.1 Problem Description
5.2 Stability Index
5.3 Performance Comparisons of Various Policies
5.3.1 Compared with Heuristic Policies
5.3.2 Policy Specialties
5.3.3 Mixed Workloads
5.3.4 Compared with Hybrid and Learning-based Policies
5.4 Hardware Design Comparisons
5.4.1 Area Analysis
5.4.2 Power Analysis
5.4.3 Timing Analysis
5.5 Stable Benchmark Set
5.6 Cache Hierarchy Levels
5.7 Multi-application Environments
6 Conclusions and Future Work
[1] Hidehiro Fujiwara, Shunsuke Okumura, Yusuke Iguchi, Hiroki Noguchi, Hiroshi Kawaguchi, and Masahiko Yoshimoto. A dependable SRAM with 7T/14T memory cells. In IEICE Transactions on Electronics, pages 423-432, 2009.
[2] A. Jain and C. Lin. Back to the future: Leveraging Belady's algorithm for improved cache replacement. In 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), pages 78-89, 2016.
[3] Changkyu Kim, Doug Burger, and Stephen W. Keckler. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches. In Kourosh Gharachorloo, editor, Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), San Jose, California, USA, October 5-9, 2002, pages 211-222. ACM Press, 2002.
[4] D. Liu. ASIP (application specific instruction-set processors) design. In 2009 IEEE 8th International Conference on ASIC, pages 16-16, 2009.
[5] John L. Henning. SPEC CPU2006 benchmark descriptions. SIGARCH Comput. Archit. News, 34(4):1-17, September 2006.
[6] Scott Beamer, Krste Asanovic, and David A. Patterson. The GAP benchmark suite. CoRR, abs/1508.03619, 2015.
[7] Bibiche Geuskens and Kenneth Rose. Modeling microprocessor performance. Springer Science & Business Media, 2012.
[8] Naveen Verma and Anantha P. Chandrakasan. A 256 kb 65 nm 8T subthreshold SRAM employing sense-amplifier redundancy. In IEEE Journal of Solid-State Circuits, pages 141-149, 2008.
[9] Ming-Hsien Tu, Jihi-Yu Lin, Ming-Chien Tsai, Chien-Yu Lu, Yuh-Jiun Lin, Meng-Hsueh Wang, Huan-Shun Huang, Kuen-Di Lee, Wei-Chiang (Willis) Shih, Shyh-Jye Jou, and Ching-Te Chuang. A single-ended disturb-free 9T subthreshold SRAM with cross-point data-aware write word-line structure, negative bit-line, and adaptive read operation timing tracing. In IEEE Journal of Solid-State Circuits, pages 1469-1482, 2012.
[10] Ik Joon Chang, Jae-Joon Kim, Sang Phill Park, and Kaushik Roy. A 32kb 10T sub-threshold SRAM array with bit-interleaving and differential read scheme in 90nm CMOS. In 2008 IEEE International Solid-State Circuits Conference, ISSCC 2008, Digest of Technical Papers, San Francisco, CA, USA, February 3-7, 2008, pages 388-389. IEEE, 2008.
[11] Yi-Wei Chiu, Yu-Hao Hu, Ming-Hsien Tu, Jun-Kai Zhao, Yuan-Hua Chu, Shyh-Jye Jou, and Ching-Te Chuang. 40 nm bit-interleaving 12T subthreshold SRAM with data-aware write-assist. IEEE Trans. on Circuits and Systems, 61-I(9):2578-2585, 2014.
[12] C.D. Moore, S.J. Keller, and A.J. Martin. Ultra-low-power variation-tolerant radiation-hardened cache design, December 10 2013. US Patent 8,605,516.
[13] John L Hennessy and David A Patterson. Computer architecture: a quantitative approach. Elsevier, 2017.
[14] J. Lira, C. Molina, and A. González. HK-NUCA: Boosting data searches in dynamic non-uniform cache architectures for chip multiprocessors. In 2011 IEEE International Parallel Distributed Processing Symposium, pages 419-430, May 2011.
[15] Anurag Mukkara, Nathan Beckmann, and Daniel Sanchez. Whirlpool: Improving dynamic cache management with static data classification. In Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '16, pages 113-127, New York, NY, USA, 2016. Association for Computing Machinery.
[16] N. Beckmann, P. Tsai, and D. Sanchez. Scaling distributed cache hierarchies through computation and data co-scheduling. In 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), pages 538-550, Feb 2015.
[17] Ashok B. Mehta. SystemVerilog Assertions and Functional Coverage: Guide to Language, Methodology and Applications. Springer Publishing Company, Incorporated, 2nd edition, 2016.
[18] Kevin Reick, Pia N. Sanda, Scott B. Swaney, Jeffrey W. Kellington, Michael J. Mack, Michael S. Floyd, and Daniel Henderson. Fault-tolerant design of the IBM POWER6 microprocessor. IEEE Micro, 28(2):30-38, 2008.
[19] Moinuddin K. Qureshi and Zeshan Chishti. Operating SECDED-based caches at ultra-low voltage with FLAIR. In 2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Budapest, Hungary, June 24-27, 2013, pages 1-11. IEEE Computer Society, 2013.
[20] Zeshan Chishti, Alaa R. Alameldeen, Chris Wilkerson, Wei Wu, and Shih-Lien Lu. Improving cache lifetime reliability at ultra-low voltages. In David H. Albonesi, Margaret Martonosi, David I. August, and Jose F. Martinez, editors, 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), December 12-16, 2009, New York, New York, USA, pages 89-99. ACM, 2009.
[21] Alaa R. Alameldeen, Ilya Wagner, Zeshan Chishti, Wei Wu, Chris Wilkerson, and Shih-Lien Lu. Energy-efficient cache design using variable-strength error-correcting codes. In Ravi Iyer, Qing Yang, and Antonio Gonzalez, editors, 38th International Symposium on Computer Architecture (ISCA 2011), June 4-8, 2011, San Jose, CA, USA, pages 461-472. ACM, 2011.
[22] Meilin Zhang, Vladimir Stojanovic, and Paul Ampadu. Reliable ultra-low-voltage cache design for many-core systems. IEEE Trans. on Circuits and Systems, 59-II(12):858-862, 2012.
[23] Arup Chakraborty, Houman Homayoun, Amin Khajeh, Nikil Dutt, Ahmed M. Eltawil, and Fadi J. Kurdahi. E < MC2: less energy through multi-copy cache. In Vinod Kathail, Reid Tatge, and Rajeev Barua, editors, Proceedings of the 2010 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, CASES 2010, Scottsdale, AZ, USA, October 24-29, 2010, pages 237-246. ACM, 2010.
[24] Gulay Yalcin, Azam Seyedi, Osman S. Unsal, and Adrian Cristal. Flexicache: Highly reliable and low power cache under supply voltage scaling. In Gonzalo Hernandez, Carlos Jaime Barrios Hernandez, Gilberto Diaz, Carlos Garcia Garino, Sergio Nesmachnow, Tomas Perez-Acle, Mario A. Storti, and Mariano Vazquez, editors, High Performance Computing - First HPCLATAM - CLCAR Latin American Joint Conference, CARLA 2014, Valparaiso, Chile, October 20-22, 2014. Proceedings, volume 485 of Communications in Computer and Information Science, pages 173-190. Springer, 2014.
[25] Jaume Abella, Javier Carretero, Pedro Chaparro, Xavier Vera, and Antonio Gonzalez. Low Vccmin fault-tolerant cache with highly predictable performance. In David H. Albonesi, Margaret Martonosi, David I. August, and Jose F. Martinez, editors, 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), December 12-16, 2009, New York, New York, USA, pages 111-121. ACM, 2009.
[26] Amin Ansari, Shuguang Feng, Shantanu Gupta, and Scott Mahlke. Archipelago: A polymorphic cache design for enabling robust near-threshold operation. In International Symposium on High Performance Computer Architecture, pages 539-550, 2011.
[27] Tayyeb Mahmood, Soontae Kim, and Seokin Hong. Macho: a failure model-oriented adaptive cache architecture to enable near-threshold voltage scaling. In International Symposium on High Performance Computer Architecture, pages 532-541, 2013.
[28] Avesta Sasan, Houman Homayoun, Ahmed M. Eltawil, and Fadi Kurdahi. Inquisitive defect cache: A means of combating manufacturing induced process variation. In IEEE Transactions on Very Large Scale Integration (VLSI) Systems, pages 1597-1609, 2011.
[29] Farrukh Hijaz, Qingchuan Shi, and Omer Khan. A private level-1 cache architecture to exploit the latency and capacity tradeoffs in multicores operating at near-threshold voltages. In 2013 IEEE 31st International Conference on Computer Design, ICCD 2013, Asheville, NC, USA, October 6-9, 2013, pages 85-92. IEEE Computer Society, 2013.
[30] Chun Liu, A. Sivasubramaniam, M. Kandemir, and M. J. Irwin. Enhancing L2 organization for CMPs with a center cell. In Proceedings 20th IEEE International Parallel Distributed Processing Symposium, 10 pp., April 2006.
[31] Z. Guz, I. Keidar, A. Kolodny, and U. Weiser. Nahalal: Cache organization for chip multiprocessors. IEEE Computer Architecture Letters, 6(1):21-24, Jan 2007.
[32] P. Tsai, N. Beckmann, and D. Sanchez. Jenga: Software-defined cache hierarchies. In 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pages 652-665, June 2017.
[33] Z. Radovic and E. Hagersten. Efficient synchronization for nonuniform communication architectures. In SC '02: Proceedings of the 2002 ACM/IEEE Conference on Supercomputing, pages 13-13, Nov 2002.
[34] Liqun Cheng, N. Muralimanohar, K. Ramani, R. Balasubramonian, and J. B. Carter. Interconnect-aware coherence protocols for chip multiprocessors. In 33rd International Symposium on Computer Architecture (ISCA'06), pages 339-351, June 2006.
[35] E. Bolotin, Z. Guz, I. Cidon, R. Ginosar, and A. Kolodny. The power of priority: NoC based distributed cache coherency. In First International Symposium on Networks-on-Chip (NOCS'07), pages 117-126, May 2007.
[36] D. S. Gracia, G. Dimitrakopoulos, T. M. Arnal, M. G. H. Katevenis, and V. V. Yufera. LP-NUCA: Networks-in-cache for high-performance low-power embedded processors. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 20(8):1510-1523, Aug 2012.
[37] S. K. Sadasivam, B. W. Thompto, R. Kalla, and W. J. Starke. IBM POWER9 processor architecture. IEEE Micro, 37(2):40-51, Mar 2017.
[38] Z. Chishti, M. D. Powell, and T. N. Vijaykumar. Distance associativity for high-performance energy-efficient non-uniform cache architectures. In Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36., pages 55-66, Dec 2003.
[39] J. Huh, C. Kim, H. Shafi, L. Zhang, D. Burger, and S. W. Keckler. A NUCA substrate for flexible CMP cache sharing. IEEE Transactions on Parallel and Distributed Systems, 18(8):1028-1040, Aug 2007.
[40] Seongbeom Kim, Dhruba Chandra, and Yan Solihin. Fair cache sharing and partitioning in a chip multiprocessor architecture. In Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, PACT '04, pages 111-122, USA, 2004. IEEE Computer Society.
[41] Thomas Y. Yeh and Glenn Reinman. Fast and fair: data-stream quality of service. In Thomas M. Conte, Paolo Faraboschi, William H. Mangione-Smith, and Walid A. Najjar, editors, Proceedings of the 2005 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, CASES 2005, San Francisco, California, USA, September 24-27, 2005, pages 237-248. ACM, 2005.
[42] Javier Lira, Carlos Molina, and Antonio Gonzalez. LRU-PEA: A smart replacement policy for non-uniform cache architectures on chip multiprocessors. 2009 IEEE International Conference on Computer Design, pages 275-281, 2009.
[43] P. Tsai, N. Beckmann, and D. Sanchez. Nexus: A new approach to replication in distributed shared caches. In 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 166-179, Sep. 2017.
[44] Qianqian Wu and Zhenzhou Ji. A reuse-degree based locality classifier for locality-aware data replication. IEEE Access, PP:1-1, 12 2019.
[45] J. Merino, V. Puente, and J. A. Gregorio. ESP-NUCA: A low-cost adaptive non-uniform cache architecture. In HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture, pages 1-10, Jan 2010.
[46] S. Akioka, F. Li, K. Malkowski, P. Raghavan, M. Kandemir, and M. J. Irwin. Ring data location prediction scheme for non-uniform cache architectures. In 2008 IEEE International Conference on Computer Design, pages 693-698, Oct 2008.
[47] S. Das and H. K. Kapoor. Towards a better cache utilization by selective data storage for CMP last level caches. In 2016 29th International Conference on VLSI Design and 2016 15th International Conference on Embedded Systems (VLSID), pages 92-97, Jan 2016.
[48] S. Chen, L. Huang, and S. Li. An address remapping algorithm to reduce power consumption in NoC-based chip-multiprocessors. In 2016 International SoC Design Conference (ISOCC), pages 209-210, Oct 2016.
[49] K. Shyam and R. Govindarajan. An array allocation scheme for energy reduction in partitioned memory architectures. In Shriram Krishnamurthi and Martin Odersky, editors, Compiler Construction, pages 32-47, Berlin, Heidelberg, 2007. Springer Berlin Heidelberg.
[50] Shounak Chakraborty and Hemangee K. Kapoor. Performance linked dynamic cache tuning: A static energy reduction approach in tiled CMPs. Microprocessors and Microsystems, 52:221-235, 2017.
[51] Y. Wang, L. Zhang, Y. Han, H. Li, and X. Li. Data remapping for static NUCA in degradable chip multiprocessors. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 23(5):879-892, May 2015.
[52] A. Tretter, P. Kumar, and L. Thiele. Interleaved multi-bank scratchpad memories: A probabilistic description of access conflicts. In 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC), pages 1-6, June 2015.
[53] Y. Ding and W. Zhang. WCET analysis of static NUCA caches. In 2014 IEEE 33rd International Performance Computing and Communications Conference (IPCCC), pages 1-6, Dec 2014.
[54] B. Ramakrishna Rau. Pseudo-randomly interleaved memory. In Proceedings of the 18th Annual International Symposium on Computer Architecture, pages 74-83, 1991.
[55] Andre Seznec. Bank-interleaved cache or memory indexing does not require euclidean division. In 11th Annual Workshop on Duplicating, Deconstructing and Debunking, Portland, United States, June 2015.
[56] Christophe Giacomotto, Mandeep Singh, Milena Vratonjic, and Vojin G. Oklobdzija. Energy efficiency of power-gating in low-power clocked storage elements. In 18th International Workshop on Integrated Circuit and System Design. Power and Timing Modeling, Optimization and Simulation - Volume 5349, PATMOS 2008, pages 268-276, Berlin, Heidelberg, 2009. Springer-Verlag.
[57] Orhan Kislal, Jagadish Kotra, Xulong Tang, Mahmut Taylan Kandemir, and Myoungsoo Jung. Enhancing computation-to-core assignment with physical location information. SIGPLAN Not., 53(4):312-327, June 2018.
[58] S. Cho and L. Jin. Managing distributed, shared L2 caches through OS-level page allocation. In 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06), pages 455-468, Dec 2006.
[59] Aamer Jaleel, Kevin B. Theobald, Simon C. Steely Jr., and Joel S. Emer. High performance cache replacement using re-reference interval prediction (RRIP). In Andre Seznec, Uri C. Weiser, and Ronny Ronen, editors, 37th International Symposium on Computer Architecture (ISCA 2010), June 19-23, 2010, Saint-Malo, France, pages 60-71. ACM, 2010.
[60] L. A. Belady. A study of replacement algorithms for a virtual-storage computer. IBM Systems Journal, 5(2):78-101, 1966.
[61] Elizabeth J. O'Neil, Patrick E. O'Neil, and Gerhard Weikum. An optimality proof of the LRU-K page replacement algorithm. J. ACM, 46(1):92-112, 1999.
[62] John L. Henning. SPEC CPU2000: measuring CPU performance in the new millennium. IEEE Computer, 33(7):28-35, 2000.
[63] Hussein Al-Zoubi, Aleksandar Milenkovic, and Milena Milenkovic. Performance evaluation of cache replacement policies for the SPEC CPU2000 benchmark suite. In Proceedings of the 42nd Annual Southeast Regional Conference, ACM-SE 42, pages 267-272, New York, NY, USA, 2004. ACM.
[64] Moinuddin K. Qureshi, Aamer Jaleel, Yale N. Patt, Simon C. Steely, and Joel Emer. Adaptive insertion policies for high performance caching. In Proceedings of the 34th Annual International Symposium on Computer Architecture, ISCA '07, pages 381-391, New York, NY, USA, 2007. ACM.
[65] C. Wu, A. Jaleel, W. Hasenplaugh, M. Martonosi, S. C. Steely, and J. Emer. SHiP: Signature-based hit predictor for high performance caching. In 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 430-441, Dec 2011.
[66] Zhan Shi, Xiangru Huang, Akanksha Jain, and Calvin Lin. Applying deep learning to the cache replacement problem. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO '52, pages 413-425, New York, NY, USA, 2019. Association for Computing Machinery.
[67] David Molnar, Matt Piotrowski, David Schultz, and David Wagner. The program counter security model: Automatic detection and removal of control-flow side channel attacks. In Dong Ho Won and Seungjoo Kim, editors, Information Security and Cryptology - ICISC 2005, pages 156-168, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg.
[68] Santosh D. Chede and Kishore D. Kulat. Design overview of processor based implantable pacemaker. JCP, 3(8):49-57, 2008.
[69] Sparsh Mittal. A survey of techniques for improving energy efficiency in embedded computing systems. IJCAET, 6(4):440-459, 2014.
[70] Li Shang, Li-Shiuan Peh, and Niraj K. Jha. Dynamic voltage scaling with links for power optimization of interconnection networks. In The Ninth International Symposium on High-Performance Computer Architecture, pages 91-102, 2003.
[71] Asit K. Mishra, Reetuparna Das, Soumya Eachempati, Ravi Iyer, N. Vijaykrishnan, and Chita R. Das. A case for dynamic frequency tuning in on-chip networks. In Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 42, pages 292-303, New York, NY, USA, 2009. ACM.
[72] Evert Seevinck, Frans J. List, and Jan Lohstroh. Static-noise margin analysis of MOS SRAM cells. In IEEE Journal of Solid-State Circuits, pages 748-754, 1987.
[73] Grigorios Magklis, Michael L. Scott, Greg Semeraro, David H. Albonesi, and Steven Dropsho. Profile-based dynamic voltage and frequency scaling for a multiple clock domain microprocessor. In Proceedings of the 30th Annual International Symposium on Computer Architecture, ISCA '03, pages 14-27, New York, NY, USA, 2003. ACM.
[74] Canturk Isci, Gilberto Contreras, and Margaret Martonosi. Live, runtime phase monitoring and prediction on real systems with application to dynamic power management. In Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 39, pages 359-370, Washington, DC, USA, 2006. IEEE Computer Society.
[75] Gaurav Dhiman and Tajana Simunic Rosing. Dynamic voltage frequency scaling for multi-tasking systems using online learning. In Diana Marculescu, Anand Raghunathan, Ali Keshavarzi, and Vijaykrishnan Narayanan, editors, Proceedings of the 2007 International Symposium on Low Power Electronics and Design, 2007, Portland, OR, USA, August 27-29, 2007, pages 207-212. ACM, 2007.
[76] Xi Chen, Zheng Xu, Hyungjun Kim, Paul Gratz, Jiang Hu, Michael Kishinevsky, and Umit Ogras. In-network monitoring and control policy for DVFS of CMP networks-on-chip and last level caches. ACM Trans. Des. Autom. Electron. Syst., 18(4):47:1-47:21, October 2013.
[77] Christian Poellabauer, Leo Singleton, and Karsten Schwan. Feedback-based dynamic voltage and frequency scaling for memory-bound real-time applications. In Real Time and Embedded Technology and Applications Symposium, pages 234-243, 2005.
[78] Xing Fu, Khairul Kabir, and Xiaorui Wang. Cache-aware utilization control for energy efficiency in multi-core real-time systems. In Karl-Erik Arzen, editor, 23rd Euromicro Conference on Real-Time Systems, ECRTS 2011, Porto, Portugal, 5-8 July, 2011, pages 102-111. IEEE Computer Society, 2011.
[79] Toshikazu Suzuki, Yoshinobu Yamagami, Ichiro Hatanaka, Akinori Shibayama, Hironori Akamatsu, and Hiroyuki Yamauchi. A sub-0.5-V operating embedded SRAM featuring a multi-bit-error-immune hidden-ECC scheme. In IEEE Journal of Solid-State Circuits, pages 152-160, 2006.
[80] Toshikazu Suzuki, Yoshinobu Yamagami, Ichiro Hatanaka, Akinori Shibayama, Hironori Akamatsu, and Hiroyuki Yamauchi. Fault-containment in cache memories for TMR redundant processor systems. In IEEE Transactions on Computers, pages 386-397, 2002.
[81] Shunsuke Okumura, Shusuke Yoshimoto, Kosuke Yamaguchi, Yohei Nakata, Hiroshi Kawaguchi, and Masahiko Yoshimoto. 7T SRAM enabling low-energy simultaneous block copy. In Jacqueline Snyder, Rakesh Patel, and Tom Andre, editors, IEEE Custom Integrated Circuits Conference, CICC 2010, San Jose, California, USA, 19-22 September, 2010, Proceedings, pages 1-4. IEEE, 2010.
[82] Jinwook Jung, Yohei Nakata, Shunsuke Okumura, Hiroshi Kawaguchi, and Masahiko Yoshimoto. Reconfiguring cache associativity: Adaptive cache design for wide-range reliable low-voltage operation using 7T/14T SRAM. IEICE Transactions, 96-C(4):528-537, 2013.
[83] Henry Cook, Miquel Moreto, Sarah Bird, Khanh Dao, David A. Patterson, and Krste Asanovic. A hardware evaluation of cache partitioning to improve utilization and energy-efficiency while preserving responsiveness. In Avi Mendelson, editor, The 40th Annual International Symposium on Computer Architecture, ISCA'13, Tel-Aviv, Israel, June 23-27, 2013, pages 308-319. ACM, 2013.
[84] Milo M. K. Martin, Daniel J. Sorin, Bradford M. Beckmann, Michael R. Marty, Min Xu, Alaa R. Alameldeen, Kevin E. Moore, Mark D. Hill, and David A. Wood. Multifacet's general execution-driven multiprocessor simulator (gems) toolset. In Computer Architecture News, 2005.
[85] Radu David, Paul Bogdan, and Radu Marculescu. Dynamic power management for multicores: Case study using the intel scc. In International Conference on VLSI and System-on-Chip, pages 147–152, 2012.
[86] Yoongu Kim, Michael Papamichael, Onur Mutlu, and Mor Harchol-Balter. Thread cluster memory scheduling: Exploiting differences in memory access behavior. In Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO '43, pages 65–76, Washington, DC, USA, 2010. IEEE Computer Society.
[87] Andrea Bartolini, Matteo Cacciari, Andrea Tilli, Luca Benini, and Matthias Gries. A virtual platform environment for exploring power, thermal and reliability management control strategies in high-performance multicores. In Proceedings of the 20th Symposium on Great Lakes Symposium on VLSI, GLSVLSI '10, pages 311–316, New York, NY, USA, 2010. ACM.
[88] Yohei Nakata, Shunsuke Okumura, Hiroshi Kawaguchi, and Masahiko Yoshimoto. 0.5-v operation variation-aware word-enhancing cache architecture using 7t/14t hybrid sram. In Proceedings of the 16th ACM/IEEE International Symposium on Low Power Electronics and Design, ISLPED '10, pages 219–224, New York, NY, USA, 2010. ACM.
[89] A. Boro, B. Thomas, S. R. Ahamed, and G. Trivedi. Fpga implementation of a dedicated processor for temperature prediction. In 2016 International Conference on Accessibility to Digital World (ICADW), pages 21–26, Dec 2016.
[90] Moustafa Alzantot, Yingnan Wang, Zhengshuang Ren, and Mani B. Srivastava. RSTensorFlow: GPU enabled TensorFlow for deep learning on commodity android devices. In Proceedings of the 1st International Workshop on Embedded and Mobile Deep Learning (Deep Learning for Mobile Systems and Applications), EMDL@MobiSys 2017, Niagara Falls, NY, USA, June 23, 2017, pages 7–12. ACM, 2017.
[91] H. Ramchoun, M. A. Janati Idrissi, Y. Ghanou, and M. Ettaouil. Multilayer perceptron: Architecture optimization and training with mixed activation functions. In Proceedings of the 2nd International Conference on Big Data, Cloud and Applications, BDCA'17, pages 71:1–71:6, New York, NY, USA, 2017. ACM.
[92] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, Nov 1998.
[93] J. R. Diamond, D. S. Fussell, and S. W. Keckler. Arbitrary modulus indexing. In 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, pages 140–152, Dec 2014.
[94] Yaming Yin, Shuming Chen, and Xiao Hu. Input buffer planning for network-on-chip router design. In 2010 International Conference on Computer Application and System Modeling (ICCASM 2010), volume 13, pages V13-201–V13-204, 2010.
[95] Nathan Binkert, Bradford Beckmann, Gabriel Black, Steven K. Reinhardt, Ali Saidi, Arkaprava Basu, Joel Hestness, Derek R. Hower, Tushar Krishna, Somayeh Sardashti, Rathijit Sen, Korey Sewell, Muhammad Shoaib, Nilay Vaish, Mark D. Hill, and David A. Wood. The gem5 simulator. SIGARCH Comput. Archit. News, 39(2):1–7, August 2011.
[96] Saugata Ghose, Tianshi Li, Nastaran Hajinazar, Damla Senol Cali, and Onur Mutlu. Understanding the interactions of workloads and DRAM types: A comprehensive experimental study. CoRR, abs/1902.07609, 2019.
[97] Bakiri Mohammed, Christophe Guyeux, Jean-Francois Couchot, and Abdelkrim Oudjida. Survey on hardware implementation of random number generators on fpga: Theory and experimental analyses. Computer Science Review, 27:135–153, 02 2018.
[98] Himanshu Bhatnagar. Advanced ASIC Chip Synthesis: Using Synopsys Design Compiler Physical Compiler and PrimeTime. Kluwer Academic Publishers, Norwell, MA, USA, 2nd edition, 2002.
[99] TSMC. 0.13-micron technology | taiwan semiconductor manufacturing company. https://www.tsmc.com/english/dedicatedFoundry/technology/0.13um.htm, 2019. Accessed: 2019-06-12.
[100] Shrut Patel. Cache-implementation. https://github.com/shrut1996/Cache-Implementation, 2017.
[101] Millind Mittal. Computer system and method of allocating cache memories in a multilevel cache hierarchy utilizing a locality hint within an instruction, October 27 1998. US Patent 5,829,025.
[102] David Wonnacott. Achieving scalable locality with time skewing. International Journal of Parallel Programming, 30:2002, 1999.
[103] Wim Heirman, Kristof Du Bois, Yves Vandriessche, Stijn Eyerman, and Ibrahim Hur. Near-side prefetch throttling: Adaptive prefetching for high-performance many-core processors. In Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, PACT '18, pages 28:1–28:11, New York, NY, USA, 2018. ACM.
[104] ARM. Arm information center. http://infocenter.arm.com/help/index.jsp, 2019. Accessed: 2019-05-23.
[105] Andes. Andes technology corporation. http://www.andestech.com/en/homepage/, 2019. Accessed: 2019-06-13.
[106] RISC-V International Archive. Risc v cores and socs. https://github.com/riscv/riscv-wiki/wiki/RISC-V-Cores-and-SoCs, 2019. Accessed: 2019-05-23.
[107] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas, Anders Fogh, Jann Horn, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom, and Mike Hamburg. Meltdown: Reading kernel memory from user space. In 27th USENIX Security Symposium (USENIX Security 18), 2018.
[108] Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, and Yuval Yarom. Spectre attacks: Exploiting speculative execution. In 40th IEEE Symposium on Security and Privacy (S&P'19), 2019.
 
 
 
 