Thesis title (English): An Effective Early Multi-core System Shared Cache Design Method Based on Reuse-distance Analysis
Advisor (English): Tsay, Ren-Song
Committee members (English): Hsu, Yar-Sun; Lu, Shih-Lien
In this paper, we propose an effective and efficient multi-core shared-cache design optimization approach based on reuse-distance analysis of the data traces of target applications. Because data traces are independent of the system hardware architecture, a designer can use our approach to compute the best cache design at an early system design phase. We devise an efficient yet accurate method to derive the aggregated reuse-distance histograms of concurrent applications for cache performance analysis and optimization. The approach is effective because the aggregated reuse-distance histograms embed the actual shared-cache contention results of the concurrent applications. The experimental results show that the average error rate of our shared-cache miss-count estimates is less than 3.2%. Using a simple scanning search, one can then determine the true optimal cache configurations at an early system design phase.
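To make the terms in the abstract concrete, the following is a minimal sketch of classic single-trace reuse-distance (LRU stack distance) analysis, followed by a scan over candidate cache sizes. It is not the thesis's aggregated-histogram method for concurrent applications, which this record does not spell out. The function names, the fully associative LRU cache model, and the miss-budget criterion used in the scan are all assumptions made for illustration.

```python
from collections import Counter


def reuse_distance_histogram(trace):
    """Return (histogram, cold_misses) for a sequence of block addresses.

    The reuse distance of an access is the number of distinct blocks
    touched since the previous access to the same block.
    """
    stack = []       # LRU stack; the most recently used block is last
    hist = Counter()
    cold = 0         # first-time accesses have no reuse distance
    for block in trace:
        if block in stack:
            idx = stack.index(block)
            hist[len(stack) - 1 - idx] += 1
            stack.pop(idx)
        else:
            cold += 1
        stack.append(block)
    return hist, cold


def lru_misses(hist, cold, capacity):
    """Miss count of a fully associative LRU cache of `capacity` blocks
    (illustrative model): an access hits only if its reuse distance is
    smaller than the capacity."""
    return cold + sum(n for d, n in hist.items() if d >= capacity)


def smallest_capacity(hist, cold, candidates, max_misses):
    """Scan candidate capacities in increasing order and return the first
    one that meets the (hypothetical) miss budget, or None."""
    for capacity in sorted(candidates):
        if lru_misses(hist, cold, capacity) <= max_misses:
            return capacity
    return None


trace = ["a", "b", "c", "a", "b", "d", "a"]
hist, cold = reuse_distance_histogram(trace)
print(lru_misses(hist, cold, 2), lru_misses(hist, cold, 3))
```

Because the histogram is computed once from the trace, every candidate capacity can be evaluated without re-simulating the cache, which is what makes a simple scan over configurations inexpensive in this kind of analysis.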
Contents
List of Tables
List of Figures
1. Introduction
2. Related Work
3. Shared Cache Design Optimization
3.1 Aggregated reuse-distance computation
3.2 Cache configuration optimization
4. Experiments
4.1 Experimental setup
4.2 Results of One-level Shared Cache Designs
4.3 Results of Two-level Cache Designs
4.4 Optimal Cache Designs
4.5 Discussions
5. Conclusion
Bibliography
[1] Basu, Arkaprava, et al. "Scavenger: A new last level cache architecture with global block priority." Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 2007.
[2] Qureshi, Moinuddin K., et al. "Adaptive insertion policies for high performance caching." ACM SIGARCH Computer Architecture News. Vol. 35. No. 2. ACM, 2007.
[3] Jaleel, Aamer, et al. "High performance cache replacement using re-reference interval prediction (RRIP)." ACM SIGARCH Computer Architecture News. Vol. 38. No. 3. ACM, 2010.
[4] Khan, Samira, Yingying Tian, and Daniel Jiménez. "Sampling dead block prediction for last-level caches." Microarchitecture (MICRO), 2010 43rd Annual IEEE/ACM International Symposium on. IEEE, 2010.
[5] Duong, Nam, et al. "Improving cache management policies using dynamic reuse-distances." Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 2012.
[6] Qureshi, Moinuddin K., and Yale N. Patt. "Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches." Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 2006.
[7] Xie, Yuejian, and Gabriel H. Loh. "PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches." ACM SIGARCH Computer Architecture News. Vol. 37. No. 3. ACM, 2009.
[8] Kim, Seongbeom, Dhruba Chandra, and Yan Solihin. "Fair cache sharing and partitioning in a chip multiprocessor architecture." Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques. IEEE Computer Society, 2004.
[9] Chandra, Dhruba, et al. "Predicting inter-thread cache contention on a chip multi-processor architecture." High-Performance Computer Architecture, 2005. HPCA-11. 11th International Symposium on. IEEE, 2005.
[10] Xu, Chi, et al. "Cache contention and application performance prediction for multi-core systems." Performance Analysis of Systems & Software (ISPASS), 2010 IEEE International Symposium on. IEEE, 2010.
[11] Sandberg, Andreas, David Black-Schaffer, and Erik Hagersten. "Efficient techniques for predicting cache sharing and throughput." Proceedings of the 21st international conference on Parallel architectures and compilation techniques. ACM, 2012.
[12] Liu, Chun, Anand Sivasubramaniam, and Mahmut Kandemir. "Organizing the last line of defense before hitting the memory wall for CMPs." Software, IEE Proceedings-. IEEE, 2004.
[13] Brock, Jacob, et al. "Optimal cache partition-sharing." Parallel Processing (ICPP), 2015 44th International Conference on. IEEE, 2015.
[14] Chang, Jichuan, and Gurindar S. Sohi. "Cooperative cache partitioning for chip multiprocessors." ACM International Conference on Supercomputing 25th Anniversary Volume. ACM, 2014.
[15] Suh, G. Edward, Larry Rudolph, and Srinivas Devadas. "Dynamic partitioning of shared cache memory." The Journal of Supercomputing 28.1 (2004): 7-26.
[16] Mattson, Richard L., et al. "Evaluation techniques for storage hierarchies." IBM Systems journal 9.2 (1970): 78-117.
[17] Subramanian, Lavanya, et al. "The application slowdown model: Quantifying and controlling the impact of inter-application interference at shared caches and main memory." Proceedings of the 48th International Symposium on Microarchitecture. ACM, 2015.
[18] Eklov, David, David Black-Schaffer, and Erik Hagersten. "Fast modeling of shared caches in multicore systems." Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers. ACM, 2011.
[19] Chen, Xi E., and Tor M. Aamodt. "Modeling cache contention and throughput of multiprogrammed manycore processors." Computers, IEEE Transactions on 61.7 (2012): 913-927.
[20] Chen, Xi E., and Tor M. Aamodt. "A first-order fine-grained multithreaded throughput model." High Performance Computer Architecture, 2009. HPCA 2009. IEEE 15th International Symposium on. IEEE, 2009.
[21] Carlson, Trevor E., Wim Heirman, and Lieven Eeckhout. "Sniper: exploring the level of abstraction for scalable and accurate parallel multi-core simulation." Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis. ACM, 2011.
[22] Henning, John L. "SPEC CPU2006 benchmark descriptions." ACM SIGARCH Computer Architecture News 34.4 (2006): 1-17.
[23] Jaleel, Aamer. "Memory characterization of workloads using instrumentation-driven simulation." Web Copy: http://www.glue.umd.edu/ajaleel/workload (2010).
[24] Cheng-Lin Tsai, et al. "A Fast-and-Effective Early-Stage Multi-level Cache Optimization Method Based on Reuse-Distance Analysis." National Tsing Hua University, 2016.
[25] Jaleel, Aamer, et al. "High performing cache hierarchies for server workloads: Relaxing inclusion to capture the latency benefits of exclusive caches." High Performance Computer Architecture (HPCA), 2015 IEEE 21st International Symposium on. IEEE, 2015.