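Pointer-based OpenCL Optimization on RRAM Architectures (在RRAM架構上依據指標形式的OpenCL優化), National Tsing Hua University Theses and Dissertations Full-text System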

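Author (Chinese):	于琳雅
Author (English):	Yu, Lin-Ya
Thesis title (Chinese):	在RRAM架構上依據指標形式的OpenCL優化
Thesis title (English):	Memory SSA-based Analysis for Pointer-based OpenCL Programs on RRAM
Advisor (Chinese):	李政崑
Advisor (English):	Lee, Jenq-Kuen
Oral examination committee (Chinese):	陳呈瑋、蘇泓萌
Degree:	Master's
Institution:	National Tsing Hua University
Department:	Department of Computer Science
Student ID:	104062634
Year of publication (ROC calendar):	106
Academic year of graduation:	105
Language:	English
Number of pages:	42
Keywords (Chinese):	編譯器、指標分析、記憶體靜態單賦值形式、分級的讀寫分析、OpenCL、LLVM
Keywords (English):	compiler, pointer analysis, memory static single assignment (SSA), hierarchical read/write analysis, OpenCL, LLVM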

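As semiconductor technology runs into physical limits on raising processor speed while the demand for computing performance keeps growing,
heterogeneous computing platforms that combine CPUs with a variety of hardware accelerators have become today's trend.
Many studies of optimization strategies have emerged in academia, aiming to achieve better data throughput while lowering the energy consumption of heterogeneous systems.
Among them, memory technologies such as DRAM, STT-RAM, and RRAM have been developed to help reach this goal.
Meanwhile, OpenCL provides an open programming language standard for writing programs for the various hardware accelerators in a heterogeneous system.
In this thesis, we propose a new OpenCL-based static analysis technique that helps determine the read/write characteristics of objects in a program.
Considering a computing configuration with both DRAM and RRAM, we try to answer the question of which variables are best stored in RRAM for energy efficiency.
Our compiler-level analysis, based on Memory SSA form, can handle programs that use pointers.
The experimental evaluation shows that the proposed analysis has an average error of 0.355,
and that the read/write information it produces has the potential to deliver high energy savings.
In addition, compared with the baseline system in our experiments,
the read/write information helps each kernel achieve an average energy saving of 43.61%.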

Heterogeneous computing platforms that combine a wide range of computing resources, from CPUs to specialized hardware accelerators, are the trend today, driven by the physical limits on processor speed and the increasing demand for computing performance. Hence many optimization strategies have been studied to obtain better throughput and lower energy consumption in heterogeneous systems. Various memory technologies such as DRAM, STT-RAM, and RRAM have also been developed to help reach this goal. Meanwhile, OpenCL is an open programming language standard that lets programmers write programs for the hardware accelerators in a heterogeneous system.
In this thesis, we present a new static analysis technique for OpenCL programs that determines the read/write characteristics of objects within a program. We consider a computing configuration with both DRAM and RRAM, and we try to answer the question of which variables benefit energy efficiency the most when stored in RRAM. Our compiler scheme, based on our enhanced Memory SSA, enables the analysis to cover pointer-based programs. Our evaluation demonstrates that the read/write information obtained from the proposed design has great potential to achieve high energy savings. The average error distance of our proposed scheme is 0.355. In addition, compared to the baseline system, the read/write information helps achieve an average energy saving of 43.61% per kernel.
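The placement question raised in the abstract can be illustrated with a minimal sketch. It assumes that each object already has estimated read and write counts (the kind of information a read/write analysis could supply) and that each memory type has a fixed per-access energy cost. The names `AccessProfile`, `access_energy`, and `choose_placement`, as well as the energy figures, are hypothetical and chosen only for illustration; they are not taken from the thesis and do not represent its algorithm or measurements.

```python
from dataclasses import dataclass

# Hypothetical per-access energy costs in nanojoules. These are illustrative
# values only (cheap RRAM reads, expensive RRAM writes), not thesis data.
ENERGY_NJ = {
    "DRAM": {"read": 1.0, "write": 1.0},
    "RRAM": {"read": 0.5, "write": 4.0},
}


@dataclass(frozen=True)
class AccessProfile:
    """Estimated access counts for one program object (hypothetical type)."""
    name: str
    reads: int
    writes: int


def access_energy(profile: AccessProfile, memory: str) -> float:
    """Total access energy if the object is placed in the given memory."""
    cost = ENERGY_NJ[memory]
    return profile.reads * cost["read"] + profile.writes * cost["write"]


def choose_placement(profiles: list[AccessProfile]) -> dict[str, str]:
    """Pick, per object, the memory with the lower estimated access energy.

    Ties go to DRAM because it is listed first in the candidate tuple.
    """
    return {
        p.name: min(("DRAM", "RRAM"), key=lambda m: access_energy(p, m))
        for p in profiles
    }


if __name__ == "__main__":
    profiles = [
        AccessProfile("lookup_table", reads=1000, writes=10),  # read-mostly
        AccessProfile("accumulator", reads=100, writes=100),   # write-heavy
    ]
    print(choose_placement(profiles))
```

Under these assumed costs, a read-mostly object lands in RRAM and a write-heavy object stays in DRAM, which is why read/write information per object matters for the placement decision.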

Abstract
Contents
List of Figures
List of Tables
1 Introduction
1.1 Introduction
1.2 Overview of the Thesis
2 Background
2.1 OpenCL
2.2 RRAM
2.3 Static Single Assignment Form
2.3.1 Memory SSA
3 Hierarchical Read/Write Analysis
3.1 Main Algorithm: Hierarchical Read/Write Analysis
3.2 Hierarchical-address-space Sensitive Alias Analysis Algorithm
3.3 EvaluateAccessCount Algorithm
4 Enhanced Memory SSA
4.1 Enhanced Memory SSA Implementation
4.2 Usage Scenarios
4.2.1 Hierarchical read/write analysis
4.2.2 Divergence analysis
5 Experimental Methodology and Results
5.1 Experimental Methodology
5.2 Experimental Results
5.2.1 Accuracy of provided read/write information
5.2.2 Data memory energy-saving from provided read/write information
6 Conclusion and Future Works
6.1 Conclusion
6.2 Future Works

[1] F. C. Chow, S. Chan, S.-M. Liu, R. Lo, and M. Streich, “Effective representation of aliases and indirect memory operations in SSA form,” in Proceedings of the 6th International Conference on Compiler Construction, ser. CC ’96. London, UK: Springer-Verlag, 1996, pp. 253–267. [Online]. Available: http://dl.acm.org/citation.cfm?id=647473.760381
[2] D. Novillo, “Memory SSA - a unified approach for sparsely representing memory operations,” in Proc. of the GCC Developers Summit. Citeseer, 2007.
[3] CUDA C Programming Guide, NVIDIA, 2016. [Online]. Available: http://docs.nvidia.com/cuda/cuda-c-programming-guide/
[4] The OpenCL Specification, version 1.2, Khronos OpenCL Working Group, 2012. [Online]. Available: http://www.khronos.org/registry/cl/spec/opencl-1.2.pdf
[5] The OpenCL Specification, version 2.0, Khronos OpenCL Working Group, 2015. [Online]. Available: https://www.khronos.org/registry/cl/specs/opencl-2.0.pdf
[6] C++ Accelerated Massive Parallelism, Microsoft Corp. [Online]. Available: http://msdn.microsoft.com/en-us/library/vstudio/hh265137.aspx
[7] Compute Shader Overview, Microsoft Corp. [Online]. Available: http://msdn.microsoft.com/en-us/library/ff476331.aspx
[8] B. C. Lee, E. Ipek, O. Mutlu, and D. Burger, “Architecting phase change memory as a scalable dram alternative,” SIGARCH Comput. Archit. News, vol. 37, no. 3, pp. 2–13, Jun. 2009. [Online]. Available: http://doi.acm.org/10.1145/1555815.1555758
[9] P. Zhou, B. Zhao, J. Yang, and Y. Zhang, “A durable and energy efficient main memory using phase change memory technology,” SIGARCH Comput. Archit. News, vol. 37, no. 3, pp. 14–23, Jun. 2009. [Online]. Available: http://doi.acm.org/10.1145/1555815.1555759
[10] M. K. Qureshi, V. Srinivasan, and J. A. Rivers, “Scalable high performance main memory system using phase-change memory technology,” in Proceedings of the 36th Annual International Symposium on Computer Architecture, ser. ISCA ’09. New York, NY, USA: ACM, 2009, pp. 24–33. [Online]. Available: http://doi.acm.org/10.1145/1555754.1555760
[11] L. E. Ramos, E. Gorbatov, and R. Bianchini, “Page placement in hybrid memory systems,” in Proceedings of the International Conference on Supercomputing, ser. ICS ’11. New York, NY, USA: ACM, 2011, pp. 85–95. [Online]. Available: http://doi.acm.org/10.1145/1995896.1995911
[12] D.-J. Shin, S. K. Park, S. M. Kim, and K. H. Park, “Adaptive page grouping for energy efficiency in hybrid pram-dram main memory,” in Proceedings of the 2012 ACM Research in Applied Computation Symposium, ser. RACS ’12. New York, NY, USA: ACM, 2012, pp. 395–402. [Online]. Available: http://doi.acm.org/10.1145/2401603.2401689
[13] A. Hassan, H. Vandierendonck, and D. S. Nikolopoulos, “Energy-efficient in-memory data stores on hybrid memory hierarchies,” in Proceedings of the 11th International Workshop on Data Management on New Hardware, ser. DaMoN’15. New York, NY, USA: ACM, 2015, pp. 1:1–1:8. [Online]. Available: http://doi.acm.org/10.1145/2771937.2771940
[14] R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck, “Efficiently computing static single assignment form and the control dependence graph,” ACM Trans. Program. Lang. Syst., vol. 13, no. 4, pp. 451–490, Oct. 1991. [Online]. Available: http://doi.acm.org/10.1145/115372.115320
[15] C. Lattner and V. Adve, “LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation,” in Proceedings of the 2004 International Symposium on Code Generation and Optimization (CGO’04), Palo Alto, California, Mar 2004.
[16] Y. Ho, G. M. Huang, and P. Li, “Nonvolatile memristor memory: Device characteristics and design implications,” in Proceedings of the 2009 International Conference on Computer-Aided Design, ser. ICCAD ’09. New York, NY, USA: ACM, 2009, pp. 485–490. [Online]. Available: http://doi.acm.org/10.1145/1687399.1687491
[17] Open64 compiler, CAPSL. [Online]. Available: http://www.open64.net/
[18] (2016) Accelerated parallel processing (app) sdk. Advanced Micro Devices Inc. [Online]. Available: http://developer.amd.com/tools-and-sdk/heterogeneous-computing/amd-accelerated-parallel-processing-app-sdk
[19] S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S.-H. Lee, and K. Skadron, “Rodinia: A benchmark suite for heterogeneous computing,” in Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC), ser. IISWC ’09. Washington, DC, USA: IEEE Computer Society, 2009, pp. 44–54. [Online]. Available: http://dx.doi.org/10.1109/IISWC.2009.5306797
[20] A. Bakhoda, G. L. Yuan, W. W. L. Fung, H. Wong, and T. M. Aamodt, “Analyzing cuda workloads using a detailed gpu simulator,” 2009 IEEE International Symposium on Performance Analysis of Systems and Software, pp. 163–174, 2009.
[21] S. Thoziyoor, J. H. Ahn, M. Monchiero, J. B. Brockman, and N. P. Jouppi, “A comprehensive memory modeling tool and its application to the design and analysis of future memory hierarchies,” in Proceedings of the 35th Annual International Symposium on Computer Architecture, ser. ISCA ’08. Washington, DC, USA: IEEE Computer Society, 2008, pp. 51–62. [Online]. Available: http://dx.doi.org/10.1109/ISCA.2008.16
[22] C. Xu, X. Dong, N. P. Jouppi, and Y. Xie, “Design implications of memristor-based RRAM cross-point structures,” in DATE. IEEE, 2011, pp. 734–739.
