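Author (Chinese): 謝秉勳
Author (English): Hsieh, Ping-Hsun
Thesis title (Chinese): 針對階層是暫存器圖形處理器之節能暫存器配置方法
Thesis title (English): Power-Saving Register Allocation Scheme For Hierarchical Register Files On GPU
Advisor (Chinese): 李政崑
Advisor (English): Lee, Jenq-Kuen
Oral defense committee (Chinese): 陳呈瑋, 關啟邦
Degree: Master's
University: National Tsing Hua University
Department: Department of Computer Science
Student ID: 101062555
Year of publication (ROC calendar): 103 (2014)
Academic year of graduation: 102
Language: English, Chinese
Number of pages: 40
Keywords (translated from the Chinese): heterogeneous multi-core; LLVM (Low-Level Virtual Machine); hierarchical register files; register allocation; multi-core simulator; energy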
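Abstract (translated from the Chinese):
In recent years, the graphics processing unit (GPU) has become an indispensable computing device, because its massive number of threads and its single-instruction, multiple-thread (SIMT) architecture deliver high computing performance. However, as the computing capability of a device grows, so does the energy it consumes during computation.

In this thesis, to reduce the energy consumed when general-purpose GPU (GPGPU) programs run on a GPU, we study an emerging GPU architecture with hierarchical register files and propose two power-saving register allocation schemes that place program variables in the appropriate level of the register hierarchy. Both schemes are integrated into LLVM (Low-Level Virtual Machine), a mainstream compiler framework.

In addition, to save more energy and raise the utilization of the energy-efficient register levels, this thesis proposes a density-aware variable splitting mechanism. It selects variables that have a high use density but cannot be placed in an energy-efficient register level because of constraints, splits them, and extracts the segments with higher use density for register allocation.

The experimental results show the usage ratio of each register level and the energy consumption after applying the hierarchical power-saving register allocation schemes and the density-aware splitting mechanism. Overall, energy consumption is reduced by 51.07%, and utilization of the energy-efficient registers increases by 17%.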
Abstract (English):
The graphics processing unit (GPU) has become an important computing device because of its high computing capability, which comes from its enormous number of threads and its SIMT (single-instruction, multiple-thread) architecture. However, greater computing capability comes with greater power consumption.

In this thesis, to reduce the energy consumed when GPGPU programs execute on a GPU, we study a new GPU architecture with hierarchical register files and propose two power-saving register allocation schemes, implemented in the LLVM (Low-Level Virtual Machine) compilation framework, that allocate values to the appropriate register files.

Furthermore, to save more energy and raise the utilization of the energy-efficient register files, we add a density-aware live range splitting mechanism with a cost model to our hierarchical register allocation schemes. We select live ranges that offer high energy savings but could not be assigned a register, and split them so that the energy-efficient register files are fully utilized.

The experimental results report the ratio of read and write accesses to each register file under our hierarchical register allocation schemes. With the single-pass register allocation scheme and the density-aware live range splitting mechanism, energy consumption decreases by up to 51.07% and utilization of the energy-efficient registers improves by about 17%.
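
As a rough illustration of the density-aware splitting idea described in the abstract, the Python sketch below picks the part of a live range where accesses are most concentrated, so that only that segment would be a candidate for an energy-efficient register file. This record does not give the thesis's actual cost model or split-point rule, so everything here is an assumption made for illustration: the `LiveRange` type, the accesses-per-instruction density measure, and the `max_len` window limit standing in for whatever constraint kept the whole range out of the small register file.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LiveRange:
    """Hypothetical live range: the value is live over [start, end] (inclusive)."""
    name: str
    start: int
    end: int
    uses: List[int]  # sorted, non-empty instruction indices that access the value


def use_density(num_uses: int, start: int, end: int) -> float:
    """Assumed density measure: accesses per instruction slot in [start, end]."""
    return num_uses / (end - start + 1)


def densest_segment(lr: LiveRange, max_len: int) -> Tuple[int, int, int]:
    """Find the sub-range of at most max_len instructions covering the most uses.

    Returns (seg_start, seg_end, uses_covered). Requires max_len >= 1.
    """
    best = (lr.uses[0], lr.uses[0], 1)
    left = 0
    for right, pos in enumerate(lr.uses):
        # Shrink the window until it spans at most max_len instructions.
        while pos - lr.uses[left] + 1 > max_len:
            left += 1
        covered = right - left + 1
        if covered > best[2]:
            best = (lr.uses[left], pos, covered)
    return best


lr = LiveRange("v", start=0, end=90, uses=[2, 30, 31, 33, 34, 80])
seg_start, seg_end, covered = densest_segment(lr, max_len=8)
whole = use_density(len(lr.uses), lr.start, lr.end)
split = use_density(covered, seg_start, seg_end)
print(f"split out [{seg_start}, {seg_end}] with {covered} uses")
print(f"density: whole range {whole:.3f}, split segment {split:.3f}")
```

In this toy input the whole range is sparsely used, while the extracted segment is dense; a real allocator would additionally weigh the cost of the copies introduced at the split points, which this sketch ignores.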
Table of contents:
Abstract
Contents
List of Figures
List of Tables
1 Introduction
1.1 Introduction
1.2 Overview of the Thesis
2 The RA for hierarchical register files
2.1 Overview Microarchitecture Of HRF
2.2 Overview Of LLVM Greedy Register Allocation
2.3 The Implementation of Hierarchical Register Allocation
2.4 The Multi-Pass Register Allocation Scheme
2.5 The Single-Pass Register Allocation Scheme
3 Density-aware live range splitting Mechanism
3.0.1 The Cost Model
3.0.2 The Split Point
4 Experiment Results
4.1 Simulation
4.2 Experimental Environment
4.3 Experimental Result
5 Conclusion
5.1 Summary
5.2 Future Work
(Full text not authorized for public access)