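Author (Chinese): 柯安琪
Author (English): Ko, An Chi
Thesis title (Chinese): 利用混合 CPU 和 GPU 的核心軌跡使 NOC 軌跡模擬器實現異質多核心架構
Thesis title (English): Mixing CPU and GPU Kernel Trace to NOC Trace-Driven Simulator for Heterogeneous Architecture
Advisor (Chinese): 金仲達
Advisor (English): King, Chung Ta
Committee members (Chinese): 黃稚存
李哲榮
Committee members (English): Huang, Chih Tsun
Lee, Che Rung
Degree: Master's
Institution: National Tsing Hua University (國立清華大學)
Department: Department of Computer Science (資訊工程學系)
Student ID: 102062561
Year of publication (ROC calendar): 104
Academic year of graduation: 104
Language: English
Number of pages: 22
Keywords (Chinese, translated): heterogeneous multi-core architecture; simulator; trace; NOC trace-driven simulator
Keywords (English): HSA; Heterogeneous; NOC; Trace-Driven Simulator; Trace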
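Over the past few years, heterogeneous multi-core architectures that combine CPUs and GPUs have become the mainstream of processor computing, offering good computational performance and flexibility. Using the CPU and GPU for computation at the same time is the most basic way to realize that performance. Yet many current experiments that simulate heterogeneous architectures study only the case in which a program running on the CPU host calls the GPU to do the computation, which does not fully show the advantages and performance of heterogeneous multi-core architectures. We therefore propose a method that mixes CPU and GPU kernel traces so that a NOC trace-driven simulator can model a heterogeneous multi-core architecture that computes on the CPU and GPU simultaneously. The method in this thesis has three parts: trace generation, trace-driven simulation, and validation. Our experimental results show that the mixed traces yield nearly the same average network latency, with an average error of 1.21%.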
Over the last few years, heterogeneous architectures that integrate CPUs and GPUs have become popular. They provide good computational performance, flexibility, and power efficiency. An ideal heterogeneous architecture should be able to run CPU and GPU devices at the same time and distribute computation freely between them. Unfortunately, existing simulators for heterogeneous architectures can run either a CPU host program with GPU kernels or CPU kernels, but not both at once. This hinders research on heterogeneous architecture designs. To overcome this limitation, we propose in this thesis an interim solution that mixes CPU and GPU kernel traces to support heterogeneous architecture research based on trace-driven simulation, such as Network-on-Chip (NoC) studies. To evaluate the feasibility of the proposed method, we compare the performance data reported by the NoC simulator Garnet when driven by the mixed traces against the data it reports when driven by a reference trace; both kinds of trace are taken from the heterogeneous architecture simulator Multi2Sim. Our experimental results show that the mixed traces produce an average network latency very close to that of the reference trace, with an average error of 1.21%.
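The abstract does not spell out how the traces are mixed or how the error is computed, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes each trace is a list of packet records already sorted by injection cycle, that mixing means a time-ordered merge of the CPU and GPU streams, and that the reported error is a mean relative difference in average network latency. The `Packet` fields, `mix_traces`, `gpu_offset`, and `average_error_percent` are hypothetical names for illustration; they are not the Multi2Sim or Garnet trace formats, nor necessarily the thesis's procedure.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple, Sequence


class Packet(NamedTuple):
    """Hypothetical NoC trace record (not an actual Multi2Sim/Garnet format)."""
    cycle: int  # injection cycle
    src: int    # source node id
    dst: int    # destination node id
    size: int   # payload size in bytes


def mix_traces(cpu: Iterable[Packet], gpu: Iterable[Packet],
               gpu_offset: int = 0) -> Iterator[Packet]:
    """Merge two cycle-sorted traces into one cycle-sorted stream.

    gpu_offset (an assumed knob) shifts the GPU kernel's start time.
    On equal cycles, CPU packets come first because heapq.merge is stable.
    """
    shifted = (p._replace(cycle=p.cycle + gpu_offset) for p in gpu)
    return heapq.merge(cpu, shifted, key=lambda p: p.cycle)


def average_error_percent(mixed: Sequence[float],
                          reference: Sequence[float]) -> float:
    """Mean relative error (in percent) between paired latency measurements."""
    if not mixed or len(mixed) != len(reference):
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(m - r) / r for m, r in zip(mixed, reference))
    return 100.0 * total / len(mixed)


if __name__ == "__main__":
    cpu_trace = [Packet(0, 0, 1, 8), Packet(10, 0, 2, 8)]
    gpu_trace = [Packet(5, 4, 1, 64), Packet(10, 4, 2, 64)]
    for pkt in mix_traces(cpu_trace, gpu_trace):
        print(pkt)
    # Made-up latencies, only to exercise the function.
    print(average_error_percent([101.0, 49.0], [100.0, 50.0]))
```

In a flow like the one the abstract describes, the merged stream would be fed to the NoC simulator, and the resulting average latencies would be compared against those obtained from the reference trace. The latency values in the usage example are placeholders, not results from the thesis.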
1. Introduction
2. Related Works
3. Methodology
3.1. Mixing Trace Method
3.2. Reference Trace
3.3. Trace Generation Phase
3.4. Trace-Driven Simulation Phase
3.5. Verification Availability Phase
4. Experiments
4.1. Experimental Settings
4.2. Experimental Results
5. Conclusion