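Author (Chinese): 李昭霖
Author (English): Lee, Chao-Lin
Thesis title (Chinese): 在通用圖形處理模擬器設計低功率浮點數運算指令流程與實驗
Thesis title (English): Experiment and Devise the Flow for GPGPU-Sim Simulators with Fixed-Point Instructions
Advisor (Chinese): 李政崑
Advisor (English): Lee, Jenq-Kuen
Committee members (Chinese): 陳呈瑋; 洪明郁
Committee members (English): Chen, Cheng-Wei; Hung, Ming-Yu
Degree: Master's
Institution: National Tsing Hua University
Department: Institute of Information Systems and Applications
Student ID: 105065523
Year of publication (ROC calendar): 107
Academic year of graduation: 106
Language: English
Number of pages: 31
Keywords (translated from Chinese): deep learning; low-power computation; GPGPU; simulator
Keywords (English): Deep Learning; Low-power numerical; GPGPU; Simulator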
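Abstract (translated from Chinese):
GPGPU-Sim has become an important simulation tool in academic research. It is a cycle-accurate simulator that can model current graphics processors. Machine learning is now widely applied in various systems, such as autonomous driving, mobile devices, and medicine. As mobile devices have become widespread, mobile vendors are very interested in porting machine learning and deep learning applications from computers to mobile devices. Google has developed TensorFlow Lite and the Android NN API for mobile devices and embedded systems. Because machine learning and deep learning are computation-intensive, energy consumption has become the foremost problem to solve on mobile devices. In addition, Moore's Law is gradually slowing, so performance growth for mobile devices, servers, and other computers will be quite limited in the foreseeable future. Performance and energy consumption are therefore the two issues we should focus on.

In this thesis, we propose a new data type, fixed-point, a low-power data type that can reduce power and improve the performance of machine learning applications. We implemented fixed-point instructions in the GPGPU-Sim simulator and observed power consumption and performance. In our experiments, substituting fixed-point instructions improves energy savings. Compared with floating-point operations, fixed-point instructions effectively reduce the energy consumption of the graphics processor.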
Abstract (English):
GPGPU-Sim has become an important vehicle for academic architecture research. It is a cycle-accurate simulator that models contemporary graphics processing units (GPUs).

Machine learning is now widely used in various applications, such as autonomous driving, mobile devices, and medicine. With the popularity of mobile devices, mobile vendors are interested in porting machine learning and deep learning applications from computers to mobile devices. Google has developed TensorFlow Lite and the Android NN API for mobile and embedded devices. Because machine learning and deep learning are computationally intensive, energy consumption has become a serious problem on mobile devices. Moreover, Moore's Law cannot last forever, so the performance of mobile devices and of computers such as desktops and servers will see only limited improvement in the foreseeable future. Performance and energy consumption are therefore the two issues that most need attention.

In this thesis, we propose a new data type, fixed-point, a low-power numerical data type that can reduce power consumption and improve performance in machine learning applications. We implemented fixed-point instructions in the GPGPU-Sim simulator and measured power consumption and performance. Our evaluation demonstrates that replacing floating-point instructions with fixed-point instructions improves energy savings: compared with floating-point, total GPU energy consumption is reduced by 13%.
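This record does not give the bit widths or instruction encoding used in the thesis, so the following is only a minimal sketch of how fixed-point arithmetic can stand in for floating-point. It assumes a signed Q-format with 8 fractional bits; the names `FRAC_BITS`, `to_fixed`, `to_float`, `fixed_mul`, and `fixed_dot` are hypothetical and are not taken from GPGPU-Sim or from the thesis.

```python
# Illustrative fixed-point arithmetic; the format below is an assumption,
# not the representation used in the thesis.
FRAC_BITS = 8  # hypothetical: 8 fractional bits, scale factor 2**8 = 256


def to_fixed(x, frac_bits=FRAC_BITS):
    """Encode a real number as an integer scaled by 2**frac_bits."""
    return int(round(x * (1 << frac_bits)))


def to_float(q, frac_bits=FRAC_BITS):
    """Decode a fixed-point integer back to a Python float."""
    return q / (1 << frac_bits)


def fixed_mul(a, b, frac_bits=FRAC_BITS):
    """Multiply two fixed-point values using only integer operations.

    The raw product carries 2*frac_bits fractional bits, so it is
    shifted right by frac_bits to return to the original format.
    """
    return (a * b) >> frac_bits


def fixed_dot(xs, ws, frac_bits=FRAC_BITS):
    """Dot product of two fixed-point vectors.

    Products are accumulated at double precision and rescaled once
    at the end, which avoids rounding after every multiplication.
    """
    acc = 0
    for x, w in zip(xs, ws):
        acc += x * w
    return acc >> frac_bits


if __name__ == "__main__":
    inputs = [to_fixed(v) for v in (0.5, -1.25, 2.0)]
    weights = [to_fixed(v) for v in (1.5, 0.75, -0.5)]
    print(to_float(fixed_dot(inputs, weights)))
```

The multiply-accumulate loop is the kind of operation that dominates neural-network inference, which is why replacing its floating-point units with integer arithmetic is a plausible route to the energy savings the abstract reports. Values that are not multiples of 2**-8 would be rounded on encoding, so a real design has to trade fractional bits against accuracy.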
Abstract
Contents
List of Figures
List of Tables
1 Introduction
1.1 Introduction
1.2 Overview of the Thesis
2 Background Architectures
2.1 GPGPU-Sim Simulator
2.2 Fixed-Point Enabled Flow
3 Fixed-Point Design and Implementation
3.1 Fixed-Point Representation and Example
3.2 Functional Model Design
4 Fixed-Point Power Model
5 Experiments
5.1 Experimental Methodology - NNEF
5.2 Experimental Results
6 Conclusion and Future Works
6.1 Conclusion
6.2 Future Works
(The full text is not authorized for public access.)