Author: Lu, Bing-Sung (呂秉松)
Title: Prototyping OpenCL Reference Designs with Fixed-Point Feature Sets (OpenCL低功率浮點數功能原型設計)
Advisor: Lee, Jenq-Kuen (李政崑)
Committee members: Chen, Cheng-Wei (陳呈瑋); Hung, Ming-Yu (洪明郁)
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 104062530
Year of publication (ROC calendar): 107 (2018)
Academic year of graduation: 106
Language: English
Pages: 43
Keywords: deep learning, OpenCL, SPIR-V, fixed-point
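With the rapid progress of deep learning in recent years, it has become clear that the approach can solve problems that conventional computer programs could not, and the Internet of Things (IoT) is one of the fields that stands to benefit. However, running deep learning programs requires large amounts of memory and power, and many IoT end devices lack both, which prevents deep learning from being widely deployed in IoT. Because most deep learning computation is floating-point arithmetic, it has been proposed to replace ordinary floating-point numbers with low-power fixed-point numbers. Unlike ordinary floating-point numbers, these have a fixed radix-point position, so their arithmetic is simpler and consumes less power. The storage width of these numbers can also be configured, which lowers memory requirements. The drawback is reduced numerical precision, but deep learning inference does not demand high precision, so when resources are limited, replacing floating-point numbers with fixed-point numbers is a feasible alternative.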
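In this thesis, we design a compilation flow for fixed-point numbers. We first add fixed-point libraries to OpenCL C and OpenCL C++ and compile them to intermediate representation (IR) with Clang. Because of the nature of fixed-point numbers, the compilation process must be specially rewritten so that all fixed-point properties are fully preserved. We also modify the SPIR-V translator provided by the Khronos Group so that the LLVM IR can be converted to SPIR-V, and we design and propose a new set of fixed-point instructions in SPIR-V so that compiler back ends for various hardware can map these instructions directly to the corresponding hardware functions. We implement the fixed-point functionality in the GPGPU-Sim simulator and feed the whole compilation flow into it. The results show that using fixed-point numbers reduces the energy the hardware needs for deep learning workloads.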
Deep learning is one of the fastest-growing methodologies in artificial intelligence, and IoT (Internet of Things) devices can extend their capabilities with it. However, running deep learning inference on IoT devices faces a major obstacle: inference consumes large amounts of power and memory, and most IoT devices lack both. As a result, such devices cannot perform deep learning inference with the current state of the art. Because most of these resources are consumed by floating-point computation, our lab, jointly with MediaTek, has proposed to Khronos OpenCL that fixed-point numbers replace floating-point numbers, reducing the power and memory that deep learning requires.
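The idea of replacing floating-point values with fixed-point ones can be illustrated with a minimal C++ sketch. It assumes a signed 16-bit representation with 8 fractional bits; the names `FRAC_BITS`, `to_fixed`, `to_float`, `fixed_add`, and `fixed_mul` are hypothetical and are not taken from the thesis or from any OpenCL specification. Overflow handling and saturation are deliberately omitted.

```cpp
#include <cmath>
#include <cstdint>

// Assumed format for illustration: signed 16-bit value, 8 fractional bits.
constexpr int FRAC_BITS = 8;

// Quantize a float by scaling with 2^FRAC_BITS and rounding to nearest.
inline int16_t to_fixed(float x) {
    return static_cast<int16_t>(std::lround(x * (1 << FRAC_BITS)));
}

// Recover the approximate real value from the raw integer.
inline float to_float(int16_t q) {
    return static_cast<float>(q) / (1 << FRAC_BITS);
}

// Addition of two values in the same format is plain integer addition.
inline int16_t fixed_add(int16_t a, int16_t b) {
    return static_cast<int16_t>(a + b);
}

// Multiplication is done in a wider integer; the product carries
// 2 * FRAC_BITS fractional bits, so it is shifted back by FRAC_BITS.
// Note: right-shifting a negative value is implementation-defined
// before C++20 (arithmetic shift on common compilers).
inline int16_t fixed_mul(int16_t a, int16_t b) {
    int32_t wide = static_cast<int32_t>(a) * static_cast<int32_t>(b);
    return static_cast<int16_t>(wide >> FRAC_BITS);
}
```

Every operation here is integer arithmetic on a narrow type, which is the source of the power and memory savings the abstract refers to. The cost is precision: values are rounded to multiples of 1/256 in this assumed format.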
In this thesis, we present a prototype flow that adds fixed-point features to OpenCL and generates LLVM IR and SPIR-V. Unlike most variable types, a fixed-point type requires extra information to work properly. Our design addresses both OpenCL C++ and OpenCL C. In OpenCL C++, templates can carry this information, but OpenCL C does not support templates, so we pass the information through arguments instead. Because the information must be constant at compile time, the IR has to be modified heavily to produce the correct fixed-point type. Our solution is a Clang pass that rewrites the LLVM IR into the right format. Once we have the LLVM IR, a SPIR-V translator converts it into SPIR-V. We modify the translator and add new fixed-point instructions to it, so that the fixed-point functions are converted into SPIR-V instructions. If we then direct the flow toward the back end, we obtain LLVM IR again, from which we generate PTX assembly code and run it on a simulator for the experiments. We use an MNIST deep learning program as the experimental subject and replace its floating-point functions with fixed-point functions. In our results, fixed-point functions save up to 10.67% of energy per kernel compared with floating-point functions.
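The contrast between carrying the format in a template and carrying it in an argument can be sketched in plain C++ as follows. This is only an illustration of the two interface styles under assumed names (`FixedPoint`, `mul_fixed`, `mul_fixed_arg`); it is not the library API defined in the thesis, and the Clang pass that rewrites the IR is not shown.

```cpp
#include <cstdint>

// Template style (as available in OpenCL C++): the number of fractional
// bits is part of the type, so it is known at compile time by construction.
template <typename Rep, int FracBits>
struct FixedPoint {
    Rep raw;  // underlying integer representation
};

template <typename Rep, int FracBits>
inline FixedPoint<Rep, FracBits> mul_fixed(FixedPoint<Rep, FracBits> a,
                                           FixedPoint<Rep, FracBits> b) {
    // Widen, multiply, then shift back by the compile-time constant.
    int64_t wide = static_cast<int64_t>(a.raw) * static_cast<int64_t>(b.raw);
    return FixedPoint<Rep, FracBits>{static_cast<Rep>(wide >> FracBits)};
}

// Argument style (for a language without templates, such as OpenCL C):
// the format travels as an ordinary parameter. Nothing in this signature
// forces frac_bits to be a constant, which is why a later compiler step
// would have to recover and check it at each call site.
inline int32_t mul_fixed_arg(int32_t a, int32_t b, int frac_bits) {
    int64_t wide = static_cast<int64_t>(a) * static_cast<int64_t>(b);
    return static_cast<int32_t>(wide >> frac_bits);
}
```

With the template version, a call such as `mul_fixed(FixedPoint<int32_t, 8>{384}, FixedPoint<int32_t, 8>{512})` fixes the format in the type itself, whereas `mul_fixed_arg(384, 512, 8)` relies on the caller passing a literal.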
Abstract
Contents
List of Figures
List of Tables
1 Introduction
1.1 Introduction
1.2 Overview of the Thesis
2 Background
2.1 Fixed-Point
2.2 LLVM-SPIRV
3 Reference Design of Fixed Point
3.1 Fixed-Point Type
3.2 Fixed-Point Functions in OpenCL C++
3.3 Fixed-Point Functions in OpenCL C
3.4 Fixed-Point Instructions in SPIR-V
3.4.1 Capability
3.4.2 Declaration
3.4.3 Conversion Instructions
3.4.4 Arithmetic Instructions
3.4.5 Relational Instructions
3.4.6 Fixed Point Extended Instruction Set
4 Fixed-Point Function Rewriter
4.1 Fixed-Point Name Reflower in OpenCL
4.2 SPIR-V Translator
5 Experimental Methodology and Results
5.1 Experimental Methodology
5.2 Experimental Results
6 Conclusion and Future Works
6.1 Conclusion
6.2 Future Works
 
 
 
 