Detail Display

Author (Chinese): 羅士恩
Author (English): Lo, Shih-En
Title (Chinese): 針對 TVM AI 編譯器支援神經網路之數學運算在 RISC-V 向量化的方法
Title (English): Support Math Operators of Neural Network in TVM for RISC-V V Extension
Advisor (Chinese): 李政崑
Advisor (English): Lee, Jenq-Kuen
Committee (Chinese): 關啟邦、游逸平
Committee (English): Kuan, Chi-Bang; You, Yi-Ping
Degree: Master
University: National Tsing Hua University
Department: Computer Science
Student ID: 107062535
Year of Publication (ROC era): 109 (2020)
Graduation Academic Year: 109
Language: English
Pages: 32
Keywords (Chinese): 神經網路、TVM、LLVM、RISC-V、單一指令流多重資料流
Keywords (English): Neural Network, TVM, LLVM, RISC-V, SIMD
Abstract (Chinese): A neural network obtains its results through a series of mathematical operations, which may include common math functions such as abs and sqrt. In addition, following the concept of biological neurons, activation functions such as ReLU, sigmoid, and tanh are often placed after hidden layers, acting as switches that pass signals onward. Sigmoid and tanh are commonly used in models such as speech recognition; ReLU is common in deeper CNN models. TVM is an end-to-end compiler designed for deep learning. It parses models from different frameworks, converts them into the Relay intermediate representation, and, through the Relay operator strategy, selects the computation flow and method best suited to a particular hardware target or instruction set. Finally, each Relay operator calls the corresponding function in the TOPI library to obtain better performance.
RISC-V is an open-source instruction set architecture with a flexible extension-module design. Its Vector extension can be used to achieve SIMD (single instruction, multiple data) parallel computation. We therefore designed a method for vectorizing the math operators of neural networks in TVM, dividing them into two categories. The first category consists of math operations that RISC-V vector instructions support directly; for these, the SIMD lowering of the underlying computation is handled in LLVM. For the second category, whose computation contains operations not directly supported by vector instructions, the Relay FastMath pass replaces the original operator with an approximation algorithm whose every step can be carried out with vector instructions. In our experiments, compared with the original math operators without vector-instruction support, our flow improves performance by more than 1.7x for most operators. For tanh, replacing the original algorithm with an approximation yields a speedup of more than 6x.
Abstract (English): In addition to the usual mathematical functions such as sqrt and abs, activation functions are often attached after hidden layers for different purposes, following the concept of biologically inspired neural networks. Sigmoid and tanh are widely used in speech recognition models, while ReLU is often used in deeper CNN models.
TVM is an end-to-end compiler stack that enables deep learning models to be deployed on specific targets. TVM parses models from different frameworks and uses the corresponding Relay operators to form a high-level graph. A Relay operator strategy is a combination of a schedule and a compute function. The strategy can also be designed around the features of each target, so that every Relay operator is lowered to a proper function for that target in the TOPI library, yielding better performance.

RISC-V is an open-source, fifth-generation instruction set architecture based on RISC principles. RISC-V supports SIMD through its Vector extension, so it can gain performance from parallel computing. We designed a flow to vectorize more math operators of neural networks in TVM for the RISC-V V Extension (RVV). When the behavior of a math operator is directly supported by RVV instructions, TVM usually generates intrinsics in LLVM IR for the common math functions; we handle those intrinsics and lower them to RVV instructions. When the computation of a math operator cannot be directly supported by RVV instructions, we use the Relay FastMath pass to replace the original Relay operator with a new one suitable for RVV. The new operator is lowered to an approximation function, all of whose operations RVV instructions can support. We also handle the RVV-related lowering in the LLVM backend, so that it generates the vector instructions required by the corresponding operators in the TOPI library.
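To illustrate the approximation idea behind FastMath (this is not the exact polynomial the thesis or TVM uses; the coefficients below come from a textbook Padé [3/2] approximant and are assumptions for this sketch), a tanh replacement built only from multiplies, adds, and divides — operations that map directly onto RVV vector instructions — might look like:

```python
def tanh_approx(x: float) -> float:
    """Pade [3/2] rational approximation of tanh.

    Uses only multiply, add, and divide, i.e. elementwise operations
    that each map to a single RVV vector instruction (vfmul.vv,
    vfadd.vv, vfdiv.vv). Accurate to about 1e-3 for |x| <= 1; a
    production kernel would clamp the input and use more terms.
    """
    x2 = x * x
    return x * (15.0 + x2) / (15.0 + 6.0 * x2)
```

Because every step is elementwise, the same expression applied lane-by-lane over a v4f32 vector needs no scalar library call, which is what lets the approximated operator vectorize where the original `tanh` could not.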

Our experiment measures the ticks of each operator with input shape [1x3x112x112] under different schedules. The first optimization is splitting and vectorization: we split the innermost loop with factor 4 and then vectorize the inner loop. The second optimization is tiling and vectorization: we tile the two innermost loops with factor 4 and then vectorize each of their inner loops. In every experiment the vector type is v4f32. With RISC-V SIMD we reach a speedup of more than 1.7x for most operators. For tanh, the speedup exceeds 6x, because the Relay FastMath pass changes the original computation to an approximation of tanh.
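The split-and-vectorize schedule can be modeled in plain Python (a sketch of the loop transformation, not TVM's actual scheduling API; the chunk of 4 stands in for one v4f32 RVV register, and sqrt is used as a representative category-one operator):

```python
import math

def sqrt_scalar(a):
    # Original loop: one scalar sqrt per iteration.
    return [math.sqrt(x) for x in a]

def sqrt_split_vectorized(a, factor=4):
    # Split the innermost loop by `factor`, then "vectorize" the
    # resulting inner extent-4 loop: in the real flow that inner
    # loop becomes a single RVV vfsqrt.v over a v4f32 register.
    out = []
    for i in range(0, len(a), factor):
        chunk = a[i:i + factor]                  # load one vector register
        out.extend(math.sqrt(x) for x in chunk)  # one vector instruction
    return out
```

The tiling variant applies the same split to the two innermost loops, so both get an extent-4 vectorizable inner loop instead of just one.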
Contents:
Abstract
Contents
List of Figures
List of Tables
1 Introduction
1.1 Introduction
1.2 Overview of the Thesis
2 Background
2.1 TVM
2.2 RISC-V Vector Extension
3 Vectorize Math Operators in TVM
3.1 Relay Operator Strategy for RISC-V
3.2 Schedule for RVV
3.3 Relay FastMath Pass
3.4 Activation Function
3.5 Approximate Activation Function for RVV
4 Lowering of LLVM Backend for RVV
4.1 LLVM Intrinsic Function
4.2 Instruction Selection and DAG Combine
5 Experimental Methodology and Results
5.1 Experimental Methodology
5.2 Experiment Result
6 Conclusion and Future Works
6.1 Conclusion
6.2 Future Works