Author (Chinese): 陳丕祐
Author (English): Chen, Pi-You
Title (Chinese): 針對 TVM AI 編譯器支援 RISC-V 向量指令集
Title (English): Support TVM AI Compiler for RISC-V with V Extension
Advisor (Chinese): 李政崑
Advisor (English): Lee, Jenq-Kuen
Committee (Chinese): 莊庭瑞、蔡錫鈞、陳呈瑋
Committee (English): Chuang, Tyng-Ruey; Tsai, Shi-Chun; Chen, Cheng-Wei
Degree: Master's
University: National Tsing Hua University
Department: Computer Science
Student ID: 107062570
Publication year (R.O.C. calendar): 109 (2020)
Graduation academic year: 108
Language: English
Pages: 36
Keywords (Chinese): 深度學習、LLVM、編譯器、單一指令流多重資料流、RISC-V
Keywords (English): Deep Learning, LLVM, Compiler, SIMD, RISC-V
RISC-V is an open-source instruction set architecture. Its small and modular design lets users select the specific extensions that fit their needs, and the vector extension is one of them: it lets RISC-V exploit SIMD effectively to accelerate AI applications running on RISC-V architectures.
TVM is an open-source deep learning compiler framework. As an AI compiler, TVM addresses how to compile a deep learning model into code that matches the characteristics of the target hardware. However, TVM does not support the RISC-V vector extension.
In this thesis, we design a flow that lets TVM support the RISC-V vector extension, and we improve the LLVM and TVM pipelines to generate more efficient code. Our improvements include expressing vector instructions with intrinsic functions, using the TVM scheduler to search a larger vectorizable space, optimizing vsetvl instructions in LLVM, and generating better RISC-V assembly in the code generation stage. In our experiments, several popular and important CNN models are used to validate correctness and measure performance of our TVM flow with RISC-V vector extension support. Compared with the original flow, our flow achieves a 2.98x-5.88x performance improvement.
RISC-V is an open-source ISA that is small and flexible: users can select extensions according to the requirements of their specific applications. The vector extension is one of the RISC-V extensions; it enables superword SIMD on the RISC-V architecture to support the fallback engines of AI computing.

TVM is an open deep learning compiler stack for neural network models. As an AI compiler, TVM tackles the problem of optimizing a model for specific hardware by fitting the generated code to the hardware's features. However, TVM does not support the vector extension in its RISC-V flow.

In this thesis, we describe techniques to efficiently support TVM on RISC-V with the V extension via TVM-layer and LLVM optimizations. Note that the RISC-V vector extension allows one to dynamically set both the size of each element in a vector and the number of vector elements. Our optimizations include vector extension support through intrinsic functions, TVM scheduler tuning to extend the vectorization index space, LLVM compiler optimizations of the vsetvl instructions, and the generation of efficient RISC-V assembly.
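The dynamic vector length mentioned above is what RVV's vsetvl instruction provides: each loop iteration asks the hardware how many elements it may process, so a loop over n elements is strip-mined without a scalar remainder loop. The following is a minimal Python sketch of that behavior, not code from this thesis; `VLEN_ELEMS` is a hypothetical hardware maximum, and the inner loop stands in for a single vector instruction such as vadd.vv.

```python
VLEN_ELEMS = 8  # assumed maximum number of elements per vector register

def vsetvl(avl):
    """Model of vsetvl: given the application vector length (elements
    still to process), return the vector length granted for this strip,
    capped at the hardware maximum."""
    return min(avl, VLEN_ELEMS)

def vector_add(a, b):
    """Strip-mined elementwise add, mimicking an RVV vadd.vv loop."""
    n = len(a)
    out = [0.0] * n
    i = 0
    while i < n:
        vl = vsetvl(n - i)      # set the vector length for this strip
        for j in range(vl):     # stands in for one vadd.vv instruction
            out[i + j] = a[i + j] + b[i + j]
        i += vl                 # advance by the granted vector length
    return out
```

Because vsetvl clamps the length on the final strip, the same loop handles any n; this is also why eliminating redundant vsetvl instructions (when the requested length has not changed) is a profitable LLVM optimization.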

In our experiments, key CNN models are measured and validated with our TVM flow on RISC-V with the V extension. Our flow achieves a 2.98x-5.88x performance improvement over the version without the RISC-V SIMD flow.
Abstract i
Contents ii
List of Figures iv
List of Tables v
1 Introduction 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Overview of the Thesis . . . . . . . . . . . . . . . . . . . . . . 3
2 Background 4
2.1 RISC-V Vector Extension . . . . . . . . . . . . . . . . . . . . 4
2.2 TVM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3 Enable RISC-V flow in TVM 10
3.1 Modify TVM Codegen Phase . . . . . . . . . . . . . . . . . . 12
3.2 TVM Runtime and DLR . . . . . . . . . . . . . . . . . . . . . 13
3.3 LLVM Backend Lowering . . . . . . . . . . . . . . . . . . . . . 14
3.4 LLVM Intrinsic Function . . . . . . . . . . . . . . . . . . . . . 15
3.5 Declare Register and Instruction in LLVM . . . . . . . . . . . 16
3.6 Instruction Selection . . . . . . . . . . . . . . . . . . . . . . . 16
3.7 Target Lowering . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4 Optimize in RISC-V Vector Extension 18
4.1 TVM Schedule for RVV . . . . . . . . . . . . . . . . . . . . . 19
4.2 Support Stride Load/Store . . . . . . . . . . . . . . . . . . . 20
4.3 Enable Float Multiply Add in LLVM . . . . . . . . . . . . . . 21
4.4 Better Zeroinitializer in RVV . . . . . . . . . . . . . . . . . . 23
4.5 Remove Redundant Vsetvl . . . . . . . . . . . . . . . . . . . 25
5 Experimental Methodology and Results 28
5.1 Experiment Environment . . . . . . . . . . . . . . . . . . . . 28
5.2 Experiment Result . . . . . . . . . . . . . . . . . . . . . . . . 29
6 Conclusion and Future Works 31
6.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
6.2 Future Works . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
(Full text not authorized for public access)