Author (Chinese): 陳奕儒
Author (English): Chen, Yi-Ru
Title (Chinese): 使用RISC-V P Extension優化在TVM中使用的低功耗數值
Title (English): Enable Low-Power Numerical and Precision in TVM with RISC-V P extension
Advisor (Chinese): 李政崑
Advisor (English): Lee, Jenq-Kuen
Committee Members: 關啟邦, 王紹仲
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 108062522
Publication Year (ROC calendar): 110 (2021)
Graduation Academic Year: 109
Language: Chinese
Number of Pages: 62
Keywords (Chinese): Quantization, Fixed Point, Deep Learning Compiler, RISC-V, TVM
Keywords (English): Quantization, Fixed Point, Deep Learning Compiler, RISC-V, TVM
Abstract (Chinese): Driven by market and practical demands, the accuracy of a CNN model is not the only metric; the hardware resources, such as memory and power, consumed while running inference also matter greatly, especially when ML applications are deployed on edge devices, where memory and power are necessarily limited. To address this problem, researchers commonly use quantization techniques to reduce the number of bits used by the values in an application, representing the activations and weights of the computation with low-precision/low-power numerics. This thesis focuses on that topic: it enables and introduces low-precision/low-power numeric conversion, bringing quantized integers and fixed-point integers into the deep learning compiler TVM, and it leverages the subword SIMD support of the RISC-V P extension to optimize the converted low-precision operations, using hardware instructions to improve the performance of the dot products in convolution.
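The quantization step described above can be illustrated with the standard affine (scale/zero-point) int8 scheme. This is a generic sketch, not the exact parameter computation used by TVM's QNN flow; the function names here are illustrative only.

```python
import numpy as np

def quantize_int8(x):
    """Affine quantization of an FP32 tensor to int8.

    Generic sketch of the standard scale/zero-point scheme; the QNN
    flow in TVM computes its quantization parameters per operator.
    """
    qmin, qmax = -128, 127
    scale = (float(x.max()) - float(x.min())) / (qmax - qmin)
    zero_point = int(round(qmin - float(x.min()) / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return q.astype(np.int8), scale, zero_point

def dequantize(q, scale, zero_point):
    """Map int8 values back to approximate FP32 values."""
    return scale * (q.astype(np.float32) - zero_point)

x = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
q, s, z = quantize_int8(x)
x_hat = dequantize(q, s, z)   # close to x, within one quantization step
```

Activations and weights stored this way keep the heavy arithmetic in integer form, which is what makes the subword SIMD instructions below applicable.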
This thesis supports two flows, both combined with the instruction-level optimization described above. For already-quantized models, we use the existing QNN dialect to interface with and introduce quantized integers. For ordinary FP32 models, we introduce a graph-level transformation (AutoFXP) that automatically converts the specified convolution layers of a model from FP32 to fixed-point computation. For the parameter that must be set during this conversion, the binary point, we design a profiling-based selector that automatically obtains the binary point needed by each converted layer, taking into account both accuracy preservation and the degree of performance optimization.
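A range-based heuristic for picking the binary point can be sketched as follows, assuming a signed 16-bit Q-format. This is a minimal illustration with hypothetical function names; the profiling-based selector in the thesis additionally weighs accuracy preservation against performance per layer.

```python
import numpy as np

def select_binary_point(profiled_values, total_bits=16):
    """Choose the number of fraction bits for a signed fixed-point
    (Q-format) type so the largest profiled magnitude still fits.

    Range-based sketch only: the integer part gets just enough bits
    to hold max(|x|), and the rest become fraction bits.
    """
    max_abs = float(np.max(np.abs(profiled_values)))
    if max_abs == 0.0:
        return total_bits - 1
    int_bits = max(0, int(np.floor(np.log2(max_abs))) + 1)
    return total_bits - 1 - int_bits   # 1 sign bit + int bits + frac bits

def to_fixed(x, frac_bits):
    """Convert FP32 values to fixed-point integers (round to nearest)."""
    return np.round(np.asarray(x) * (1 << frac_bits)).astype(np.int32)

# Profiled activations of one layer span roughly [-3.2, 5.7]:
bp = select_binary_point(np.array([-3.2, 0.1, 5.7]))   # 12 fraction bits
fx = to_fixed([5.7], bp)[0]                            # 5.7 * 2**12 rounded
```

Because each layer's value range differs, running this selection per layer (rather than fixing one binary point globally) is what keeps the accuracy loss small after the transformation.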
Experiments show that, comparing instruction counts at final inference, the optimized quantization flow achieves a 2.43x to 9.98x speedup over the baseline models; the tested models include MobileNet v1/v2 and Inception v3/v4. The optimized fixed-point flow achieves roughly a 1.31x to 8.95x speedup over the baseline models while keeping the accuracy loss within about 1%; the tested models include MobileNet, SqueezeNet, VGG, ResNet, and Inception. Accuracy is measured on 2,000 ImageNet images, with Spike (the RISC-V ISA simulator) as the platform.
Abstract (English): Given industry demands, the accuracy of CNN models is not the only criterion. The memory and power consumed while running inference are also crucial, especially for applications on edge devices, which have only limited power and memory resources. To address this issue, researchers use quantization techniques to exploit lower-precision numerics, narrowing the number of bits used by representing activations and weights with quantized data. This thesis targets this topic by enabling low-power precision and numerics, including quantized and fixed-point integers, in the deep learning compiler Apache TVM. With the main computation expressed in integer types, we leverage the strength of the RISC-V P extension, which supports subword SIMD computation, to improve the performance of the dot-product computation in convolution layers. For the quantized-integer flow, we reuse the TVM QNN dialect together with the optimization mentioned above. For the fixed-point flow, we introduce a graph-level transformation that converts floating point to fixed point in the specified layers of a neural network. The selection of the binary point for the fixed-point type affects the quality of the transformation and should not be fixed across convolution layers, since the value range of each layer differs. To maintain accuracy after the transformation, we provide a profiling-based approach to select an appropriate binary point for each layer. Experiments show that our work achieves a 2.43x-9.98x reduction in instruction count in the quantized-integer flow. The fixed-point flow shows a 1.31x-8.95x performance improvement with less than 1.0% accuracy loss. All experiments are compared against untuned FP32 models and executed on Spike with 2000 images. The benchmarks include common CNN models such as MobileNet, SqueezeNet, VGG, ResNet, and Inception.
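The subword SIMD optimization can be pictured as follows: a P-extension multiply-accumulate instruction (for example, an SMAQA-style quad MAC) multiplies the four packed int8 lanes of two 32-bit registers and folds all four products into one accumulator, replacing four scalar multiply-adds. The emulation below is a sketch of that lane behavior, not the thesis's actual TVM/LLVM lowering.

```python
import numpy as np

def packed_mac4(acc, ra, rb):
    """Emulate a 4-lane int8 multiply-accumulate: ra and rb each model
    the four int8 lanes of a 32-bit register, and all four products
    are added into the 32-bit accumulator in one step."""
    return acc + int(np.sum(ra.astype(np.int32) * rb.astype(np.int32)))

# Dot product of two int8 vectors, four lanes per "instruction":
a = np.arange(8, dtype=np.int8)
b = np.arange(8, dtype=np.int8)
acc = 0
for i in range(0, len(a), 4):
    acc = packed_mac4(acc, a[i:i + 4], b[i:i + 4])
# acc == 140: two packed MACs instead of eight scalar multiply-adds
```

This lane packing is why the dot products inside convolution layers benefit directly once activations and weights are converted to narrow integer types.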
Abstract i
Contents iii
List of Figures vi
List of Algorithms viii
List of Tables ix
1 Introduction 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Overview of This Thesis . . . . . . . . . . . . . . . . . . . . . 1
2 Background 5
2.1 TVM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 RISC-V . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3 Lower Precision Numerical . . . . . . . . . . . . . . . . . . . . 9
2.4 Quantization . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.4.1 Quantization Algorithm . . . . . . . . . . . . . . . . . 10
2.4.2 Quantization in Deep Learning Framework . . . . . . . 10
2.5 The Fixed Point Type System . . . . . . . . . . . . . . . . . . 12
3 Low-Power Numerical and Precision in TVM 13
3.1 QNN Dialect . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2 Automatic Fixed Point Transformation . . . . . . . . . . . . . 17
4 Optimization with RISC-V P Extension 23
4.1 Optimize Convolution in TVM . . . . . . . . . . . . . . . . . . 24
4.1.1 Tensor-level Optimization . . . . . . . . . . . . . . . . 24
4.1.2 Efficiently Execute Convolution with SIMD instructions . . . 25
4.2 Handling Instruction Matching in LLVM . . . . . . . . . . . . 28
5 Profiling-based Binary Point Selector 31
5.1 Profile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.2 Design of Strategy . . . . . . . . . . . . . . . . . . . . . . . . 35
5.2.1 Observation of Fixed Point . . . . . . . . . . . . . . . . 35
5.2.2 Strategies and Preferences . . . . . . . . . . . . . . . . 36
5.3 Overview of The Selector . . . . . . . . . . . . . . . . . . . . . 38
6 Experiments 43
6.1 Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
6.2 Benchmarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
6.3 Experimental Results and Discussion . . . . . . . . . . . . . . 45
6.3.1 Accuracy under Different Flows . . . . . . . . . . . . . 53
6.3.2 Accuracy under Different Configurations of AutoFXP . 53
6.3.3 Performance . . . . . . . . . . . . . . . . . . . . . . . . 56
7 Conclusion and Future Works 57
7.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
7.2 Future Works . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
(Full text available for external access after 2026/07/06)