Detailed Record

Author (Chinese): 鄭盛元
Author (English): Cheng, Sheng-Yuan
Title (Chinese): TVM 與 Neuron 整合應用場景及硬體加速器上的排程最佳化
Title (English): Enable Compute Graph Pipeline and Applications on TVM with Neuron
Advisor (Chinese): 李政崑
Advisor (English): Lee, Jenq-Kuen
Committee members (Chinese): 關啟邦, 洪明郁
Committee members (English): Kuan, Ci-Bang; Hung, Ming-Yu
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 110062538
Publication year: 112 (ROC calendar; 2023 CE)
Graduation academic year: 111
Language: English
Number of pages: 40
Keywords (Chinese): AI 模型推論 (AI model inference), TVM, TVM BYOC
Keywords (English): TVM BYOC, Relay IR, Deep Learning, AI Compiler, Compute Graph Pipeline
Abstract (Chinese):

As the demand for machine learning model inference on edge devices keeps growing, multiple platforms have emerged to meet it. Among these platforms, TVM stands out for supporting many different frameworks and optimizations, yet it does not support the proprietary accelerators built by individual hardware vendors well. On mobile devices such as smartphones, MediaTek's AI solution Neuron provides highly efficient model inference on mobile chipsets; however, it mostly supports TFLite rather than every framework directly. Combining the two platforms therefore allows models from popular machine learning frameworks such as PyTorch, TensorFlow, ONNX, and MXNet to flow into Neuron while fully exploiting the strengths of MediaTek's accelerators.

To address this challenge, we base our solution on the TVM BYOC flow. With this approach, we can effectively integrate models from different machine learning frameworks for different tasks. To demonstrate the versatility of our solution, we developed an application containing three distinct models: a face anti-spoofing model from PyTorch, an emotion detection model from Keras, and an object detection model from TFLite. Because these models depend on one another during inference, we introduce a compute graph pipelining algorithm that optimizes the performance of the application showcase. By applying the pipelining algorithm to the computation graph, we speed up inference for the whole application by assigning different subgraphs to multiple hardware devices. This approach not only accelerates the inference process but also optimizes resource allocation for the entire application.

Abstract (English):
As the need for machine learning inference on mobile devices continues to rise, several platforms have emerged to cater to this demand. Among these platforms, TVM stands out as a comprehensive AI compiler. However, one significant limitation of TVM is its inability to support all the accelerators provided by different manufacturers.

Conversely, Neuron, MediaTek's AI solution, excels at delivering high-performance inference on mobile devices, but it supports only a narrow slice of the common machine learning frameworks. It is therefore advantageous to combine the strengths of both platforms: the integration lets popular frameworks such as ONNX, PyTorch, TensorFlow, and MXNet feed into Neuron while leveraging the power of MediaTek's AI accelerator, as sketched below.
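As a minimal sketch of what this front-end unification looks like in TVM (the model objects, input names, and shapes below are hypothetical placeholders, not the thesis's actual models), each framework importer lowers its model into the same Relay IR, so the rest of the compilation flow is shared:

    from tvm import relay

    def import_models(scripted_model, keras_model, tflite_model):
        """Lower models from three frameworks into a common Relay IR."""
        # PyTorch: a TorchScript-traced module plus (input name, shape) pairs.
        mod_pt, params_pt = relay.frontend.from_pytorch(
            scripted_model, [("input0", (1, 3, 224, 224))])

        # Keras: a Keras model plus a dict mapping input names to shapes.
        mod_keras, params_keras = relay.frontend.from_keras(
            keras_model, shape={"input_1": (1, 3, 48, 48)})

        # TFLite: a parsed flatbuffer model plus shape/dtype dicts.
        mod_tfl, params_tfl = relay.frontend.from_tflite(
            tflite_model,
            shape_dict={"input": (1, 300, 300, 3)},
            dtype_dict={"input": "uint8"})

        return [(mod_pt, params_pt), (mod_keras, params_keras),
                (mod_tfl, params_tfl)]

Once every model sits in Relay, the same partitioning and codegen passes apply regardless of which framework it came from.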

To tackle this challenge, we adopt the TVM BYOC (Bring Your Own Codegen) flow as the foundation of our solution. This approach lets us integrate models from various machine learning frameworks for different tasks. To showcase the versatility of our solution, we develop an application that incorporates three distinct models: a face anti-spoofing model from PyTorch, an emotion detection model from Keras, and an object detection model from TFLite. Recognizing the inter-dependencies among these models during inference, we introduce a pipeline algorithm prototype aimed at optimizing the performance of our application showcase. By applying the pipeline algorithm to the computation graph, we enhance execution speed by allocating different subgraphs to multiple hardware devices (see the sketches below). This approach not only accelerates the inference process but also optimizes resource allocation for the overall application. In our experiment on the MediaTek MT6873 chipset, the BYOC flow improved performance by up to 6X with APU acceleration.
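For concreteness, the core of the BYOC flow can be sketched with TVM's standard Relay partitioning passes. The compiler name "neuron" below is an assumption for illustration; it stands for whatever name the external codegen registers with TVM:

    import tvm
    from tvm.relay import transform

    def partition_for_external_codegen(mod, compiler_name="neuron"):
        """Split a Relay module into host subgraphs and subgraphs offloaded
        to an external codegen ("neuron" is a hypothetical registered name)."""
        seq = tvm.transform.Sequential([
            # Mark every operator the external codegen claims to support.
            transform.AnnotateTarget(compiler_name),
            # Fuse adjacent supported operators into larger offload regions.
            transform.MergeCompilerRegions(),
            # Lift each region into its own function tagged for the codegen.
            transform.PartitionGraph(),
        ])
        return seq(mod)

The resulting subgraphs can then be scheduled across devices with TVM's pipeline executor [8]. The following is a sketch, not the thesis's actual scheduler, assuming two partitioned Relay modules whose input names ("data_0") are placeholders:

    import tvm
    from tvm.contrib import pipeline_executor, pipeline_executor_build

    def build_two_stage_pipeline(mod0, mod1):
        """Chain two Relay modules so their stages can overlap across devices."""
        cfg = pipeline_executor_build.PipelineConfig()

        # Bind each subgraph to its own target/device; a real deployment
        # would point one stage at the accelerator instead of the CPU.
        for mod in (mod0, mod1):
            cfg[mod].target = "llvm"
            cfg[mod].dev = tvm.cpu(0)

        # Wire the dataflow: pipeline input -> mod0 -> mod1 -> pipeline output.
        cfg["input"]["data_0"].connect(cfg[mod0]["input"]["data_0"])
        cfg[mod0]["output"][0].connect(cfg[mod1]["input"]["data_0"])
        cfg[mod1]["output"][0].connect(cfg["output"]["0"])

        with tvm.transform.PassContext(opt_level=3):
            factory = pipeline_executor_build.build(cfg)
        return pipeline_executor.PipelineModule(factory)

Because each stage owns its target and device, independent subgraphs can execute concurrently, which is what allows the heterogeneous CPU/APU overlap described above.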
Table of Contents:
Abstract (Chinese)
Abstract (English)
Acknowledgements
Contents
List of Figures
List of Tables
1 Introduction
2 Background
3 The Integration of TVM and Neuron
4 Application of TVM-Neuron
5 Enhancement: Pipeline Scheduling
6 Experiment
7 Conclusion
Bibliography
[1] J. Bai, F. Lu, K. Zhang et al., “ONNX: Open Neural Network Exchange,” https://github.com/onnx/onnx, 2019.
[2] M. Abadi, “TensorFlow: Learning functions at scale,” in Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming, 2016, pp. 1–1.
[3] N. Ketkar, “Introduction to Keras,” in Deep Learning with Python: A Hands-on Introduction, pp. 97–111, 2017.
[4] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga et al., “PyTorch: An imperative style, high-performance deep learning library,” Advances in Neural Information Processing Systems, vol. 32, 2019.
[5] T. Chen, T. Moreau, Z. Jiang, L. Zheng, E. Yan, M. Cowan, H. Shen, L. Wang, Y. Hu, L. Ceze et al., “TVM: An automated end-to-end optimizing compiler for deep learning,” arXiv preprint arXiv:1802.04799, 2018.
[6] T.-C. Chen, W.-T. Wang, K. Kao, C.-L. Yu, C. Lin, S.-H. Chang, and P.-K. Tsung, “NeuroPilot: A cross-platform framework for edge-AI,” in 2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), 2019, pp. 167–170.
[7] Z. Chen, C. H. Yu, T. Morris, J. Tuyls, Y.-H. Lai, J. Roesch, E. Delaye, V. Sharma, and Y. Wang, “Bring your own codegen to deep learning compiler,” arXiv preprint arXiv:2105.03215, 2021.
[8] Apache TVM, “Using pipeline executor in Relay.” [Online]. Available: https://tvm.apache.org/docs/how_to/work_with_relay/using_pipeline_executor.html
[9] C. Lattner and V. Adve, “LLVM: A compilation framework for lifelong program analysis & transformation,” in International Symposium on Code Generation and Optimization (CGO 2004). IEEE, 2004, pp. 75–86.
[10] J. Roesch, S. Lyubomirsky, L. Weber, J. Pollock, M. Kirisame, T. Chen, and Z. Tatlock, “Relay: A new IR for machine learning frameworks,” in Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages (MAPL 2018). New York, NY, USA: Association for Computing Machinery, 2018, pp. 58–68. [Online]. Available: https://doi.org/10.1145/3211346.3211348
[11] Y. Gorbachev, M. Fedorov, I. Slavutin, A. Tugarev, M. Fatekhov, and Y. Tarkan, “OpenVINO Deep Learning Workbench: Comprehensive analysis and tuning of neural networks inference,” in Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2019, pp. 0–0.
[12] H.-R. Huang, D.-Y. Hong, J.-J. Wu, P. Liu, and W.-C. Hsu, “Efficient video captioning on heterogeneous system architectures,” in 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, 2021, pp. 1035–1045.
[13] P. Liu and J.-J. Wu, “Task scheduling techniques for deep learning in heterogeneous environment,” in 2019 Seventh International Symposium on Computing and Networking Workshops (CANDARW). IEEE, 2019, pp. 141–147.
[14] H. Topcuoglu, S. Hariri, and M.-Y. Wu, “Performance-effective and low-complexity task scheduling for heterogeneous computing,” IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 3, pp. 260–274, 2002.