[1] Belviranli, M. E., Khorasani, F., Bhuyan, L. N., and Gupta, R. CuMAS: Data transfer aware multi-application scheduling for shared GPUs. In Proceedings of the 2016 International Conference on Supercomputing (2016), pp. 1–12.
[2] Chaudhary, S., Ramjee, R., Sivathanu, M., Kwatra, N., and Viswanatha, S. Balancing efficiency and fairness in heterogeneous GPU clusters for deep learning. In Proceedings of the Fifteenth European Conference on Computer Systems (2020), pp. 1–16.
[3] Chen, Q., Yang, H., Mars, J., and Tang, L. Baymax: QoS awareness and increased utilization for non-preemptive accelerators in warehouse scale computers. ACM SIGPLAN Notices 51, 4 (2016), 681–696.
[4] Li, B., Gadepally, V., Samsi, S., and Tiwari, D. Characterizing multi-instance GPU for machine learning workloads. In 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (2022), IEEE, pp. 724–731.
[5] Li, B., Patel, T., Samsi, S., Gadepally, V., and Tiwari, D. MISO: Exploiting multi-instance GPU capability on multi-tenant GPU clusters. In Proceedings of the 13th Symposium on Cloud Computing (2022), pp. 173–189.
[6] Porter, C., Chen, C., and Pande, S. Compiler-assisted scheduling for multi-instance GPUs. In Proceedings of the 14th Workshop on General Purpose Processing Using GPU (2022), pp. 1–6.
[7] Shen, H., Chen, L., Jin, Y., Zhao, L., Kong, B., Philipose, M., Krishnamurthy, A., and Sundaram, R. Nexus: A GPU cluster engine for accelerating DNN-based video analysis. In Proceedings of the 27th ACM Symposium on Operating Systems Principles (2019), pp. 322–337.
[8] Tan, C., Li, Z., Zhang, J., Cao, Y., Qi, S., Liu, Z., Zhu, Y., and Guo, C. Serving DNN models with multi-instance GPUs: A case of the reconfigurable machine scheduling problem. arXiv preprint arXiv:2109.11067 (2021).
[9] Van Heeswijk, M., Miche, Y., Oja, E., and Lendasse, A. GPU-accelerated and parallelized ELM ensembles for large-scale regression. Neurocomputing 74, 16 (2011), 2430–2437.
[10] Weng, Q., Xiao, W., Yu, Y., Wang, W., Wang, C., He, J., Li, Y., Zhang, L., Lin, W., and Ding, Y. MLaaS in the wild: Workload analysis and scheduling in large-scale heterogeneous GPU clusters. In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22) (2022), pp. 945–960.
[11] Yang, Z., Wu, H., Xu, Y., Wu, Y., Zhong, H., and Zhang, W. Hydra: Deadline-aware and efficiency-oriented scheduling for deep learning jobs on heterogeneous GPUs. IEEE Transactions on Computers (2023).
[12] Zhang, H., Li, Y., Xiao, W., Huang, Y., Di, X., Yin, J., See, S., Luo, Y., Lau, C. T., and You, Y. MIGPerf: A comprehensive benchmark for deep learning training and inference workloads on multi-instance GPUs. arXiv preprint arXiv:2301.00407 (2023).