[1] Belviranli, M. E., Khorasani, F., Bhuyan, L. N., and Gupta, R. CuMAS: Data transfer aware multi-application scheduling for shared GPUs. In Proceedings of the 2016 International Conference on Supercomputing (2016), pp. 1–12.
[2] Chaudhary, S., Ramjee, R., Sivathanu, M., Kwatra, N., and Viswanatha, S. Balancing efficiency and fairness in heterogeneous GPU clusters for deep learning. In Proceedings of the Fifteenth European Conference on Computer Systems (2020), pp. 1–16.
[3] Chen, Q., Yang, H., Mars, J., and Tang, L. Baymax: QoS awareness and increased utilization for non-preemptive accelerators in warehouse scale computers. ACM SIGPLAN Notices 51, 4 (2016), 681–696.
[4] Li, B., Gadepally, V., Samsi, S., and Tiwari, D. Characterizing multi-instance GPU for machine learning workloads. In 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (2022), IEEE, pp. 724–731.
[5] Li, B., Patel, T., Samsi, S., Gadepally, V., and Tiwari, D. MISO: Exploiting multi-instance GPU capability on multi-tenant GPU clusters. In Proceedings of the 13th Symposium on Cloud Computing (2022), pp. 173–189.
[6] Porter, C., Chen, C., and Pande, S. Compiler-assisted scheduling for multi-instance GPUs. In Proceedings of the 14th Workshop on General Purpose Processing Using GPU (2022), pp. 1–6.
[7] Shen, H., Chen, L., Jin, Y., Zhao, L., Kong, B., Philipose, M., Krishnamurthy, A., and Sundaram, R. Nexus: A GPU cluster engine for accelerating DNN-based video analysis. In Proceedings of the 27th ACM Symposium on Operating Systems Principles (2019), pp. 322–337.
[8] Tan, C., Li, Z., Zhang, J., Cao, Y., Qi, S., Liu, Z., Zhu, Y., and Guo, C. Serving DNN models with multi-instance GPUs: A case of the reconfigurable machine scheduling problem. arXiv preprint arXiv:2109.11067 (2021).
[9] Van Heeswijk, M., Miche, Y., Oja, E., and Lendasse, A. GPU-accelerated and parallelized ELM ensembles for large-scale regression. Neurocomputing 74, 16 (2011), 2430–2437.
[10] Weng, Q., Xiao, W., Yu, Y., Wang, W., Wang, C., He, J., Li, Y., Zhang, L., Lin, W., and Ding, Y. MLaaS in the wild: Workload analysis and scheduling in large-scale heterogeneous GPU clusters. In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22) (2022), pp. 945–960.
[11] Yang, Z., Wu, H., Xu, Y., Wu, Y., Zhong, H., and Zhang, W. Hydra: Deadline-aware and efficiency-oriented scheduling for deep learning jobs on heterogeneous GPUs. IEEE Transactions on Computers (2023).
[12] Zhang, H., Li, Y., Xiao, W., Huang, Y., Di, X., Yin, J., See, S., Luo, Y., Lau, C. T., and You, Y. MIGPerf: A comprehensive benchmark for deep learning training and inference workloads on multi-instance GPUs. arXiv preprint arXiv:2301.00407 (2023).