[1] Bai, Z., Zhang, Z., Zhu, Y., and Jin, X. PipeSwitch: Fast pipelined context switching for deep learning applications. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20) (2020), pp. 499–514.
[2] Che, S., Boyer, M., Meng, J., Tarjan, D., Sheaffer, J. W., Lee, S.-H., and Skadron, K. Rodinia: A benchmark suite for heterogeneous computing. In 2009 IEEE International Symposium on Workload Characterization (IISWC) (2009), IEEE, pp. 44–54.
[3] Chen, H.-H., Lin, E.-T., Chou, Y.-M., and Chou, J. Gemini: Enabling multi-tenant GPU sharing based on kernel burst estimation. IEEE Transactions on Cloud Computing (2021).
[4] Chien, S., Peng, I., and Markidis, S. Performance evaluation of advanced features in CUDA unified memory. In 2019 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC) (2019), pp. 50–57.
[5] Duato, J., Pena, A. J., Silla, F., Mayo, R., and Quintana-Ortí, E. S. rCUDA: Reducing the number of GPU-based accelerators in high performance clusters. In 2010 International Conference on High Performance Computing & Simulation (2010), IEEE, pp. 224–231.
[6] Landaverde, R., Zhang, T., Coskun, A. K., and Herbordt, M. An investigation of unified memory access performance in CUDA. In 2014 IEEE High Performance Extreme Computing Conference (HPEC) (2014), IEEE, pp. 1–6.
[7] Li, W., Jin, G., Cui, X., and See, S. An evaluation of unified memory technology on NVIDIA GPUs. In 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (2015), pp. 1092–1098.
[8] Lim, G., Ahn, J., Xiao, W., Kwon, Y., and Jeon, M. Zico: Efficient GPU memory sharing for concurrent DNN training. In 2021 USENIX Annual Technical Conference (USENIX ATC 21) (2021), pp. 161–175.
[9] Linux. Linux manual page. https://man7.org/linux/man-pages/man8/ld.so.8.html.
[10] LocalSolver. LocalSolver. https://www.localsolver.com.
[11] NVIDIA. NVIDIA MPS. https://docs.nvidia.com/deploy/mps/index.html.
[12] Song, S., Deng, L., Gong, J., and Luo, H. Gaia scheduler: A Kubernetes-based scheduler framework. In 2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom) (2018), IEEE, pp. 252–259.
[13] Yeh, T.-A., Chen, H.-H., and Chou, J. KubeShare: A framework to manage GPUs as first-class and shared resources in container cloud. In Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing (2020), pp. 173–184.
[14] Yu, P., and Chowdhury, M. Salus: Fine-grained GPU sharing primitives for deep learning applications. arXiv preprint arXiv:1902.04610 (2019).
[15] Yu, Q., Childers, B., Huang, L., Qian, C., and Wang, Z. A quantitative evaluation of unified memory in GPUs. The Journal of Supercomputing 76, 4 (2020), 2958–2985.