|
[1] Abadi, M., and et al. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org. [2] Alibaba Cloud. GPU Sharing Scheduler Extender in Kubernetes. [Online]. Available: https://github.com/AliyunContainerService/gpushare-scheduler-extender. [3] Basaran, C., and Kang, K. Supporting preemptive task executions and memory copies in GPGPUs. In Euromicro Conference on Real-Time Systems (July 2012), pp. 287–296. [4] Becchi, M., Sajjapongse, K., Graves, I., Procter, A., Ravi, V., and Chakradhar, S. A Virtual Memory Based Runtime to Support Multi-Tenancy in Clusters with GPUs. In Proceedings of the 21st International Symposium on HighPerformance Parallel and Distributed Computing (2012), p. 97‒108. [5] Belkin, M., Haas, R., Arnold, G. W., Leong, H. W., Huerta, E. A., Lesny, D., and Neubauer, M. Container solutions for hpc systems: A case study of using shifter on blue waters. In Proceedings of the Practice and Experience on Advanced Research Computing (2018). [6] Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A. L. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. CoRR (2016). [7] Deepomatic. A shared GPU Nvidia K8S device plugin. [Online]. Available: https://github.com/Deepomatic/shared-gpu-nvidia-k8s-device-plugin. [8] Giunta, G., Montella, R., Agrillo, G., and Coviello, G. A GPGPU Transparent Virtualization Component for High Performance Computing Clouds. In Proceedings of the 16th International Euro-Par Conference on Parallel Processing (2010), Springer-Verlag, p. 379‒391. [9] Google. Kubernetes cluster management. [Online]. Available: http://kubernetes.io/. [10] Gu, J., Song, S., Li, Y., and Luo, H. GaiaGPU: Sharing GPUs in Container Clouds. In 2018 IEEE Intl Conf on Parallel Distributed Processing with Applications, Ubiquitous Computing Communications, Big Data Cloud Computing, Social Computing Networking, Sustainable Computing Communications (Dec 2018), pp. 469–476. [11] Gupta, V., Schwan, K., Tolia, N., Talwar, V., and Ranganathan, P. Pegasus: Coordinated scheduling for virtualized accelerator-based systems. In USENIX Annual Technical Conference (2011), p. 3. [12] He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. CoRR (2015). [13] Hindman, B., Konwinski, A., Zaharia, M., Ghodsi, A., Joseph, A. D., Katz, R., Shenker, S., and Stoica, I. Mesos: A platform for fine-grained resource sharing in the data center. In USENIX Conference on Networked Systems Design and Implementation (2011), pp. 295–308. [14] Intel. Intel device plugin for kubernetes. [Online]. Available: https://github.com/intel/intel-device-plugins-for-kubernetes. [15] Kang, D., Jun, T. J., Kim, D., Kim, J., and Kim, D. ConVGPU: GPU Management Middleware in Container Based Virtualized Environment. In IEEE International Conference on Cluster Computing (Sep. 2017), pp. 301–309. [16] Kang, H., Le, M., and Tao, S. Container and microservice driven design for cloud infrastructure devops. In 2016 IEEE International Conference on Cloud Engineering (IC2E) (April 2016), pp. 202–211. [17] Kato, S., Lakshmanan, K., Rajkumar, R., and Ishikawa, Y. Timegraph: Gpu scheduling for real-time multi-tasking environments. In USENIX Annual Technical Conference (USA, 2011), USENIX Association, p. 2. [18] Kato, S., McThrow, M., Maltzahn, C., and Brandt, S. Gdev: First-class GPU resource management in the operating system. In USENIX Annual Technical Conference (USA, 2012), USENIX Association, p. 37. [19] Kehne, J., Metter, J., and Bellosa, F. GPUswap: Enabling Oversubscription of GPU Memory through Transparent Swapping. In Proceedings of the 11th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (New York, NY, USA, 2015), VEE ’15, Association for Computing Machinery, p. 65‒77. [20] Kurtzer, G. M., Sochat, V., and Bauer, M. W. Singularity: Scientific containers for mobility of compute. PLOS ONE 12, 5 (05 2017), 1–20. [21] Merkel, D. Docker: Lightweight linux containers for consistent development and deployment. Linux J. 2014, 239 (Mar. 2014). [22] Nvidia. NVIDIA GPU Device Plugin for Kubernetes. [Online]. Available: https://github.com/NVIDIA/k8s-device-plugin/. [23] Peña, A. J., Reaño, C., Silla, F., Mayo, R., Quintana-Ortí, E. S., and Duato, J. A complete and efficient cuda-sharing solution for hpc clusters. Parallel Comput. 40, 10 (Dec. 2014), 574‒588. [24] Rossbach, C., Currey, J., Silberstein, M., Ray, B., and Witchel, E. PTask: Operating System Abstractions To Manage GPUs as Compute Devices. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles (October 2011), p. 233‒248. [25] Sengupta, D., Belapure, R., and Schwan, K. Multi-Tenancy on GPGPU-Based Servers. In Proceedings of the 7th International Workshop on Virtualization Technologies in Distributed Computing (2013), p. 3‒10. [26] Suzuki, Y., Yamada, H., Kato, S., and Kono, K. GLoop: An Event-Driven Runtime for Consolidating GPGPU Applications. In Symposium on Cloud Computing (2017), p. 80‒93. [27] Tencent. Rdma device plugin for kubernetes. [Online]. Available: https://github.com/hustcat/k8s-rdma-device-plugin. [28] Ting-An Yeh, Jerry Chou. Kubeshare. [Online]. Available: https://github.com/NTHU-LSALAB/KubeShare/. [29] Vavilapalli, V. K., et al. Apache hadoop yarn: Yet another resource negotiator. In Symposium on Cloud Computing (2013), pp. 5:1–5:16. [30] Verma, A., Pedrosa, L., Korupolu, M., Oppenheimer, D., Tune, E., and Wilkes, J. Large-scale Cluster Management at Google with Borg. In Proceedings of the Tenth European Conference on Computer Systems (2015), pp. 18:1–18:17. [31] Wu, B., Liu, X., Zhou, X., and Jiang, C. Flep: Enabling flexible and efficient preemption on gpus. In International Conference on Architectural Support for Programming Languages and Operating Systems (2017), p. 483‒496. [32] Xue, M., Tian, K., Dong, Y., Ma, J., Wang, J., Qi, Z., He, B., and Guan, H. gScale: Scaling up GPU Virtualization with Dynamic Sharing of Graphics Memory Space. In USENIX Annual Technical Conference (June 2016), pp. 579–590. [33] Younge, A. J., Pedretti, K., Grant, R. E., Gaines, B. L., and Brightwell, R. Enabling diverse software stacks on supercomputers using high performance virtual clusters. In IEEE International Conference on Cluster Computing (Sep. 2017), pp. 310–321. |