|
[1] Chen, Q., Lee, H., Yeom, H. Y., and Son, Y. Flexgpu: A flexible and efficient scheduler for gpu sharing systems. In 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID) (2020), pp. 300–309. [2] Chien, S., Peng, I., and Markidis, S. Performance evaluation of advanced features in cuda unified memory. In 2019 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC) (2019), pp. 50–57. [3] Ganguly, D., Zhang, Z., Yang, J., and Melhem, R. Interplay between hardware prefetcher and page eviction policy in cpu-gpu unified virtual memory. In 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture(ISCA) (2019), pp. 224–235. [4] Ganguly, D., Zhang, Z., Yang, J., and Melhem, R. Adaptive page migration for irregular data-intensive applications under gpu memory oversubscription. In 2020 IEEE International Parallel and Distributed Processing Symposium(IPDPS) (2020), pp. 451–461. [5] Gonthier, M., Marchal, L., and Thibault, S. Memory-aware scheduling of tasks sharing data on multiple gpus with dynamic runtime systems. In 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (2022), pp. 694–704. [6] He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition, 2015. [7] Huang, G., Liu, Z., van der Maaten, L., and Weinberger, K. Q. Densely connected convolutional networks, 2018. [8] Landaverde, R., Zhang, T., Coskun, A. K., and Herbordt, M. An investigation of unified memory access performance in cuda. In 2014 IEEE High Performance Extreme Computing Conference (HPEC) (2014), pp. 1–6. [9] Li, W., Jin, G., Cui, X., and See, S. An evaluation of unified memory technology on nvidia gpus. In 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (2015), pp. 1092–1098. [10] Linux. Linux manual page. https://man7.org/linux/man-pages/man8/ld.so.8.html. [11] NVIDIA. NVIDIA cuda-samples. https://github.com/NVIDIA/cuda-samples. [12] NVIDIA. NVIDIA nvidia-smi. https://developer.download.nvidia.com/compute/DCGM/docs/nvidia-smi-367.38.pdf. [13] Simonyan, K., and Zisserman, A. Very deep convolutional networks for large-scale image recognition, 2015. [14] Yu, Q., Childers, B., Huang, L., Qian, C., and Wang, Z. A quantitative evaluation of unified memory in gpus. The Journal of Supercomputing 76, 4 (2020), 2958–2985. |