|
[1]P. Briggs, K. D. Cooper, K. Kennedy, and L. Torczon. Coloring heuris- tics for register allocation. SIGPLAN Not., 39(4):283–294, Apr. 2004. [2] G. Chaitin. Register allocation and spilling via graph coloring. SIG- PLAN Not., 39(4):66–74, Apr. 2004. [3] S. Che, M. Boyer, J. Meng, D. Tarjan, J. Sheaffer, S.-H. Lee, and K. Skadron. Rodinia: A benchmark suite for heterogeneous computing. In Workload Characterization, 2009. IISWC 2009. IEEE International Symposium on, pages 44–54, Oct 2009. [4] G. F. Diamos, A. R. Kerr, S. Yalamanchili, and N. Clark. Ocelot: A dynamic optimization framework for bulk-synchronous applications in heterogeneous systems. In Proceedings of the 19th International Confer- ence on Parallel Architectures and Compilation Techniques, PACT ’10, pages 353–364, New York, NY, USA, 2010. ACM. [5] M. Gebhart, D. R. Johnson, D. Tarjan, S. W. Keckler, W. J. Dally, E. Lindholm, and K. Skadron. Energy-efficient mechanisms for manag- ing thread context in throughput processors. SIGARCH Comput. Archit. News, 39(3):235–246, June 2011. [6] M. Gebhart, S. W. Keckler, and W. J. Dally. A compile-time managed multi-level register file hierarchy. In Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture. [7] S. Hong and H. Kim. An integrated gpu power and performance model. In Proceedings of the 37th Annual International Symposium on Com- puter Architecture. [8] KHRONOS. Opencl: Open computing language version 1.2. https: //www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml, 2014. [9] NVIDIA. Cuda: Compute united device architectureprogramming guide version 2.0. http://developer.download.nvidia.com/compute/ cuda/2_0/docs/NVIDIA_CUDA_Programming_Guide_2.0.pdf, 2008. [10] NVIDIA. Nvidia fermi compute architecture whitepaper. http://www.nvidia.com.tw/content/PDF/fermi_white_papers/ NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf, 2009. [11] NVIDIA. Nvidia kepler architecture whitepaper. http://www.nvidia.com.tw/content/apac/pdf/tesla/ nvidia-kepler-gk110-architecture-whitepaper-tw.pdf, 2012. [12] NVIDIA. Ptx: Parallel thread execution isa version 2.3, 2014. [13] M. Poletto and V. Sarkar. Linear scan register allocation. ACM Trans. Program. Lang. Syst., 21(5):895–913, Sept. 1999. |