|
[1] HSA Foundation, http://www.hsafoundation.com/ [2] Khronos. The OpenCL Specification. https://www.khronos.org/opencl/ [3] Agarwal, N., Krishna, T., Peh, L.-S., and Jha, N. K. “GARNET: A detailed on-chip network model inside a full-system simulator.” In Proceedings of IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Apr. 2009, pp. 33–42. [4] Macsim, https://code.google.com/p/macsim/ [5] V. Zakharenko, T. Aamodt, and A. Moshovos. “Characterizing the performance benefits of fused CPU/GPU systems using fusionism.” In Proceedings of DATE, 2013. [6] R. Ubal, B. Jang, P. Mistry, D. Schaa, and D. Kaeli. “Multi2sim: A simulation framework for CPU-GPU computing.” In Proceedings of PACT, 2012. [7] NVIDIA, “NVIDIA CUDA C Programming Guide Ver. 4.0,” 2011. [8] Nathan Binkert, Bradford Beckmann, Gabriel Black, Steven K. Reinhardt, Ali Saidi, Arkaprava Basu, Joel Hestness, Derek R. Hower, Tushar Krishna, Somayeh Sardashti, Rathijit Sen, Korey Sewell, Muhammad Shoaib, Nilay Vaish, Mark D. Hill, and David A. Wood. “The gem5 simulator.” SIGARCH Computer Architecture News, vol. 39, no. 2, pp. 1-7, 2011. [9] Multi2Sim OpenCL SDK. https://www.multi2sim.org/benchmarks/amdapp-2.5.php [10] P. Rogers. “Heterogeneous system architecture overview.” In Proceedings of Hot Chips 25, 2013. [11] S. Che, M. Boyer, J. Meng, et al. “A performance study of general purpose applications on graphics processors using CUDA.” Journal of Parallel and Distributed Computing, 2008.
|