|
[1] F. C. Chow, S. Chan, S.-M. Liu, R. Lo, and M. Streich, “Effective representation of aliases and indirect memory operations in SSA form,” in Proceedings of the 6th International Conference on Compiler Construction, ser. CC ’96. London, UK: Springer-Verlag, 1996, pp. 253–267. [Online]. Available: http://dl.acm.org/citation.cfm?id=647473.760381
[2] D. Novillo, “Memory SSA - a unified approach for sparsely representing memory operations,” in Proc. of the GCC Developers Summit. Citeseer, 2007.
[3] CUDA C Programming Guide, NVIDIA, 2016. [Online]. Available: http://docs.nvidia.com/cuda/cuda-c-programming-guide/
[4] The OpenCL Specification, version 1.2, Khronos OpenCL Working Group, 2012. [Online]. Available: http://www.khronos.org/registry/cl/spec/opencl-1.2.pdf
[5] The OpenCL Specification, version 2.0, Khronos OpenCL Working Group, 2015. [Online]. Available: https://www.khronos.org/registry/cl/specs/opencl-2.0.pdf
[6] C++ Accelerated Massive Parallelism, Microsoft Corp. [Online]. Available: http://msdn.microsoft.com/en-us/library/vstudio/hh265137.aspx
[7] Compute Shader Overview, Microsoft Corp. [Online]. Available: http://msdn.microsoft.com/en-us/library/ff476331.aspx
[8] B. C. Lee, E. Ipek, O. Mutlu, and D. Burger, “Architecting phase change memory as a scalable DRAM alternative,” SIGARCH Comput. Archit. News, vol. 37, no. 3, pp. 2–13, Jun. 2009. [Online]. Available: http://doi.acm.org/10.1145/1555815.1555758
[9] P. Zhou, B. Zhao, J. Yang, and Y. Zhang, “A durable and energy efficient main memory using phase change memory technology,” SIGARCH Comput. Archit. News, vol. 37, no. 3, pp. 14–23, Jun. 2009. [Online]. Available: http://doi.acm.org/10.1145/1555815.1555759
[10] M. K. Qureshi, V. Srinivasan, and J. A. Rivers, “Scalable high performance main memory system using phase-change memory technology,” in Proceedings of the 36th Annual International Symposium on Computer Architecture, ser. ISCA ’09. New York, NY, USA: ACM, 2009, pp. 24–33. [Online]. Available: http://doi.acm.org/10.1145/1555754.1555760
[11] L. E. Ramos, E. Gorbatov, and R. Bianchini, “Page placement in hybrid memory systems,” in Proceedings of the International Conference on Supercomputing, ser. ICS ’11. New York, NY, USA: ACM, 2011, pp. 85–95. [Online]. Available: http://doi.acm.org/10.1145/1995896.1995911
[12] D.-J. Shin, S. K. Park, S. M. Kim, and K. H. Park, “Adaptive page grouping for energy efficiency in hybrid PRAM-DRAM main memory,” in Proceedings of the 2012 ACM Research in Applied Computation Symposium, ser. RACS ’12. New York, NY, USA: ACM, 2012, pp. 395–402. [Online]. Available: http://doi.acm.org/10.1145/2401603.2401689
[13] A. Hassan, H. Vandierendonck, and D. S. Nikolopoulos, “Energy-efficient in-memory data stores on hybrid memory hierarchies,” in Proceedings of the 11th International Workshop on Data Management on New Hardware, ser. DaMoN ’15. New York, NY, USA: ACM, 2015, pp. 1:1–1:8. [Online]. Available: http://doi.acm.org/10.1145/2771937.2771940
[14] R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck, “Efficiently computing static single assignment form and the control dependence graph,” ACM Trans. Program. Lang. Syst., vol. 13, no. 4, pp. 451–490, Oct. 1991. [Online]. Available: http://doi.acm.org/10.1145/115372.115320
[15] C. Lattner and V. Adve, “LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation,” in Proceedings of the 2004 International Symposium on Code Generation and Optimization (CGO ’04), Palo Alto, California, Mar. 2004.
[16] Y. Ho, G. M. Huang, and P. Li, “Nonvolatile memristor memory: Device characteristics and design implications,” in Proceedings of the 2009 International Conference on Computer-Aided Design, ser. ICCAD ’09. New York, NY, USA: ACM, 2009, pp. 485–490. [Online]. Available: http://doi.acm.org/10.1145/1687399.1687491
[17] Open64 compiler, CAPSL. [Online]. Available: http://www.open64.net/
[18] (2016) Accelerated parallel processing (APP) SDK. Advanced Micro Devices Inc. [Online]. Available: http://developer.amd.com/tools-and-sdk/heterogeneous-computing/amd-accelerated-parallel-processing-app-sdk
[19] S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S.-H. Lee, and K. Skadron, “Rodinia: A benchmark suite for heterogeneous computing,” in Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC), ser. IISWC ’09. Washington, DC, USA: IEEE Computer Society, 2009, pp. 44–54. [Online]. Available: http://dx.doi.org/10.1109/IISWC.2009.5306797
[20] A. Bakhoda, G. L. Yuan, W. W. L. Fung, H. Wong, and T. M. Aamodt, “Analyzing CUDA workloads using a detailed GPU simulator,” 2009 IEEE International Symposium on Performance Analysis of Systems and Software, pp. 163–174, 2009.
[21] S. Thoziyoor, J. H. Ahn, M. Monchiero, J. B. Brockman, and N. P. Jouppi, “A comprehensive memory modeling tool and its application to the design and analysis of future memory hierarchies,” in Proceedings of the 35th Annual International Symposium on Computer Architecture, ser. ISCA ’08. Washington, DC, USA: IEEE Computer Society, 2008, pp. 51–62. [Online]. Available: http://dx.doi.org/10.1109/ISCA.2008.16
[22] C. Xu, X. Dong, N. P. Jouppi, and Y. Xie, “Design implications of memristor-based RRAM cross-point structures,” in DATE. IEEE, 2011, pp. 734–739. |