|
[1] J. Cocke, “Global common subexpression elimination,” in ACM SIGPLAN Notices, vol. 5, no. 7. ACM, 1970, pp. 20–24.
[2] J. Knoop, O. Rüthing, and B. Steffen, Partial dead code elimination. ACM, 1994, vol. 29, no. 6.
[3] M. N. Wegman and F. K. Zadeck, “Constant propagation with conditional branches,” ACM Transactions on Programming Languages and Systems (TOPLAS), vol. 13, no. 2, pp. 181–210, 1991.
[4] D. Callahan, K. D. Cooper, K. Kennedy, and L. Torczon, “Interprocedural constant propagation,” in ACM SIGPLAN Notices, vol. 21, no. 7. ACM, 1986, pp. 152–161.
[5] J. E. Stone, D. Gohara, and G. Shi, “OpenCL: A parallel programming standard for heterogeneous computing systems,” Computing in Science & Engineering, vol. 12, no. 3, pp. 66–73, 2010.
[6] J. Nickolls, I. Buck, M. Garland, and K. Skadron, “Scalable parallel programming with CUDA,” Queue, vol. 6, no. 2, pp. 40–53, 2008.
[7] S. Wienke, P. Springer, C. Terboven, and D. an Mey, “OpenACC: first experiences with real-world applications,” in European Conference on Parallel Processing. Springer, 2012, pp. 859–870.
[8] C. Lattner and V. Adve, “LLVM: A compilation framework for lifelong program analysis & transformation,” in Proceedings of the International Symposium on Code Generation and Optimization: Feedback-Directed and Runtime Optimization. IEEE Computer Society, 2004, p. 75.
[9] A. Bakhoda, G. L. Yuan, W. W. Fung, H. Wong, and T. M. Aamodt, “Analyzing CUDA workloads using a detailed GPU simulator,” in Performance Analysis of Systems and Software, 2009. ISPASS 2009. IEEE International Symposium on. IEEE, 2009, pp. 163–174.
[10] J. Leng, T. Hetherington, A. ElTantawy, S. Gilani, N. S. Kim, T. M. Aamodt, and V. J. Reddi, “GPUWattch: enabling energy optimizations in GPGPUs,” in ACM SIGARCH Computer Architecture News, vol. 41, no. 3. ACM, 2013, pp. 487–498.
[11] Khronos OpenCL Working Group. (2017, May) The OpenCL C++ 1.0 specification. [Online]. Available: https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html
[12] (2015) CompuBench CL. COMPUBENCHCL. [Online]. Available: https://compubench.com
[13] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,” in Proceedings of the 22nd ACM International Conference on Multimedia. ACM, 2014, pp. 675–678.
[14] K. Krommydas, W.-c. Feng, C. D. Antonopoulos, and N. Bellas, “OpenDwarfs: Characterization of dwarf-based benchmarks on fixed and reconfigurable architectures,” Journal of Signal Processing Systems, vol. 85, no. 3, pp. 373–392, 2016.
[15] NVIDIA, “PTX: Parallel thread execution ISA,” 2015. [Online]. Available: http://docs.nvidia.com/cuda/parallel-thread-execution/ |