[1] Chetan Nayak, Steven H. Simon, Ady Stern, Michael Freedman, and Sankar Das Sarma. Non-abelian anyons and topological quantum computation. Reviews of Modern Physics, 80(3):1083–1159, September 2008.

[2] Hans-Joachim Werner, Peter J. Knowles, Gerald Knizia, Frederick R. Manby, and Martin Schütz. Molpro: a general-purpose quantum chemistry program package. WIREs Computational Molecular Science, 2(2):242–253, 2012.

[3] M. Alex O. Vasilescu and Demetri Terzopoulos. Multilinear analysis of image ensembles: TensorFaces. In Anders Heyden, Gunnar Sparr, Mads Nielsen, and Peter Johansen, editors, Computer Vision — ECCV 2002, pages 447–460, Berlin, Heidelberg, 2002. Springer Berlin Heidelberg.

[4] Alexander Novikov, Dmitry Podoprikhin, Anton Osokin, and Dmitry Vetrov. Tensorizing neural networks, 2015.

[5] Tamara G. Kolda and Brett W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, 2009.

[6] So Hirata. Tensor contraction engine: Abstraction and automated parallel implementation of configuration-interaction, coupled-cluster, and many-body perturbation theories. The Journal of Physical Chemistry A, 107:9887–9897, November 2003.

[7] A. Abdelfattah, M. Baboulin, V. Dobrev, J. Dongarra, C. Earl, J. Falcou, A. Haidar, I. Karlin, Tz. Kolev, I. Masliah, and S. Tomov. High-performance tensor contractions for GPUs. Procedia Computer Science, 80:108–118, 2016. International Conference on Computational Science 2016, ICCS 2016, 6–8 June 2016, San Diego, California, USA.

[8] Edgar Solomonik, Devin Matthews, Jeff R. Hammond, John F. Stanton, and James Demmel. A massively parallel tensor contraction framework for coupled-cluster computations. Journal of Parallel and Distributed Computing, 74(12):3176–3190, 2014. Domain-Specific Languages and High-Level Frameworks for High-Performance Computing.

[9] Yang Shi, U. N. Niranjan, Animashree Anandkumar, and Cris Cecka. Tensor contractions with extended BLAS kernels on CPU and GPU. 2016 IEEE 23rd International Conference on High Performance Computing (HiPC), December 2016.

[10] Paul Springer and Paolo Bientinesi. Design of a high-performance GEMM-like tensor–tensor multiplication. ACM Trans. Math. Softw., 44(3), January 2018.

[11] Devin A. Matthews. High-performance tensor contraction without transposition. SIAM Journal on Scientific Computing, 40(1):C1–C24, 2018.

[12] Dmitry I. Lyakh. An efficient tensor transpose algorithm for multicore CPU, Intel Xeon Phi, and NVidia Tesla GPU. Computer Physics Communications, 189, January 2015.

[13] Antti-Pekka Hynninen and Dmitry I. Lyakh. cuTT: A high-performance tensor transpose library for CUDA compatible GPUs, 2017.

[14] Paul Springer, Tong Su, and Paolo Bientinesi. HPTT: A high-performance tensor transposition C++ library. In ARRAY 2017, pages 56–62, New York, NY, USA, 2017. Association for Computing Machinery.

[15] J. Vedurada, A. Suresh, A. S. Rajam, J. Kim, C. Hong, A. Panyala, S. Krishnamoorthy, V. K. Nandivada, R. K. Srivastava, and P. Sadayappan. TTLG — an efficient tensor transposition library for GPUs. In 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 578–588, 2018.

[16] Guifré Vidal. Efficient classical simulation of slightly entangled quantum computations. Physical Review Letters, 91(14), October 2003.

[17] G. Vidal. Classical simulation of infinite-size quantum lattice systems in one spatial dimension. Physical Review Letters, 98(7), February 2007.

[18] G. Evenbly and G. Vidal. Algorithms for entanglement renormalization. Physical Review B, 79(14), April 2009.

[19] Fred Gustavson, Lars Karlsson, and Bo Kågström. Parallel and cache-efficient in-place matrix storage format conversion. 38(3), April 2012.

[20] Fred G. Gustavson and David W. Walker. Algorithms for in-place matrix transposition. Concurrency and Computation: Practice and Experience, 31(13):e5071, 2019.

[21] I-Jui Sung, Juan Gómez-Luna, José María González-Linares, Nicolás Guil, and Wen-Mei W. Hwu. In-place transposition of rectangular matrices on accelerators. 49(8):207–218, February 2014.

[22] J. Gómez-Luna, I. Sung, L. Chang, J. M. González-Linares, N. Guil, and W. W. Hwu. In-place matrix transposition on GPUs. IEEE Transactions on Parallel and Distributed Systems, 27(3):776–788, 2016.

[23] Bryan Catanzaro, Alexander Keller, and Michael Garland. A decomposition for in-place matrix transposition. SIGPLAN Not., 49(8):193–206, February 2014.

[24] Paul Springer, Aravind Sankaran, and Paolo Bientinesi. TTC: a tensor transposition compiler for multiple architectures. Proceedings of the 3rd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming — ARRAY 2016, 2016.

[25] Jose L. Jodra, Ibai Gurrutxaga, and Javier Muguerza. Efficient 3D transpositions in graphics processing units. Int. J. Parallel Program., 43(5):876–891, October 2015.

[26] N. Brenner. Algorithm 467: Matrix transposition in place. Commun. ACM, 16(11):692–694, November 1973.

[27] P. F. Windley. Transposing matrices in a digital computer. The Computer Journal, 2(1):47–48, January 1959.

[28] A. A. Tretyakov and E. E. Tyrtyshnikov. Optimal in-place transposition of rectangular matrices. Journal of Complexity, 25(4):377–384, 2009.

[29] Sunpyo Hong and Hyesoon Kim. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness. SIGARCH Comput. Archit. News, 37(3):152–163, June 2009.

[30] Jaewoong Sim, Aniruddha Dasgupta, Hyesoon Kim, and Richard Vuduc. A performance analysis framework for identifying potential benefits in GPGPU applications. SIGPLAN Not., 47(8):11–22, February 2012.