|
[1] Chetan Nayak et al. “Non-Abelian anyons and topological quantum computation”. In: Reviews of Modern Physics 80.3 (2008), 1083–1159. issn: 1539-0756. doi: 10.1103/revmodphys.80.1083. url: http://dx.doi.org/10.1103/ RevModPhys.80.1083. [2] Hans-Joachim Werner et al. “Molpro: a general-purpose quantum chemistry program package”. In: WIREs Computational Molecular Science 2.2 (2012), pp. 242–253. doi: https : / / doi . org / 10 . 1002 / wcms . 82. eprint: https : / / onlinelibrary . wiley . com / doi / pdf / 10 . 1002 / wcms . 82. url: https : //onlinelibrary.wiley.com/doi/abs/10.1002/wcms.82. [3] M. Alex O. Vasilescu and Demetri Terzopoulos. “Multilinear Analysis of Image Ensembles: TensorFaces”. In: Computer Vision — ECCV 2002. Ed. by Anders Heyden et al. Berlin, Heidelberg: Springer Berlin Heidelberg, 2002, pp. 447–460. isbn: 978-3-540-47969-7. [4] Alexander Novikov et al. Tensorizing Neural Networks. 2015. arXiv: 1509.06569 [cs.LG]. [5] Tamara G. Kolda and Brett W. Bader. “Tensor Decompositions and Applications”. In: SIAM Review 51.3 (2009), pp. 455–500. doi: 10.1137/07070111X. eprint: https://doi.org/10.1137/07070111X. url: https://doi.org/10. 1137/07070111X. [6] So Hirata. “Tensor Contraction Engine: Abstraction and Automated Parallel Implementation of Configuration-Interaction, Coupled-Cluster, and Many-Body 75 Perturbation Theories”. In: The Journal of Physical Chemistry A 107 (Nov. 2003), pp. 9887–9897. doi: 10.1021/jp034596z. [7] A. Abdelfattah et al. “High-performance Tensor Contractions for GPUs”. In: Procedia Computer Science 80 (2016). International Conference on Computational Science 2016, ICCS 2016, 6-8 June 2016, San Diego, California, USA, pp. 108 –118. issn: 1877-0509. doi: https://doi.org/10.1016/j.procs. 2016.05.302. url: http://www.sciencedirect.com/science/article/pii/ S1877050916306536. [8] Edgar Solomonik et al. “A massively parallel tensor contraction framework for coupled-cluster computations”. In: Journal of Parallel and Distributed Computing 74.12 (2014). Domain-Specific Languages and High-Level Frameworks for High-Performance Computing, pp. 3176 –3190. issn: 0743-7315. doi: https:// doi.org/10.1016/j.jpdc.2014.06.002. url: http://www.sciencedirect. com/science/article/pii/S074373151400104X. [9] Yang Shi et al. “Tensor Contractions with Extended BLAS Kernels on CPU and GPU”. In: 2016 IEEE 23rd International Conference on High Performance Computing (HiPC) (2016). doi: 10.1109/hipc.2016.031. url: http://dx. doi.org/10.1109/HiPC.2016.031. [10] Paul Springer and Paolo Bientinesi. “Design of a High-Performance GEMM-like Tensor–Tensor Multiplication”. In: ACM Trans. Math. Softw. 44.3 (Jan. 2018). issn: 0098-3500. doi: 10.1145/3157733. url: https://doi.org/10.1145/ 3157733. [11] Devin A. Matthews. “High-Performance Tensor Contraction without Transposition”. In: SIAM Journal on Scientific Computing 40.1 (2018), pp. C1–C24. doi: 10.1137/16M108968X. eprint: https://doi.org/10.1137/16M108968X. url: https://doi.org/10.1137/16M108968X. 76 [12] Dmitry I. Lyakh. “An efficient tensor transpose algorithm for multicore CPU, Intel Xeon Phi, and NVidia Tesla GPU”. In: Computer Physics Communications 189 (Jan. 2015). doi: 10.1016/j.cpc.2014.12.013. [13] Antti-Pekka Hynninen and Dmitry I. Lyakh. cuTT: A High-Performance Tensor Transpose Library for CUDA Compatible GPUs. 2017. arXiv: 1705.01598 [cs.MS]. [14] Paul Springer, Tong Su, and Paolo Bientinesi. “HPTT: A High-Performance Tensor Transposition C++ Library”. In: ARRAY 2017. Barcelona, Spain: Association for Computing Machinery, 2017, 56–62. isbn: 9781450350693. doi: 10. 1145/3091966.3091968. url: https://doi.org/10.1145/3091966.3091968. [15] J. Vedurada et al. “TTLG - An Efficient Tensor Transposition Library for GPUs”. In: 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 2018, pp. 578–588. doi: 10.1109/IPDPS.2018.00067. [16] NVIDIA Multi-Instance GPU User Guide. https://docs.nvidia.com/datacenter/tesla/miguser-guide/index.html. 2022. [17] Fred Gustavson, Lars Karlsson, and Bo Kågström. “Parallel and Cache-Efficient In-Place Matrix Storage Format Conversion”. In: 38.3 (Apr. 2012). issn: 0098- 3500. doi: 10.1145/2168773.2168775. url: https://doi.org/10.1145/ 2168773.2168775. [18] Fred G. Gustavson and David W. Walker. “Algorithms for in-place matrix transposition”. In: Concurrency and Computation: Practice and Experience 31.13 (2019). e5071 cpe.5071, e5071. doi: 10 . 1002 / cpe . 5071. eprint: https : / / onlinelibrary . wiley . com / doi / pdf / 10 . 1002 / cpe . 5071. url: https : //onlinelibrary.wiley.com/doi/abs/10.1002/cpe.5071. [19] I-Jui Sung et al. “In-Place Transposition of Rectangular Matrices on Accelerators”. In: Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. PPoPP ’14. Orlando, Florida, USA: Associ77 ation for Computing Machinery, 2014, 207–218. isbn: 9781450326568. doi: 10. 1145/2555243.2555266. url: https://doi.org/10.1145/2555243.2555266. [20] J. Gómez-Luna et al. “In-Place Matrix Transposition on GPUs”. In: IEEE Transactions on Parallel and Distributed Systems 27.3 (2016), pp. 776–788. doi: 10.1109/TPDS.2015.2412549. [21] Bryan Catanzaro, Alexander Keller, and Michael Garland. “A Decomposition for In-Place Matrix Transposition”. In: SIGPLAN Not. 49.8 (Feb. 2014), 193–206. issn: 0362-1340. doi: 10.1145/2692916.2555253. url: https://doi.org/ 10.1145/2692916.2555253. [22] A.A. Tretyakov and E.E. Tyrtyshnikov. “Optimal in-place transposition of rectangular matrices”. In: Journal of Complexity 25.4 (2009), pp. 377 –384. issn: 0885-064X. doi: https://doi.org/10.1016/j.jco.2009.02.008. url: http: //www.sciencedirect.com/science/article/pii/S0885064X09000120. [23] Fred Gehrung Gustavson and John A Gunnels. “Method and structure for cache aware transposition via rectangular subsections”. In: (Feb. 2014). [24] Jose L. Jodra, Ibai Gurrutxaga, and Javier Muguerza. “Efficient 3D Transpositions in Graphics Processing Units”. In: Int. J. Parallel Program. 43.5 (Oct. 2015), 876–891. issn: 0885-7458. doi: 10 . 1007 / s10766 - 015 - 0366 - 5. url: https://doi.org/10.1007/s10766-015-0366-5. [25] Muhammad Elsayed, Saleh El-shehaby, and Mohamed Abougabal. “NDPA: A generalized efficient parallel in-place N-Dimensional Permutation Algorithm”. In: Alexandria Engineering Journal 32 (Apr. 2015). doi: 10 . 1016 / j . aej . 2015.03.024. [26] Chih-Chieh Tu. “IT3: In-place Transposition of Third-Order Tensor on Graphics Processing Units”. 2021. [27] Paul Springer, Aravind Sankaran, and Paolo Bientinesi. “TTC: a tensor transposition compiler for multiple architectures”. In: Proceedings of the 3rd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for 78 Array Programming - ARRAY 2016 (2016). doi: 10.1145/2935323.2935328. url: http://dx.doi.org/10.1145/2935323.2935328. |