|
[1] Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al. {TensorFlow}: a system for {Large- Scale} machine learning. In 12th USENIX symposium on operating systems design and implementation (OSDI 16) (2016), pp. 265–283. [2] Brutzkus, A., and Globerson, A. Why do larger models generalize better? A theoretical perspective via the XOR problem. In Proceedings of the 36th International Conference on Machine Learning (09–15 Jun 2019), K. Chaud- huri and R. Salakhutdinov, Eds., vol. 97 of Proceedings of Machine Learning Research, PMLR, pp. 822–830. [3] Hu,Z.,Xiao,J.,Deng,Z.,Li,M.,Zhang,K.,Zhang,X.,Meng,K.,Sun,N.,and Tan, G. Megtaichi: Dynamic tensor-based memory management optimization for dnn training. In Proceedings of the 36th ACM International Conference on Supercomputing (2022), pp. 1–13. [4] Huang, C.-C., Jin, G., and Li, J. Swapadvisor: Pushing deep learning beyond the gpu memory limit via smart swapping. In Proceedings of the Twenty- Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (2020), pp. 1341–1355. [5] Huang, Y., Cheng, Y., Bapna, A., Firat, O., Chen, D., Chen, M., Lee, H., Ngiam, J., Le, Q. V., Wu, Y., et al. Gpipe: Efficient training of giant neural networks using pipeline parallelism. Advances in neural information process- ing systems 32 (2019). [6] Imambi, S., Prakash, K. B., and Kanagachidambaresan, G. Pytorch. Program- ming with TensorFlow: Solution for Edge Computing Applications (2021), 87– 104. [7] Jain, A., Phanishayee, A., Mars, J., Tang, L., and Pekhimenko, G. Gist: Effi- cient data encoding for deep neural network training. In 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA) (2018), IEEE, pp. 776–789. [8] Jain,P.,Jain,A.,Nrusimha,A.,Gholami,A.,Abbeel,P.,Gonzalez,J.,Keutzer, K., and Stoica, I. Checkmate: Breaking the memory wall with optimal tensor rematerialization. Proceedings of Machine Learning and Systems 2 (2020), 497–511. [9] Liao,J.,Li,M.,Sun,Q.,Hao,J.,Yu,F.,Chen,S.,Tao,Y.,Zhang,Z.,Yang,H., Luan, Z., et al. Mimose: An input-aware checkpointing planner for efficient training on gpu. arXiv preprint arXiv:2209.02478 (2022). [10] Nie,X.,Miao,X.,Yang,Z.,andCui,B.Tsplit:Fine-grainedgpumemoryman- agement for efficient dnn training via tensor splitting. In 2022 IEEE 38th In- ternational Conference on Data Engineering (ICDE) (2022), IEEE, pp. 2615– 2628. [11] Peng, X., Shi, X., Dai, H., Jin, H., Ma, W., Xiong, Q., Yang, F., and Qian, X. Capuchin: Tensor-based gpu memory management for deep learning. In Pro- ceedings of the Twenty-Fifth International Conference on Architectural Sup- port for Programming Languages and Operating Systems (2020), pp. 891–905. [12] Ren, J., Rajbhandari, S., Aminabadi, R. Y., Ruwase, O., Yang, S., Zhang, M., Li, D., and He, Y. Zero-offload: Democratizing billion-scale model training, 2021. [13] Rhu, M., Gimelshein, N., Clemons, J., Zulfiqar, A., and Keckler, S. W. vdnn: Virtualized deep neural networks for scalable, memory-efficient neural net- work design. In 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) (2016), IEEE, pp. 1–13. [14] Shah,A.,Wu,C.-Y.,Mohan,J.,Chidambaram,V.,andKrähenbühl,P.Memory optimization for deep networks, 2021. [15] Shoeybi,M.,Patwary,M.,Puri,R.,LeGresley,P.,Casper,J.,andCatanzaro,B. Megatron-lm: Training multi-billion parameter language models using model parallelism. arXiv preprint arXiv:1909.08053 (2019). [16] Steiner, B., Elhoushi, M., Kahn, J., and Hegarty, J. Olla: Optimizing the life- time and location of arrays to reduce the memory usage of neural networks. arXiv preprint arXiv:2210.12924 (2022). [17] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. Going deeper with convolutions. In Pro- ceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2015). [18] Tang, Y., Wang, C., Zhang, Y., Liu, Y., Zhang, X., Qiao, L., Lai, Z., and Li, D. Delta: Dynamically optimizing gpu memory beyond tensor recomputation. arXiv preprint arXiv:2203.15980 (2022). [19] Touvron, H., Cord, M., Sablayrolles, A., Synnaeve, G., and Jégou, H. Going deeper with image transformers. In Proceedings of the IEEE/CVF Interna- tional Conference on Computer Vision (ICCV) (October 2021), pp. 32–42. [20] Wang, L., Ye, J., Zhao, Y., Wu, W., Li, A., Song, S. L., Xu, Z., and Kraska, T. Superneurons: Dynamic gpu memory management for training deep neural networks. In Proceedings of the 23rd ACM SIGPLAN symposium on principles and practice of parallel programming (2018), pp. 41–53. [21] Zhang, C., Bengio, S., Hardt, M., Recht, B., and Vinyals, O. Understanding deep learning requires rethinking generalization, 2017. [22] Zhao, X., Le Hellard, T., Eyraud-Dubois, L., Gusak, J., and Beaumont, O. Rockmate: an efficient, fast, automatic and generic tool for re-materialization in pytorch. In International Conference on Machine Learning (2023), PMLR, pp. 42018–42045. [23] Zhu, F., Gong, R., Yu, F., Liu, X., Wang, Y., Li, Z., Yang, X., and Yan, J. Towards unified int8 training for convolutional neural network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020), pp. 1969–1979. |