|
[1] M. Pak and S. Kim, “A review of deep learning in image recognition,” in Proceedings of the 2017 4th International Conference on Computer Applications and Information Processing Technology, CAIPT 2017, vol. 2018-January, 2018. [2] D. W. Otter, J. R. Medina, and J. K. Kalita, “A Survey of the Usages of Deep Learning for Natural Language Processing,” IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 2, 2021. [3] M. Bojarski, L. Jackel, B. Firner, and U. Muller, “Explaining How End-to-End Deep Learning Steers a Self-Driving Car,” Nvidia, 2017. [4] P. Chi, S. Li, C. Xu, T. Zhang, J. Zhao, Y. Liu, Y. Wang, and Y. Xie, “PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory,” in Proceedings - 2016 43rd International Symposium on Computer Architecture, ISCA 2016, 2016. [5] A. Shafiee, A. Nag, N. Muralimanohar, R. Balasubramonian, J. P. Strachan, M. Hu, R. S. Williams, and V. Srikumar, “ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars,” in Proceedings - 2016 43rd International Symposium on Computer Architecture, ISCA 2016, 2016. [6] F. Zahoor, T. Z. Azni Zulkifli, and F. A. Khanday, “Resistive Random Access Memory (RRAM): an Overview of Materials, Switching Mechanism, Performance, Multilevel Cell (mlc) Storage, Modeling, and Applications,” 2020. [7] T. Chen, T. Moreau, Z. Jiang, L. Zheng, E. Yan, M. Cowan, H. Shen, L. Wang, Y. Hu, L. Ceze, C. Guestrin, and A. Krishnamurthy, “TVM: An automated end-to-end optimizing compiler for deep learning,” in Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2018, 2007. [8] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng, “TensorFlow: A system for large-scale machine learning,” in Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2016, 2016. [9] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, “PyTorch: An imperative style, high-performance deep learning library,” in Advances in Neural Information Processing Systems, vol. 32, 2019. [10] T. Chen, M. Li, U. W. Cmu, Y. Li, M. Lin, N. Wang, M. Wang, B. Xu, C. Zhang, Z. Zhang, and U. Alberta, “MXNet : A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems arXiv : 1512 . 01274v1 [ cs . DC ] 3 Dec 2015,” Emerald Group Publishing Limited, vol. 36, no. 2, 2015. [11] J. Bai, F. Lu, and K. Zhang, “ONNX: Open Neural Network Exchange,” GitHub repository, 2019. [12] T. Chen, L. Zheng, E. Yan, Z. Jiang, T. Moreau, L. Ceze, C. Guestrin, and A. Krishnamurthy, “Learning to optimize tensor programs,” in Advances in Neural Information Processing Systems, vol. 2018-December, 2018. [13] L. Zheng, C. Jia, M. Sun, Z. Wu, C. H. Yu, A. Haj-Ali, Y. Wang, J. Yang, D. Zhuo, K. Sen, J. E. Gonzalez, and I. Stoica, “Ansor: Generating high-performance tensor programs for deep learning,” in Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2020, 2020. [14] S. Han, H. Mao, and W. J. Dally, “Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding,” in 4th International Conference on Learning Representations, ICLR 2016 - Conference Track Proceedings, 2016. [15] H. Ji, L. Song, L. Jiang, H. H. Li, and Y. Chen, “Recom: An efficient resistive accelerator for compressed deep neural networks,” in Proceedings of the 2018 Design, Automation and Test in Europe Conference and Exhibition, DATE 2018, vol. 2018-January, 2018. [16] C. Y. Tsai, C. F. Nien, T. C. Yu, H. Y. Yeh, and H. Y. Cheng, “RePIM: Joint Exploitation of Activation and Weight Repetitions for In-ReRAM DNN Acceleration,” in Proceedings - Design Automation Conference, vol. 2021-December, 2021. [17] A. Drebes, L. Chelini, O. Zinenko, A. Cohen, H. Corporaal, T. Grosser, K. Vadivel, and N. Vasilache, “TC-CIM: Empowering Tensor Comprehensions for Computing-InMemory,” in IMPACT 2020 workshop (associated with HIPEAC 2020), 2020. [18] N. Vasilache, O. Zinenko, T. Theodoridis, P. Goyal, Z. Devito, W. S. Moses, S. Verdoolaege, A. Adams, and A. Cohen, “The next 700 accelerated layers: From mathematical expressions of network computation graphs to accelerated GPU kernels, automatically,” ACM Transactions on Architecture and Code Optimization, vol. 16, no. 4, 2019. [19] S. Verdoolaege, S. Guelton, T. Grosser, and A. Cohen, “Schedule Trees,” in International Workshop on Polyhedral Compilation Techniques, no. January 2014, 2014. [20] A. Siemieniuk, L. Chelini, A. A. Khan, J. Castrillon, A. Drebes, H. Corporaal, T. Grosser, and M. Kong, “OCC: An Automated End-to-End Machine Learning Optimizing Compiler for Computing-In-Memory,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 41, no. 6, 2022. [21] A. Vasudevan, A. Anderson, and D. Gregg, “Parallel Multi Channel convolution using General Matrix Multiplication,” in Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors, 2017. [22] A. Sebastian, M. Le Gallo, and E. Eleftheriou, “Computational phase-change memory: Beyond von Neumann computing,” 2019. [23] W. Wen, C. Wu, Y. Wang, Y. Chen, and H. Li, “Learning structured sparsity in deep neural networks,” in Advances in Neural Information Processing Systems, 2016. [24] J. Lin, Z. Zhu, Y. Wang, and Y. Xie, “Learning the sparsity for RERAM: Mapping and pruning sparse neural network for ReRAM based accelerator,” in Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC, 2019. [25] “Example Compilation Flow.” [Online]. Available: https://tvm.apache.org/docs/arch/index.html [26] M. Y. Lin, H. Y. Cheng, W. T. Lin, T. H. Yang, I. C. Tseng, C. L. Yang, H. W. Hu, H. S. Chang, H. P. Li, and M. F. Chang, “DL-RSIM: A simulation framework to enable reliable ReRAM-based accelerators for deep learning,” in IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD, 2018. [27] B. Skienard, P. Blaise, B. Traore, A. Dragoni, C. Nail, and E. Vianello, “Advances in the understanding of microscopic switching mechanisms in ReRAM devices (Invited paper),” in European Solid-State Device Research Conference, 2017. [28] A. Jain, S. Bhattacharya, M. Masuda, V. Sharma, and Y. Wang, “Efficient Execution of Quantized Deep Learning Models: A Compiler Approach,” ArXiv, vol. abs/2006.10226, 2020. [29] B. Jacob, S. Kligys, B. Chen, M. Zhu, M. Tang, A. Howard, H. Adam, and D. Kalenichenko, “Quantization and Training of Neural Networks for Efficient IntegerArithmetic-Only Inference,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2018. [30] “Quantization aware training.” [Online]. Available: https://www.tensorflow.org/model_optimization/guide/quantization/training [31] “Post-training quantization.” [Online]. Available: https://www.tensorflow.org/model_optimization/guide/quantization/post_training [32] Z. Chen, C. H. Yu, T. Morris, J. Tuyls, Y.-H. Lai, J. Roesch, E. Delaye, V. Sharma, and Y. Wang, “Bring Your Own Codegen to Deep Learning Compiler,” 5 2021. [33] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998. [34] L. Deng, “The mnist database of handwritten digit images for machine learning research,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 141–142, 2012. |