[1] O. Russakovsky, et al., "ImageNet Large Scale Visual Recognition Challenge." In International Journal of Computer Vision, vol. 115, no. 3, pp. 211-252, 2015.
[2] A. Krizhevsky, et al., "ImageNet Classification with Deep Convolutional Neural Networks." In NIPS, 2012.
[3] Y. LeCun, et al., "Gradient-Based Learning Applied to Document Recognition." In Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998.
[4] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition." In ICLR, 2015.
[5] C. Szegedy, et al., "Going Deeper with Convolutions." In Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1-9, 2015.
[6] K. He, et al., "Deep Residual Learning for Image Recognition." In CVPR, 2016.
[7] N. P. Jouppi, et al., "In-Datacenter Performance Analysis of a Tensor Processing Unit." In ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 1-12, 2017.
[8] E. Lindholm, et al., "NVIDIA Tesla: A Unified Graphics and Computing Architecture." In IEEE Micro, vol. 28, no. 2, pp. 39-55, 2008.
[9] Y. LeCun, et al., "Deep Learning." In Nature, vol. 521, no. 7553, pp. 436-444, 2015.
[10] S. Han, et al., "Learning Both Weights and Connections for Efficient Neural Networks." In NIPS, 2015.
[11] Z. Liu, et al., "Learning Efficient Convolutional Networks through Network Slimming." In ICCV, 2017.
[12] S. Han, et al., "EIE: Efficient Inference Engine on Compressed Deep Neural Network." In ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), pp. 243-254, 2016.
[13] M. Courbariaux, et al., "BinaryConnect: Training Deep Neural Networks with Binary Weights during Propagations." In NIPS, 2015.
[14] S. Zhou, et al., "DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients." In arXiv:1606.06160, 2016.
[15] Z. Cai, et al., "Deep Learning with Low Precision by Half-Wave Gaussian Quantization." In CVPR, 2017.
[16] S. Yin, et al., "A 141 uW, 2.46 pJ/Neuron Binarized Convolutional Neural Network-Based Self-Learning Speech Recognition Processor in 28nm CMOS." In Symposium on VLSI Circuits, 2018.
[17] Y.-H. Chen, et al., "Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks." In IEEE Journal of Solid-State Circuits (ISSCC Special Issue), vol. 52, no. 1, pp. 127-138, 2017.
[18] K. Ueyoshi, et al., "QUEST: A 7.49 TOPS Multi-Purpose Log-Quantized DNN Inference Engine Stacked on 96MB 3D SRAM Using Inductive-Coupling Technology in 40nm CMOS." In ISSCC, 2018.
[19] S.-H. Sie, et al., "MARS: Multi-Macro Architecture SRAM CIM-Based Accelerator with Co-Designed Compressed Neural Networks." In arXiv:2010.12861, 2020.
[20] W. Wen, et al., "Learning Structured Sparsity in Deep Neural Networks." In NIPS, 2016.
[21] V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer, "Efficient Processing of Deep Neural Networks: A Tutorial and Survey." In Proceedings of the IEEE, vol. 105, no. 12, pp. 2295-2329, Dec. 2017.
[22] W. Wei, et al., "A Relaxed Quantization Training Method for Hardware Limitations of Resistive Random Access Memory (ReRAM)-Based Computing-in-Memory." In IEEE Journal on Exploratory Solid-State Computational Devices and Circuits, vol. 6, no. 1, pp. 45-52, June 2020.
[23] A. Shafiee, et al., "ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars." In International Symposium on Computer Architecture (ISCA), pp. 14-26, 2016.
[24] P. Chi, et al., "PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory." In International Symposium on Computer Architecture (ISCA), pp. 27-39, 2016.
[25] Z. Yuan, et al., "Sticker: A 0.41-62.1 TOPS/W 8Bit Neural Network Processor with Multi-Sparsity Compatible Convolution Arrays and Online Tuning Acceleration for Fully Connected Layers." In Symposium on VLSI Circuits, 2018.
[26] H. Ji, et al., "ReCom: An Efficient Resistive Accelerator for Compressed Deep Neural Networks." In Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 237-240, 2018.
[27] T.-H. Yang, et al., "Sparse ReRAM Engine: Joint Exploration of Activation and Weight Sparsity in Compressed Neural Networks." In International Symposium on Computer Architecture (ISCA), pp. 236-249, 2019.
[28] X. Chen, et al., "CompRRAE: RRAM-Based Convolutional Neural Network Accelerator with Reduced Computations through a Runtime Activation Estimation." In ASP-DAC, 2019.
[29] R. Guo, et al., "A 5.1pJ/Neuron 127.3us/Inference RNN-Based Speech Recognition Processor Using 16 Computing-in-Memory SRAM Macros in 65nm CMOS." In Symposium on VLSI Circuits, pp. C120-C121, 2019.
[30] J. Yue, et al., "A 65nm Computing-in-Memory-Based CNN Processor with 2.9-to-35.8TOPS/W System Energy Efficiency Using Dynamic-Sparsity Performance-Scaling Architecture and Energy-Efficient Inter/Intra-Macro Data Reuse." In International Solid-State Circuits Conference (ISSCC), pp. 234-236, 2020.
[31] J. Albericio, et al., "Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing." In International Symposium on Computer Architecture (ISCA), pp. 1-13, 2016.
[32] W. Wen, et al., "Learning Structured Sparsity in Deep Neural Networks." In NIPS, pp. 2082-2090, 2016.
[33] X. Si, et al., "A 28nm 64Kb 6T SRAM Computing-in-Memory Macro with 8b MAC Operation for AI Edge Chips." In International Solid-State Circuits Conference (ISSCC), pp. 246-248, 2020.
[34] X. Si, et al., "A Twin-8T SRAM Computation-in-Memory Macro for Multiple-Bit CNN-Based Machine Learning." In ISSCC, 2019.
[35] K. Prabhu, et al., "CHIMERA: A 0.92-TOPS, 2.2-TOPS/W Edge AI Accelerator with 2-MByte On-Chip Foundry Resistive RAM for Efficient Training and Inference." In IEEE Journal of Solid-State Circuits, vol. 57, no. 4, pp. 1013-1026, Apr. 2022.
[36] K. Goetschalckx and M. Verhelst, "DepFiN: A 12nm, 3.8TOPs Depth-First CNN Processor for High Res. Image Processing." In Symposium on VLSI Circuits, pp. 1-2, 2021.
[37] M. Chang, et al., "A 40nm 60.64TOPS/W ECC-Capable Compute-in-Memory/Digital 2.25MB/768KB RRAM/SRAM System with Embedded Cortex M3 Microprocessor for Edge Recommendation Systems." In International Solid-State Circuits Conference (ISSCC), pp. 1-3, 2022.
[38] J. Yue, et al., "A 2.75-to-75.9TOPS/W Computing-in-Memory NN Processor Supporting Set-Associate Block-Wise Zero Skipping and Ping-Pong CIM with Simultaneous Computation and Weight Updating." In International Solid-State Circuits Conference (ISSCC), pp. 238-240, 2021.
[39] H. Jia, et al., "A Programmable Neural-Network Inference Accelerator Based on Scalable In-Memory Computing." In International Solid-State Circuits Conference (ISSCC), pp. 236-238, 2021.
[40] K. Goetschalckx and M. Verhelst, "Breaking High-Resolution CNN Bandwidth Barriers with Enhanced Depth-First Execution." In IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 9, no. 2, pp. 323-331, June 2019.
[41] S. Yan, et al., "An FPGA-Based MobileNet Accelerator Considering Network Structure Characteristics." In International Conference on Field-Programmable Logic and Applications (FPL), pp. 17-23, 2021.
[42] T.-H. Yang, et al., "Sparse ReRAM Engine: Joint Exploration of Activation and Weight Sparsity in Compressed Neural Networks." In ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA), pp. 236-249, 2019.
[43] K. Goetschalckx and M. Verhelst, "Breaking High-Resolution CNN Bandwidth Barriers with Enhanced Depth-First Execution." In IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 9, no. 2, pp. 323-331, June 2019.