[1] S.-H. Sie, et al., "MARS: Multi-macro Architecture SRAM CIM-Based Accelerator with Co-designed Compressed Neural Networks." In arXiv:2010.12861, 2020.
[2] O. Russakovsky, et al., "ImageNet Large Scale Visual Recognition Challenge." In International Journal of Computer Vision, vol. 115, no. 3, pp. 211-252, 2015.
[3] A. Krizhevsky, et al., "ImageNet Classification with Deep Convolutional Neural Networks." In NIPS, 2012.
[4] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition." In ICLR, 2015.
[5] A. G. Howard, et al., "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications." In arXiv:1704.04861, 2017.
[6] E. Lindholm, et al., "NVIDIA Tesla: A Unified Graphics and Computing Architecture." In IEEE Micro, vol. 28, no. 2, pp. 39-55, 2008.
[7] N. P. Jouppi, et al., "In-Datacenter Performance Analysis of a Tensor Processing Unit." In ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 1-12, 2017.
[8] S. Han, et al., "Learning both Weights and Connections for Efficient Neural Networks." In NIPS, 2015.
[9] X. Zhang, et al., "ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices." In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[10] C. Szegedy, et al., "Going Deeper with Convolutions." In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1-9, 2015.
[11] R. Andri, et al., "YodaNN: An Ultra-Low Power Convolutional Neural Network Accelerator Based on Binary Weights." In IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2016.
[12] S. Yin, et al., "An Ultra-High Energy-Efficient Reconfigurable Processor for Deep Neural Networks with Binary/Ternary Weights in 28nm CMOS." In IEEE Symposium on VLSI Circuits, 2018.
[13] Y.-H. Chen, et al., "Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks." In IEEE Journal of Solid-State Circuits (JSSC), ISSCC Special Issue, vol. 52, no. 1, pp. 127-138, 2017.
[14] H. T. Kung, "Why Systolic Architectures?" In IEEE Computer, vol. 15, no. 1, pp. 37-46, 1982.
[15] Y. Chen, et al., "DaDianNao: A Machine-Learning Supercomputer." In IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 609-622, 2014.
[16] Z. Yuan, et al., "Sticker: A 0.41-62.1 TOPS/W 8bit Neural Network Processor with Multi-Sparsity Compatible Convolution Arrays and Online Tuning Acceleration for Fully Connected Layers." In IEEE Symposium on VLSI Circuits, 2018.
[17] H. Ji, L. Song, et al., "ReCom: An Efficient Resistive Accelerator for Compressed Deep Neural Networks." In Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 237-240, 2018.
[18] J. Albericio, et al., "Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing." In International Symposium on Computer Architecture (ISCA), pp. 1-13, 2016.
[19] J.-F. Zhang, et al., "SNAP: A 1.67-21.55TOPS/W Sparse Neural Acceleration Processor for Unstructured Sparse Deep Neural Network Inference in 16nm CMOS." In IEEE Symposium on VLSI Circuits, 2019.
[20] H. Li, et al., "Pruning Filters for Efficient ConvNets." In ICLR, 2017.
[21] Z. Liu, et al., "Learning Efficient Convolutional Networks through Network Slimming." In ICCV, 2017.
[22] R. Guo, et al., "A 5.1pJ/Neuron 127.3us/Inference RNN-based Speech Recognition Processor using 16 Computing-in-Memory SRAM Macros in 65nm CMOS." In IEEE Symposium on VLSI Circuits, pp. C120-C121, 2019.
[23] J. Yue, et al., "A 65nm Computing-in-Memory-Based CNN Processor with 2.9-to-35.8TOPS/W System Energy Efficiency Using Dynamic-Sparsity Performance-Scaling Architecture and Energy-Efficient Inter/Intra-Macro Data Reuse." In International Solid-State Circuits Conference (ISSCC), pp. 234-236, 2020.
[24] H. Jia, et al., "A Programmable Neural-Network Inference Accelerator Based on Scalable In-Memory Computing." In International Solid-State Circuits Conference (ISSCC), pp. 236-238, 2021.
[25] J. Yue, et al., "A 2.75-to-75.9TOPS/W Computing-in-Memory NN Processor Supporting Set-Associate Block-Wise Zero Skipping and Ping-Pong CIM with Simultaneous Computation and Weight Updating." In International Solid-State Circuits Conference (ISSCC), pp. 238-240, 2021.
[26] X. Si, et al., "A 28nm 64Kb 6T SRAM Computing-in-Memory Macro with 8b MAC Operation for AI Edge Chips." In International Solid-State Circuits Conference (ISSCC), pp. 246-248, 2020.
[27] J.-H. Kim, et al., "Z-PIM: An Energy-Efficient Sparsity-Aware Processing-In-Memory Architecture with Fully-Variable Weight Precision." In IEEE Symposium on VLSI Circuits, 2020.
[28] Z. Li, et al., "A Miniature Electronic Nose for Breath Analysis." In IEEE International Electron Devices Meeting (IEDM), 2021.
[29] X. Si, et al., "A Twin-8T SRAM Computation-In-Memory Macro for Multiple-Bit CNN-Based Machine Learning." In International Solid-State Circuits Conference (ISSCC), pp. 396-398, 2019.
[30] A. Biswas, et al., "Conv-RAM: An Energy-Efficient SRAM with Embedded Convolution Computation for Low-Power CNN-Based Machine Learning Applications." In International Solid-State Circuits Conference (ISSCC), pp. 488-489, 2018.
[31] W.-S. Khwa, et al., "A 65nm 4Kb Algorithm-Dependent Computing-In-Memory SRAM Unit-Macro with 2.3ns and 55.8 TOPS/W Fully Parallel Product-Sum Operation for Binary DNN Edge Processors." In International Solid-State Circuits Conference (ISSCC), pp. 496-498, 2018.
[32] S. K. Gonugondla, et al., "A 42pJ/decision 3.12 TOPS/W Robust In-Memory Machine Learning Classifier with On-Chip Training." In International Solid-State Circuits Conference (ISSCC), pp. 490-492, 2018.