[1] Y. LeCun, Y. Bengio, and G. Hinton, “Deep Learning,” in Nature, vol. 521, no. 7553, pp. 436-444, May 2015.
[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in Proc. Advances in Neural Information Processing Systems 25 (NIPS 2012), pp. 1106-1114, Dec. 2012.
[3] S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. A. Horowitz, and W. J. Dally, “EIE: Efficient Inference Engine on Compressed Deep Neural Network,” in 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), pp. 243-254, June 2016.
[4] H. Sharma, J. Park, D. Mahajan, E. Amaro, J. K. Kim, C. Shao, A. Mishra, and H. Esmaeilzadeh, “From High-Level Deep Neural Models to FPGAs,” in Proc. 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), no. 17, pp. 1-12, Oct. 2016.
[5] Y.-H. Chen, J. Emer, and V. Sze, “Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks,” in 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), pp. 367-379, June 2016.
[6] N. P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers, R. Boyle, P.-L. Cantin, C. Chao, C. Clark, J. Coriell, M. Daley, M. Dau, J. Dean, B. Gelb, T. V. Ghaemmaghami, R. Gottipati, W. Gulland, R. Hagmann, C. R. Ho, D. Hogberg, J. Hu, R. Hundt, D. Hurt, J. Ibarz, A. Jaffey, A. Jaworski, A. Kaplan, H. Khaitan, A. Koch, N. Kumar, S. Lacy, J. Laudon, J. Law, D. Le, C. Leary, Z. Liu, K. Lucke, A. Lundin, G. MacKean, A. Maggiore, M. Mahony, K. Miller, R. Nagarajan, R. Narayanaswami, R. Ni, K. Nix, T. Norrie, M. Omernick, N. Penukonda, A. Phelps, J. Ross, M. Ross, A. Salek, E. Samadiani, C. Severn, G. Sizikov, M. Snelham, J. Souter, D. Steinberg, A. Swing, M. Tan, G. Thorson, B. Tian, H. Toma, E. Tuttle, V. Vasudevan, R. Walter, W. Wang, E. Wilcox, and D. H. Yoon, “In-Datacenter Performance Analysis of a Tensor Processing Unit,” in 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 1-12, June 2017.
[7] H. T. Kung, B. McDanel, and S. Q. Zhang, “Mapping Systolic Arrays onto 3D Circuit Structures: Accelerating Convolutional Neural Network Inference,” in 2018 IEEE International Workshop on Signal Processing Systems (SiPS), pp. 330-336, Oct. 2018.
[8] S. K. Lee, P. N. Whatmough, N. Mulholland, P. Hansen, D. Brooks, and G.-Y. Wei, “A Wide Dynamic Range Sparse FC-DNN Processor with Multi-Cycle Banked SRAM Read and Adaptive Clocking in 16nm FinFET,” in ESSCIRC 2018 - IEEE 44th European Solid State Circuits Conference (ESSCIRC), pp. 158-161, Sept. 2018.
[9] P. N. Whatmough, S. K. Lee, H. Lee, S. Rama, D. Brooks, and G.-Y. Wei, “A 28nm SoC with a 1.2 GHz 568nJ/Prediction Sparse Deep-Neural-Network Engine with >0.1 Timing Error Rate Tolerance for IoT Applications,” in 2017 IEEE International Solid-State Circuits Conference (ISSCC), pp. 242-243, Feb. 2017.
[10] Arm Limited, “Arm Ethos-N series processors,” https://developer.arm.com/ip-products/processors/machine-learning/arm-ethos-n, Arm Developer, Jan. 2019.
[11] Google LLC, “Helping you bring local AI to applications from prototype to production,” https://coral.ai/products/#production-products/, Coral Products, 2019.
[12] NVIDIA Corporation, “Buy the Latest Jetson Products,” https://developer.nvidia.com/buy-jetson, NVIDIA Autonomous Machines, Nov. 2015.
[13] Intel Corporation, “Intel® Neural Compute Stick 2 (Intel® NCS2),” https://software.intel.com/en-us/neural-compute-stick, Intel® Software Developer Zone, Aug. 2018.
[14] C.-J. Wu, D. Brooks, K. Chen, D. Chen, S. Choudhury, M. Dukhan, K. Hazelwood, E. Isaac, Y. Jia, B. Jia, T. Leyvand, H. Lu, Y. Lu, L. Qiao, B. Reagen, J. Spisak, F. Sun, A. Tulloch, P. Vajda, X. Wang, Y. Wang, B. Wasti, Y. Wu, R. Xian, S. Yoo, and P. Zhang, “Machine Learning at Facebook: Understanding Inference at the Edge,” in 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 331-344, Feb. 2019.
[15] C. Merkel, R. Hasan, N. Soures, D. Kudithipudi, T. Taha, S. Agarwal, and M. Marinella, “Neuromemristive Systems: Boosting Efficiency through Brain-Inspired Computing,” in IEEE Computer, vol. 49, no. 10, pp. 56-64, Oct. 2016.
[16] P. Chi, S. Li, C. Xu, T. Zhang, J. Zhao, Y. Liu, Y. Wang, and Y. Xie, “PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory,” in Proc. 43rd Int. Symp. on Computer Architecture (ISCA), pp. 27-39, June 2016.
[17] H. Y. Lee, P. S. Chen, T. Y. Wu, Y. S. Chen, C. C. Wang, P. J. Tzeng, C. H. Lin, F. Chen, C. H. Lien, and M.-J. Tsai, “Low Power and High Speed Bipolar Switching With a Thin Reactive Ti Buffer Layer in Robust HfO2 Based RRAM,” in 2008 IEEE International Electron Devices Meeting (IEDM), pp. 1-4, Dec. 2008.
[18] J.-G. Zhu, Y. Zheng, and G. A. Prinz, “Ultrahigh Density Vertical Magnetoresistive Random Access Memory,” in Journal of Applied Physics, vol. 87, no. 9, pp. 6668-6673, April 2000.
[19] Y. Choi, I. Song, M.-H. Park, H. Chung, S. Chang, B. Cho, J. Kim, Y. Oh, D. Kwon, J. Sunwoo, J. Shin, Y. Rho, C. Lee, M. G. Kang, J. Lee, Y. Kwon, S. Kim, J. Kim, Y.-J. Lee, Q. Wang, S. Cha, S. Ahn, H. Horii, J. Lee, K. Kim, H. Joo, K. Lee, Y.-T. Lee, J. Yoo, and G. Jeong, “A 20nm 1.8V 8Gb PRAM with 40MB/s Program Bandwidth,” in 2012 IEEE International Solid-State Circuits Conference (ISSCC), pp. 46-48, Feb. 2012.
[20] D. Walczyk, T. Bertaud, M. Sowinska, M. Lukosius, et al., “Resistive Switching Behavior in TiN/HfO2/Ti/TiN Devices,” in Proc. Int. Semiconductor Conference Dresden-Grenoble (ISCDG), pp. 143-146, Sept. 2012.
[21] P. Pouyan, E. Amat, and A. Rubio, “Memristive Crossbar Memory Lifetime Evaluation and Reconfiguration Strategies,” in IEEE Trans. on Emerging Topics in Computing, p. 1, June 2016.
[22] S. Gupta, A. Agrawal, K. Gopalakrishnan, and P. Narayanan, “Deep Learning with Limited Numerical Precision,” in Proc. International Conference on Machine Learning (ICML), pp. 1737-1746, July 2015.
[23] B. Jacob, S. Kligys, B. Chen, M. Zhu, M. Tang, A. Howard, H. Adam, and D. Kalenichenko, “Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2704-2713, June 2018.
[24] C. Zhu, S. Han, H. Mao, and W. J. Dally, “Trained Ternary Quantization,” arXiv preprint arXiv:1612.01064, Feb. 2017.
[25] H. T. Kung, “Why Systolic Architectures?” in IEEE Computer, vol. 15, no. 1, pp. 37-46, Jan. 1982.
[26] G. J. Li and B. W. Wah, “The Design of Optimal Systolic Arrays,” in IEEE Transactions on Computers, vol. C-34, no. 1, pp. 66-77, Jan. 1985.
[27] I. Z. Milentijević, I. Z. Milovanović, E. I. Milovanović, and M. K. Stojcev, “The Design of Optimal Planar Systolic Arrays for Matrix Multiplication,” in Computers & Mathematics with Applications, vol. 33, no. 6, pp. 17-35, March 1997.
[28] G. Rote, “A Systolic Array Algorithm for the Algebraic Path Problem (Shortest Paths; Matrix Inversion),” in Computing, vol. 34, no. 3, pp. 191-219, Sept. 1985.
[29] M. R. Zargham, “Data Flow and Systolic Array Architectures,” in Computer Architecture: Single and Parallel Systems, 1st ed., ch. 8, Dec. 1996.
[30] J. J. Lee and G. Y. Song, “Super-Systolic Array for 2D Convolution,” in TENCON 2006 - 2006 IEEE Region 10 Conference, pp. 1-4, Nov. 2006.
[31] H. T. Kung, B. McDanel, and S. Q. Zhang, “Adaptive Tiling: Applying Fixed-Size Systolic Arrays to Sparse Convolutional Neural Networks,” in 2018 24th International Conference on Pattern Recognition (ICPR), pp. 1006-1011, Aug. 2018.
[32] M. G. Smith and S. Emanuel, “Methods of Making Thru-Connections in Semiconductor Wafers,” U.S. Patent no. 3,343,256, Sept. 1967.
[33] D. A. Meltzer, P. Kulkarni, A. J. Walder, and J. Farkas, “Clear Hydrophobic TPU,” U.S. Patent Application no. 14/765,657, Dec. 2015.
[34] J. Y. Hu, K. W. Hou, C. Y. Lo, Y. F. Chou, and C. W. Wu, “RRAM-Based Neuromorphic Hardware Reliability Improvement by Self-Healing and Error Correction,” in 2018 IEEE International Test Conference in Asia (ITC-Asia), pp. 19-24, Aug. 2018.
[35] S. H. Jo, T. Chang, I. Ebong, B. B. Bhadviya, P. Mazumder, and W. Lu, “Nanoscale Memristor Device as Synapse in Neuromorphic Systems,” in Nano Letters, vol. 10, no. 4, pp. 1297-1301, March 2010.
[36] J. Zhou, K.-H. Kim, and W. Lu, “Crossbar RRAM Arrays: Selector Device Requirements During Read Operation,” in IEEE Transactions on Electron Devices, vol. 61, no. 5, pp. 1369-1376, May 2014.
[37] A. Shafiee, A. Nag, N. Muralimanohar, R. Balasubramonian, J. P. Strachan, M. Hu, R. S. Williams, and V. Srikumar, “ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars,” in Proc. 43rd Int. Symp. on Computer Architecture (ISCA), pp. 14-26, June 2016.
[38] P. Y. Chen, X. Peng, and S. Yu, “NeuroSim: A Circuit-Level Macro Model for Benchmarking Neuro-Inspired Architectures in Online Learning,” in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 37, no. 12, pp. 3067-3080, Dec. 2018.
[39] M. Cheng, L. Xia, Z. Zhu, Y. Cai, Y. Xie, Y. Wang, and H. Yang, “TIME: A Training-in-Memory Architecture for RRAM-Based Deep Neural Networks,” in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 38, no. 5, pp. 834-847, May 2019.
[40] S. Ambrogio, P. Narayanan, H. Tsai, R. M. Shelby, I. Boybat, C. di Nolfo, S. Sidler, M. Giordano, M. Bodini, N. C. P. Farinha, B. Killeen, C. Cheng, Y. Jaoudi, and G. W. Burr, “Equivalent-Accuracy Accelerated Neural-Network Training Using Analogue Memory,” in Nature, vol. 558, no. 7708, pp. 60-67, June 2018.
[41] M. Mao, X. Peng, R. Liu, J. Li, S. Yu, and C. Chakrabarti, “MAX2: An ReRAM-Based Neural Network Accelerator That Maximizes Data Reuse and Area Utilization,” in IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 9, no. 2, April 2019.
[42] Q. Yang, H. Li, and Q. Wu, “A Quantized Training Method to Enhance Accuracy of ReRAM-Based Neuromorphic Systems,” in 2018 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1-5, May 2018.
[43] J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller, “Striving for Simplicity: The All Convolutional Net,” arXiv preprint arXiv:1412.6806, Dec. 2014.
[44] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely Connected Convolutional Networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4700-4708, July 2017.
[45] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications,” arXiv preprint arXiv:1704.04861, April 2017.