[1] K. Fukushima, "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position," Biological Cybernetics, vol. 36, no. 4, pp. 193-202, 1980.
[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," NIPS, vol. 1, pp. 1097-1105, 2012.
[3] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, 2015.
[4] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998.
[5] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," CoRR, vol. abs/1409.1556, 2014.
[6] S. Chetlur, C. Woolley, P. Vandermersch, J. Cohen, J. Tran, B. Catanzaro, and E. Shelhamer, "cuDNN: Efficient Primitives for Deep Learning," CoRR, vol. abs/1410.0759, 2014.
[7] M. Sankaradas, V. Jakkula, S. Cadambi, S. Chakradhar, I. Durdanovic, E. Cosatto, and H. P. Graf, "A Massively Parallel Coprocessor for Convolutional Neural Networks," IEEE ASAP, pp. 53-60, 2009.
[8] V. Sriram, D. Cox, K. H. Tsoi, and W. Luk, "Towards an embedded biologically-inspired machine vision processor," FPT, 2010.
[9] S. Chakradhar, M. Sankaradas, V. Jakkula, and S. Cadambi, "A Dynamically Configurable Coprocessor for Convolutional Neural Networks," International Symposium on Computer Architecture (ISCA), vol. 38, no. 3, pp. 247-257, 2010.
[10] M. Peemen, A. A. A. Setio, B. Mesman, and H. Corporaal, "Memory-centric accelerator design for Convolutional Neural Networks," IEEE ICCD, 2013.
[11] V. Gokhale, J. Jin, A. Dundar, B. Martini, and E. Culurciello, "A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks," IEEE CVPRW, 2014.
[12] S. Gupta, A. Agrawal, K. Gopalakrishnan, and P. Narayanan, "Deep Learning with Limited Numerical Precision," CoRR, vol. abs/1502.02551, 2015.
[13] C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, "Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks," FPGA, pp. 161-170, 2015.
[14] T. Chen, Z. Du, N. Sun, J. Wang, C. Wu, Y. Chen, and O. Temam, "DianNao: A Small-footprint High-throughput Accelerator for Ubiquitous Machine-learning," ASPLOS, pp. 269-284, 2014.
[15] Z. Du, R. Fasthuber, T. Chen, P. Ienne, L. Li, T. Luo, X. Feng, Y. Chen, and O. Temam, "ShiDianNao: Shifting Vision Processing Closer to the Sensor," International Symposium on Computer Architecture (ISCA), 2015.
[16] Y. Chen, T. Luo, S. Liu, S. Zhang, L. He, J. Wang, L. Li, T. Chen, Z. Xu, N. Sun, and O. Temam, "DaDianNao: A Machine-Learning Supercomputer," MICRO, 2014.
[17] S. Park, K. Bong, D. Shin, J. Lee, S. Choi, and H.-J. Yoo, "A 1.93TOPS/W scalable deep learning/inference processor with tetra-parallel MIMD architecture for big-data applications," IEEE International Solid-State Circuits Conference (ISSCC), 2015.
[18] L. Cavigelli, D. Gschwend, C. Mayer, S. Willi, B. Muheim, and L. Benini, "Origami: A Convolutional Network Accelerator," IEEE Transactions on Circuits and Systems for Video Technology, 2017.
[19] Y.-H. Chen, T. Krishna, J. Emer, and V. Sze, "Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks," IEEE International Solid-State Circuits Conference (ISSCC), pp. 262-263, 2016.
[20] S. Han, H. Mao, and W. J. Dally, "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding," CoRR, vol. abs/1510.00149, 2015.
[21] S. Han, J. Pool, J. Tran, and W. J. Dally, "Learning Both Weights and Connections for Efficient Neural Networks," Proceedings of the International Conference on Neural Information Processing Systems (NIPS), pp. 1135-1143, 2015.
[22] J. Park, S. Li, W. Wen, P. T. P. Tang, H. Li, Y. Chen, and P. Dubey, "Faster CNNs with direct sparse convolutions and guided pruning," International Conference on Learning Representations (ICLR), 2017.
[23] S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. A. Horowitz, and W. J. Dally, "EIE: Efficient Inference Engine on Compressed Deep Neural Network," International Symposium on Computer Architecture (ISCA), 2016.
[24] R. Dorrance, F. Ren, and D. Markovic, "A Scalable Sparse Matrix-Vector Multiplication Kernel for Energy-Efficient Sparse-BLAS on FPGAs," FPGA, pp. 161-170, 2014.
[25] J. Albericio, P. Judd, T. Hetherington, T. Aamodt, N. E. Jerger, and A. Moshovos, "Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing," Proceedings of the 43rd International Symposium on Computer Architecture (ISCA), 2016.
[26] B. Moons, R. Uytterhoeven, W. Dehaene, and M. Verhelst, "ENVISION: A 0.26-to-10 TOPS/W Subword-Parallel Dynamic-Voltage-Accuracy-Frequency-Scalable CNN Processor in 28nm FDSOI," IEEE International Solid-State Circuits Conference (ISSCC), pp. 246-247, 2017.