|
[1] Jason Cong and Bingjun Xiao. “Minimizing computation in convolutional neural networks.” in Proc. Int. Conf. Artif. Neural Netw. (ICANN), pp. 281-290, 2014. [2] Wonkyung Jung, Daejin Jung, Byeongho Kim, Sunjung Lee, Wonjong Rhee, and Jung Ho Ahn. “Restructuring Batch Normalization to Accelerate CNN Training.”, CoRR, vol. abs/1807.01702, 2018. [3] Yu-Hsin Chen, Tushar Krishna, Joel S. Emer, and Vivienne Sze. “Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks.”, Proc. IEEE Int’I Solid-States Circuits Conf. (ISSCC 16), pp.262-263, 2016 [4] Song Han, Huizi Mao, and William J. Dally. “Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding.”, in Int. Conf. Learning Representations (ICLR), 2016. [5] Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. “Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or -1.”, CoRR, vol. abs/1511.00363, 2015. [6] Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ail Farhadi. “XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks.”, CoRR, vol. abs/1603.05279, 2016. [7] Philipp Gysel, Mohammad Motamedi, and Soheil Ghiasi. “Hardware-oriented Approximation of Convolutional Neural Networks.”, CoRR, vol. abs/1605.06402, 2016. [8] Jiaxiang Wu, Cong Leng, Yuhang Wang, Qinghao Hu, and Jian Cheng. “Quantized Convolutional Neural Networks for Mobile Devices.”, in CVPR. IEEE Computer Society, pp.4820-4828, 2016 [9] Aojun Zhou, Anbang Yao, Yiwen Guo, Lin Xu, and Yurong Chen. “Incremental Network Quantization: Towards Lossless CNNs with Low-precision Weights.”, in Proc. ICLR, 2017. [10] Song Han, Jeff Pool, John Tran, and William J. Dally. “Learning both Weights and Connections for Efficient Neural Networks.”, in Advances in Neural Information Processing Systems, 2015. [11] Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, and Hans Peter Graf. “Pruning Filters for Efficient ConvNets.”, in ICLR, pages 1-13, 2017. [12] Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila, and Jan Kautz. “Pruning Convolutional Neural Networks for Resource Efficient Inference.”, in ICLR, pages 1-17, 2017. [13] Yihui He, Xiangyu Zhang, and Jian Sun. “Channel Pruning for Accelerating Very Deep Neural Networks.”, in International Conference on Computer Vision (ICCV), vol. 2, p. 6, 2017. [14] Jian-Hao Luo, Jianxin Wu, and Weiyao Lin. “ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression.”, in The IEEE International Conference on Computer Vision (ICCV), 2017. [15] Kumar Chellapilla, Sidd Puri, and Patrice Simard. “High Performance Convolutional Neural Networks for Document Processing.”, in Tenth International Workshop on Frontiers in Handwriting Recognition, October 2006. [16] Chetlur, Sharan, Woolley, Cliff, Vandermersch, Philippe, Cohen, Jonathan, Tran, John, Catanzaro, Bryan, and Shelhamer, Evan. “cuDNN : Efficient Primitives for Deep Learning.”, CoRR, abs/1410.0759, 2014. [17] Minsik Cho and Daniel Brand. “MEC: Memory-efficient Convolution for Deep Neural Network.”, in Proc. Int. Conf. Mach. Learn. (ICML), Sydney, NSW, Australia, pp. 815-824, 2017. [18] Aravind Vasudevan, Andrew Anderson, and David Gregg. “Parallel Multi Channel Convolution using General Matrix Multiplication.”, in 28th IEEE International Conference on Application-specific Systems, Architectures and Processors, ASAP 2017. [19] Andrew Anderson, Aravind Vasudevan, Cormac Keane, David Gregg. “Low-memory GEMM-based Convolution Algorithms for Deep Neural Networks.”, arXiv preprint arXiv:1709.03395, 2017. [20] Alex Krizhevsky, Ilya Stuskever, and Geoffrey E. Hinton. “ImageNet Classification with Deep Convolutional Neural Networks.”, in NIPS, 2012. [21] Karen Simonyan and Andrew Zisserman. “Very Deep Convolutional Networks for Large-Scale Image Recognition.”, 2015. [22] Joseph Redmon and Ali Farhadi. “Yolo9000: Better, faster, stronger.”, in CVPR, 2017. [23] Jiyuan Zhang, Franz Franchetti, Tze Meng Low. “High Performance Zero-Memory Overhead Direct Convolutions.”, in ICML, 2018. [24] Xuan Yang, Jing Pu, Blaine Burton Rister, Nikhil Bhagdikar, Stephen Richardson, Shahar Kvatinsky, Jonathan Ragan-Kelley, Ardavan Pedram and Mark Horowitz. “A Systematic Approach to Blocking Convolutional Neural Networks.”, arXiv, 2016. [25] Yufei Ma, Yu Cao, Sarma Vrudhula, Jae-sun Seo. “Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks.”, in ACM, pp. 45-54, 2017. [26] Fengbin Tu, Shouyi Yin, Peng Ouyang, Shinbin Tang, Leibo Liu, and Shaojun Wei. “Deep Convolutional Neural Network Architecture with Reconfigurable Computation Patterns.”, IEEE Journal of Solid-State Circuits, vol.52, pp. 127-138, 2017. [27] Arthur Stoutchinin, Francesco Conti, Luca Benini. “Optimally Scheduling CNN Convolutions for Efficient Memory Access.”, CoRR, vol. abs/1902.01492, 2019. [28] Marian Verhelst and Bert Moons. “Embedded Deep Neural Network Processing: Algorithmic and Processor Techniques Bring Deep Learning to IoT and Edge Devices.”, IEEE Solid-State Circuits Magazine, 9(4):55-65, 2017. [29] Mattson Richard L., et al. “Evaluation techniques for storage hierarchies.”, IBM Systems journal 9.2 (1970): 78-117. [30] Cheng-Lin Tsai, et al. “A Fast-and-Effective Early-Stage Multi-level Cache Optimization Method based on Reuse-Distance Analysis.” National Tsing Hua University, 2016. [31] http://www.cs.wisc.edu/~markhill/DineroIV |