[1] Song Han, Huizi Mao, and William J Dally. “Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding”. arXiv preprint arXiv:1510.00149 (2015).
[2] Yoojin Choi, Mostafa El-Khamy, and Jungwon Lee. “Towards the limit of network quantization”. arXiv preprint arXiv:1612.01543 (2016).
[3] Yuhui Xu et al. “Deep neural network compression with single and multiple level quantization”. Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 32. 1. 2018.
[4] Lei Deng et al. “Model compression and hardware acceleration for neural networks: A comprehensive survey”. Proceedings of the IEEE 108.4 (2020), pp. 485–532. doi: 10.1109/JPROC.2020.2976475.
[5] Jian-Hao Luo, Jianxin Wu, and Weiyao Lin. “ThiNet: A filter level pruning method for deep neural network compression”. Proceedings of the IEEE International Conference on Computer Vision. 2017, pp. 5058–5066.
[6] Yihui He, Xiangyu Zhang, and Jian Sun. “Channel pruning for accelerating very deep neural networks”. Proceedings of the IEEE International Conference on Computer Vision. 2017, pp. 1389–1397.
[7] Zhuangwei Zhuang et al. “Discrimination-aware channel pruning for deep neural networks”. Advances in Neural Information Processing Systems. 2018, pp. 875–886.
[8] Vadim Lebedev et al. “Speeding-up convolutional neural networks using fine-tuned CP-decomposition”. arXiv preprint arXiv:1412.6553 (2014).
[9] Yong-Deok Kim et al. “Compression of deep convolutional neural networks for fast and low power mobile applications”. arXiv preprint arXiv:1511.06530 (2015).
[10] Cheng Tai et al. “Convolutional neural networks with low-rank regularization”. arXiv preprint arXiv:1511.06067 (2015).
[11] Julia Gusak et al. “Automated multi-stage compression of neural networks”. Proceedings of the IEEE International Conference on Computer Vision Workshops. 2019.
[12] Yu Cheng et al. “A survey of model compression and acceleration for deep neural networks”. arXiv preprint arXiv:1710.09282 (2017).
[13] Tamara G Kolda and Brett W Bader. “Tensor decompositions and applications”. SIAM Review 51.3 (2009), pp. 455–500.
[14] Zhuang Liu et al. “Learning efficient convolutional networks through network slimming”. Proceedings of the IEEE International Conference on Computer Vision. 2017, pp. 2736–2744.
[15] Xuanyi Dong et al. “More is less: A more complicated network with less inference complexity”. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017, pp. 5840–5848.
[16] Weizhe Hua et al. “Channel gating neural networks”. Advances in Neural Information Processing Systems. 2019, pp. 1886–1896.
[17] Xitong Gao et al. “Dynamic channel pruning: Feature boosting and suppression”. arXiv preprint arXiv:1810.05331 (2018).
[18] Marcella Astrid and Seung-Ik Lee. “CP-decomposition with tensor power method for convolutional neural networks compression”. 2017 IEEE International Conference on Big Data and Smart Computing (BigComp). IEEE. 2017, pp. 115–118.
[19] Misha Denil et al. “Predicting parameters in deep learning”. arXiv preprint arXiv:1306.0543 (2013).
[20] Emily Denton et al. “Exploiting linear structure within convolutional networks for efficient evaluation”. arXiv preprint arXiv:1404.0736 (2014).
[21] Yunchao Gong et al. “Compressing deep convolutional networks using vector quantization”. arXiv preprint arXiv:1412.6115 (2014).
[22] Wenlin Chen et al. “Compressing neural networks with the hashing trick”. International Conference on Machine Learning. PMLR. 2015, pp. 2285–2294.
[23] Yu Cheng et al. “Fast neural networks with circulant projections”. arXiv preprint arXiv:1502.03436 (2015).
[24] Alexander Novikov et al. “Tensorizing neural networks”. arXiv preprint arXiv:1509.06569 (2015).
[25] Johan Håstad. “Tensor rank is NP-complete”. International Colloquium on Automata, Languages, and Programming. Springer. 1989, pp. 451–460.
[26] Shinichi Nakajima et al. “Perfect dimensionality recovery by variational Bayesian PCA”. Advances in Neural Information Processing Systems 25 (2012), pp. 971–979.
[27] J Douglas Carroll and Jih-Jie Chang. “Analysis of individual differences in multidimensional scaling via an N-way generalization of ‘Eckart-Young’ decomposition”. Psychometrika 35.3 (1970), pp. 283–319.
[28] Richard A Harshman and Margaret E Lundy. “PARAFAC: Parallel factor analysis”. Computational Statistics & Data Analysis 18.1 (1994), pp. 39–72.
[29] Amnon Shashua and Tamir Hazan. “Non-negative tensor factorization with applications to statistics and computer vision”. Proceedings of the 22nd International Conference on Machine Learning. 2005, pp. 792–799.
[30] Ledyard R Tucker. “Some mathematical notes on three-mode factor analysis”. Psychometrika 31.3 (1966), pp. 279–311.
[31] Lieven De Lathauwer, Bart De Moor, and Joos Vandewalle. “A multilinear singular value decomposition”. SIAM Journal on Matrix Analysis and Applications 21.4 (2000), pp. 1253–1278.
[32] Yong-Deok Kim and Seungjin Choi. “Nonnegative Tucker decomposition”. 2007 IEEE Conference on Computer Vision and Pattern Recognition. IEEE. 2007, pp. 1–8.
[33] Pieter M Kroonenberg and Jan De Leeuw. “Principal component analysis of three-mode data by means of alternating least squares algorithms”. Psychometrika 45.1 (1980), pp. 69–97.
[34] Arie Kapteyn, Heinz Neudecker, and Tom Wansbeek. “An approach to n-mode components analysis”. Psychometrika 51.2 (1986), pp. 269–275.
[35] Lieven De Lathauwer, Bart De Moor, and Joos Vandewalle. “On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors”. SIAM Journal on Matrix Analysis and Applications 21.4 (2000), pp. 1324–1342.
[36] Lars Eldén and Berkant Savas. “A Newton–Grassmann method for computing the best multilinear rank-(r1, r2, r3) approximation of a tensor”. SIAM Journal on Matrix Analysis and Applications 31.2 (2009), pp. 248–271.
[37] Jianbo Ye et al. “Rethinking the smaller-norm-less-informative assumption in channel pruning of convolution layers”. arXiv preprint arXiv:1802.00124 (2018).
[38] Gao Huang et al. “Densely connected convolutional networks”. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017, pp. 4700–4708.
[39] Karen Simonyan and Andrew Zisserman. “Very deep convolutional networks for large-scale image recognition”. arXiv preprint arXiv:1409.1556 (2014).
[40] Richard A Harshman. “Foundations of the PARAFAC procedure: Models and conditions for an ‘explanatory’ multimodal factor analysis”. UCLA Working Papers in Phonetics 16 (1970), pp. 1–84.
[41] Kaiming He et al. “Deep residual learning for image recognition”. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016, pp. 770–778.
[42] Lei Deng et al. “Model compression and hardware acceleration for neural networks: A comprehensive survey”. Proceedings of the IEEE 108.4 (2020), pp. 485–532.
[43] Vin De Silva and Lek-Heng Lim. “Tensor rank and the ill-posedness of the best low-rank approximation problem”. SIAM Journal on Matrix Analysis and Applications 30.3 (2008), pp. 1084–1127.
[44] Ashish Vaswani et al. “Attention is All you Need”. Advances in Neural Information Processing Systems. Ed. by I. Guyon et al. Vol. 30. Curran Associates, Inc., 2017. url: https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf.
[45] Tom Brown et al. “Language Models are Few-Shot Learners”. Advances in Neural Information Processing Systems. Ed. by H. Larochelle et al. Vol. 33. Curran Associates, Inc., 2020, pp. 1877–1901. url: https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf.