[1] S. Han, H. Mao, and W. J. Dally, "Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding," in Proc. 4th Int. Conf. Learn. Represent. (ICLR), 2016, pp. 1–14.
[2] J. Wu, C. Leng, Y. Wang, Q. Hu, and J. Cheng, "Quantized convolutional neural networks for mobile devices," Dec. 2015, doi: 10.48550/arxiv.1512.06473.
[3] C. Bucila, R. Caruana, and A. Niculescu-Mizil, "Model compression," in Proc. 12th ACM SIGKDD Int. Conf. Knowl. Discov. Data Min. (KDD), 2006.
[4] G. Hinton, O. Vinyals, and J. Dean, "Distilling the knowledge in a neural network," Mar. 2015, doi: 10.48550/arxiv.1503.02531.
[5] R. Chen, H. Ai, C. Shang, L. Chen, and Z. Zhuang, "Learning lightweight pedestrian detector with hierarchical knowledge distillation," in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2019, doi: 10.1109/ICIP.2019.8803079.
[6] A. Esteva et al., "Dermatologist-level classification of skin cancer with deep neural networks," Nature, vol. 542, no. 7639, pp. 115–118, 2017, doi: 10.1038/nature21056.
[7] K. Czuszynski, J. Ruminski, and A. Kwasniewska, "Gesture recognition with the linear optical sensor and recurrent neural networks," IEEE Sens. J., vol. 18, no. 13, pp. 5429–5438, 2018, doi: 10.1109/JSEN.2018.2834968.
[8] N. D. Lane, P. Georgiev, and L. Qendro, "DeepEar: Robust smartphone audio sensing in unconstrained acoustic environments using deep learning," in Proc. 2015 ACM Int. Joint Conf. Pervasive Ubiquitous Comput. (UbiComp), 2015, pp. 283–294, doi: 10.1145/2750858.2804262.
[9] A. Mathur, N. D. Lane, S. Bhattacharya, A. Boran, C. Forlivesi, and F. Kawsar, "DeepEye: Resource efficient local execution of multiple deep vision models using wearable commodity hardware," in Proc. 15th Annu. Int. Conf. Mobile Syst., Appl., Services (MobiSys), 2017, pp. 68–81, doi: 10.1145/3081333.3081359.
[10] A. Romero, N. Ballas, S. E. Kahou, A. Chassang, C. Gatta, and Y. Bengio, "FitNets: Hints for thin deep nets," in Proc. 3rd Int. Conf. Learn. Represent. (ICLR), 2015, pp. 1–13.
[11] S. Zagoruyko and N. Komodakis, "Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer," in Proc. 5th Int. Conf. Learn. Represent. (ICLR), 2017, pp. 1–13.
[12] W. Park, D. Kim, Y. Lu, and M. Cho, "Relational knowledge distillation," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2019, pp. 3962–3971, doi: 10.1109/CVPR.2019.00409.
[13] Y. Tian, D. Krishnan, and P. Isola, "Contrastive representation distillation," 2019. [Online]. Available: http://arxiv.org/abs/1910.10699
[14] Y. Jang, H. Lee, S. J. Hwang, and J. Shin, "Learning what and where to transfer," in Proc. 36th Int. Conf. Mach. Learn. (ICML), 2019, pp. 5360–5369.
[15] K. Xu et al., "Show, attend and tell: Neural image caption generation with visual attention," in Proc. 32nd Int. Conf. Mach. Learn. (ICML), 2015, pp. 2048–2057.
[16] Z. Niu, G. Zhong, and H. Yu, "A review on the attention mechanism of deep learning," Neurocomputing, vol. 452, pp. 48–62, 2021, doi: 10.1016/j.neucom.2021.03.091.
[17] R. Child, S. Gray, A. Radford, and I. Sutskever, "Generating long sequences with sparse transformers," 2019. [Online]. Available: http://arxiv.org/abs/1904.10509
[18] M. Zaheer et al., "Big Bird: Transformers for longer sequences," in Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 33, 2020.
[19] S. Wang, B. Z. Li, M. Khabsa, H. Fang, and H. Ma, "Linformer: Self-attention with linear complexity," 2020. [Online]. Available: http://arxiv.org/abs/2006.04768
[20] M. Ji, B. Heo, and S. Park, "Show, attend and distill: Knowledge distillation via attention-based feature matching," 2021. [Online]. Available: http://arxiv.org/abs/2102.02973
[21] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng, "Reading digits in natural images with unsupervised feature learning," 2011.
[22] M. B. McCrary, "Urban multicultural trauma patients," ASHA, vol. 34, no. 4, 1992.
[23] L. N. Darlow, E. J. Crowley, A. Antoniou, and A. J. Storkey, "CINIC-10 is not ImageNet or CIFAR-10," 2018. [Online]. Available: http://arxiv.org/abs/1810.03505
[24] J. Gou, B. Yu, S. J. Maybank, and D. Tao, "Knowledge distillation: A survey," Int. J. Comput. Vis., vol. 129, no. 6, pp. 1789–1819, 2021, doi: 10.1007/s11263-021-01453-z.
[25] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: A review and new perspectives," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1798–1828, 2013, doi: 10.1109/TPAMI.2013.50.
[26] N. Passalis and A. Tefas, "Learning deep representations with probabilistic knowledge transfer," in Proc. Eur. Conf. Comput. Vis. (ECCV), Lect. Notes Comput. Sci., vol. 11215, 2018, pp. 283–299, doi: 10.1007/978-3-030-01252-6_17.
[27] X. Jin et al., "Knowledge distillation via route constrained optimization," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2019, pp. 1345–1354, doi: 10.1109/ICCV.2019.00143.
[28] J. Yim, D. Joo, J. Bae, and J. Kim, "A gift from knowledge distillation: Fast optimization, network minimization and transfer learning," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 7130–7138, doi: 10.1109/CVPR.2017.754.
[29] B. Peng et al., "Correlation congruence for knowledge distillation," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2019, pp. 5006–5015, doi: 10.1109/ICCV.2019.00511.
[30] F. Tung and G. Mori, "Similarity-preserving knowledge distillation," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2019, pp. 1365–1374, doi: 10.1109/ICCV.2019.00145.
[31] D. Bahdanau, K. H. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," in Proc. 3rd Int. Conf. Learn. Represent. (ICLR), 2015, pp. 1–15.
[32] M. T. Luong, H. Pham, and C. D. Manning, "Effective approaches to attention-based neural machine translation," in Proc. Conf. Empir. Methods Nat. Lang. Process. (EMNLP), 2015, pp. 1412–1421, doi: 10.18653/v1/d15-1166.
[33] D. Britz, A. Goldie, M. T. Luong, and Q. V. Le, "Massive exploration of neural machine translation architectures," in Proc. Conf. Empir. Methods Nat. Lang. Process. (EMNLP), 2017, pp. 1442–1451, doi: 10.18653/v1/d17-1151.
[34] A. Vaswani et al., "Attention is all you need," Jun. 2017, doi: 10.48550/arxiv.1706.03762.
[35] J. Gehring, M. Auli, D. Grangier, D. Yarats, and Y. N. Dauphin, "Convolutional sequence to sequence learning," in Proc. 34th Int. Conf. Mach. Learn. (ICML), 2017, pp. 2029–2042.
[36] M. D. Zeiler and R. Fergus, "Stochastic pooling for regularization of deep convolutional neural networks," in Proc. 1st Int. Conf. Learn. Represent. (ICLR), 2013, pp. 1–9.
[37] D. Miao, W. Pedrycz, D. Ślezak, G. Peters, Q. Hu, and R. Wang, Eds., Rough Sets and Knowledge Technology: 9th International Conference, RSKT 2014, Shanghai, China, October 24–26, 2014, Proceedings, Lect. Notes Comput. Sci., vol. 8818, 2014, doi: 10.1007/978-3-319-11740-9.
[38] S. Zhai et al., "S3Pool: Pooling with stochastic spatial sampling," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 4003–4011, doi: 10.1109/CVPR.2017.426.
[39] A. Stergiou, R. Poppe, and G. Kalliatakis, "Refining activation downsampling with SoftPool," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2021, pp. 10337–10346, doi: 10.1109/ICCV48922.2021.01019.
[40] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng, "Reading digits in natural images with unsupervised feature learning," 2011.
[41] A. Krizhevsky and G. Hinton, "Learning multiple layers of features from tiny images," 2009.
[42] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 770–778, doi: 10.1109/CVPR.2016.90.
[43] S. Zagoruyko and N. Komodakis, "Wide residual networks," in Proc. Brit. Mach. Vis. Conf. (BMVC), 2016.