[1] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, vol. 25, 2012.
[3] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," Advances in Neural Information Processing Systems, vol. 28, 2015.
[4] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440, 2015.
[5] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "MobileNetV2: Inverted residuals and linear bottlenecks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510–4520, 2018.
[6] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size," arXiv preprint arXiv:1602.07360, 2016.
[7] M. Tan and Q. Le, "EfficientNet: Rethinking model scaling for convolutional neural networks," in International Conference on Machine Learning, pp. 6105–6114, PMLR, 2019.
[8] Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, "A ConvNet for the 2020s," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11976–11986, 2022.
[9] S. Woo, S. Debnath, R. Hu, X. Chen, Z. Liu, I. S. Kweon, and S. Xie, "ConvNeXt V2: Co-designing and scaling ConvNets with masked autoencoders," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16133–16142, 2023.
[10] G. Hinton, O. Vinyals, and J. Dean, "Distilling the knowledge in a neural network," arXiv preprint arXiv:1503.02531, 2015.
[11] D. Blalock, J. J. Gonzalez Ortiz, J. Frankle, and J. Guttag, "What is the state of neural network pruning?," Proceedings of Machine Learning and Systems, vol. 2, pp. 129–146, 2020.
[12] T. Hoefler, D. Alistarh, T. Ben-Nun, N. Dryden, and A. Peste, "Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks," The Journal of Machine Learning Research, vol. 22, no. 1, pp. 10882–11005, 2021.
[13] V. Natesh, A. Sabot, H. Kung, and M. Ting, "Rosko: Row skipping outer products for sparse matrix multiplication kernels," arXiv preprint arXiv:2307.03930, 2023.
[14] B. Jacob, S. Kligys, B. Chen, M. Zhu, M. Tang, A. Howard, H. Adam, and D. Kalenichenko, "Quantization and training of neural networks for efficient integer-arithmetic-only inference," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2704–2713, 2018.
[15] M. Nagel, R. A. Amjad, M. Van Baalen, C. Louizos, and T. Blankevoort, "Up or down? Adaptive rounding for post-training quantization," in International Conference on Machine Learning, pp. 7197–7206, PMLR, 2020.
[16] Y. Bhalgat, J. Lee, M. Nagel, T. Blankevoort, and N. Kwak, "LSQ+: Improving low-bit quantization through learnable offsets and better initialization," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 696–697, 2020.
[17] M. Horowitz, "1.1 Computing's energy problem (and what we can do about it)," in 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pp. 10–14, 2014.
[18] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, "XNOR-Net: ImageNet classification using binary convolutional neural networks," in European Conference on Computer Vision, pp. 525–542, Springer, 2016.
[19] H. Chen, Y. Wang, C. Xu, B. Shi, C. Xu, Q. Tian, and C. Xu, "AdderNet: Do we really need multiplications in deep learning?," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1468–1477, 2020.
[20] Y. Xu, C. Xu, X. Chen, W. Zhang, C. Xu, and Y. Wang, "Kernel based progressive distillation for adder neural networks," Advances in Neural Information Processing Systems, vol. 33, pp. 12322–12333, 2020.
[21] D. A. Gudovskiy and L. Rigazio, "ShiftCNN: Generalized low-precision architecture for inference of convolutional neural networks," arXiv preprint arXiv:1706.02393, 2017.
[22] D. Blalock and J. Guttag, "Multiplying matrices without multiplying," in International Conference on Machine Learning, pp. 992–1004, PMLR, 2021.
[23] X. Tang, Y. Wang, T. Cao, L. L. Zhang, Q. Chen, D. Cai, Y. Liu, and M. Yang, "LUT-NN: Towards unified neural network inference by table lookup," arXiv preprint arXiv:2302.03213, 2023.
[24] H. Jegou, M. Douze, and C. Schmid, "Product quantization for nearest neighbor search," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 1, pp. 117–128, 2010.
[25] J. MacQueen et al., "Some methods for classification and analysis of multivariate observations," in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297, Oakland, CA, USA, 1967.
[26] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al., "An image is worth 16x16 words: Transformers for image recognition at scale," arXiv preprint arXiv:2010.11929, 2020.
[27] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," Advances in Neural Information Processing Systems, vol. 30, 2017.
[28] D. Hendrycks and K. Gimpel, "Gaussian error linear units (GELUs)," arXiv preprint arXiv:1606.08415, 2016.
[29] J. L. Ba, J. R. Kiros, and G. E. Hinton, "Layer normalization," arXiv preprint arXiv:1607.06450, 2016.
[30] J. Gehring, M. Auli, D. Grangier, D. Yarats, and Y. N. Dauphin, "Convolutional sequence to sequence learning," in International Conference on Machine Learning, pp. 1243–1252, PMLR, 2017.
[31] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.
[32] H. Touvron, P. Bojanowski, M. Caron, M. Cord, A. El-Nouby, E. Grave, G. Izacard, A. Joulin, G. Synnaeve, J. Verbeek, et al., "ResMLP: Feedforward networks for image classification with data-efficient training," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 4, pp. 5314–5321, 2022.
[33] J. Ran, R. Lin, J. C. L. Li, J. Zhou, and N. Wong, "PECAN: A product-quantized content addressable memory network," in 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1–6, IEEE, 2023.
[34] T. Chen, L. Li, and Y. Sun, "Differentiable product quantization for end-to-end embedding compression," in International Conference on Machine Learning, pp. 1617–1626, PMLR, 2020.
[35] R. Gong, X. Liu, S. Jiang, T. Li, P. Hu, J. Lin, F. Yu, and J. Yan, "Differentiable soft quantization: Bridging full-precision and low-bit neural networks," in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4852–4861, 2019.
[36] A. Fan, P. Stock, B. Graham, E. Grave, R. Gribonval, H. Jegou, and A. Joulin, "Training with quantization noise for extreme model compression," arXiv preprint arXiv:2004.07320, 2020.
[37] V. Markovtsev, "kmcuda." https://github.com/src-d/kmcuda, 2020.
[38] Y. Ding, Y. Zhao, X. Shen, M. Musuvathi, and T. Mytkowicz, "Yinyang k-means: A drop-in replacement of the classic k-means with consistent speedup," in International Conference on Machine Learning, pp. 579–587, PMLR, 2015.
[39] H. Touvron, M. Cord, and H. Jégou, "DeiT III: Revenge of the ViT," in European Conference on Computer Vision, pp. 516–533, Springer, 2022.