[1] T. Chen, T. Moreau, Z. Jiang, L. Zheng, E. Yan, M. Cowan, H. Shen, L. Wang, Y. Hu, L. Ceze, C. Guestrin, and A. Krishnamurthy, “TVM: An automated end-to-end optimizing compiler for deep learning,” 2018.
[2] Specification of RISC-V P Extension, RISC-V, accessed: 2021-05-11. [Online]. Available: https://github.com/riscv/riscv-p-spec
[3] TFLite Hosted Models, accessed: 2021-05-11. [Online]. Available: https://www.tensorflow.org/lite/guide/hosted_models
[4] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga et al., “PyTorch: An imperative style, high-performance deep learning library,” arXiv preprint arXiv:1912.01703, 2019.
[5] T. Chen, M. Li, Y. Li, M. Lin, N. Wang, M. Wang, T. Xiao, B. Xu, C. Zhang, and Z. Zhang, “MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems,” arXiv preprint arXiv:1512.01274, 2015.
[6] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,” in Proceedings of the 22nd ACM International Conference on Multimedia, 2014, pp. 675–678.
[7] ONNX, accessed: 2021-05-11. [Online]. Available: https://onnx.ai/
[8] A. Jain, S. Bhattacharya, M. Masuda, V. Sharma, and Y. Wang, “Efficient execution of quantized deep learning models: A compiler approach,” arXiv preprint arXiv:2006.10226, 2020.
[9] J. Roesch et al., “Relay: A new IR for machine learning frameworks,” in Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, 2018, pp. 58–68.
[10] J. Cong and B. Xiao, “Minimizing computation in convolutional neural networks,” in International Conference on Artificial Neural Networks. Springer, 2014, pp. 281–290.
[11] ARM NEON Instructions, ARM, accessed: 2021-05-11. [Online]. Available: https://developer.arm.com/architectures/instruction-sets/simd-isas/neon
[12] Intel VNNI Instructions, Intel, accessed: 2021-05-11. [Online]. Available: https://en.wikichip.org/wiki/x86/avx512_vnni
[13] Specification of RISC-V V Extension, RISC-V, accessed: 2021-05-11. [Online]. Available: https://github.com/riscv/riscv-v-spec
[14] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “MobileNets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017.
[15] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “MobileNetV2: Inverted residuals and linear bottlenecks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4510–4520.
[16] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818–2826.
[17] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, “Inception-v4, Inception-ResNet and the impact of residual connections on learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31, no. 1, 2017.
[18] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size,” arXiv preprint arXiv:1602.07360, 2016.
[19] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[20] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[21] RISC-V ISA Manual, RISC-V, document version: 2019-12-13. [Online]. Available: https://riscv.org/technical/specifications/
[22] Post-training Quantization, TensorFlow, accessed: 2021-05-11. [Online]. Available: https://www.tensorflow.org/lite/performance/post_training_quantization
[23] Quantization-aware Training, TensorFlow, accessed: 2021-05-11. [Online]. Available: https://www.tensorflow.org/model_optimization/guide/quantization/training
[24] Quantization Docs, PyTorch, accessed: 2021-05-11. [Online]. Available: https://pytorch.org/docs/stable/quantization.html
[25] C.-L. Lee, M.-Y. Hsu, B.-S. Lu, M.-Y. Hung, and J.-K. Lee, “Experiment and enabled flow for GPGPU-Sim simulators with fixed-point instructions,” Journal of Systems Architecture, vol. 111, p. 101783, 2020.
[26] J. Deng et al., “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009, pp. 248–255.