[1] “AI vs machine learning vs deep learning.” https://www.edureka.co/blog/ai-vs-machine-learning-vs-deep-learning/. Accessed: 2022-08-01.
[2] “How do convolutional neural networks work?” https://brohrer.mcknote.com/zh-Hant/how_machine_learning_works/how_convolutional_neural_networks_work.html. Accessed: 2021-08-01.
[3] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems, vol. 25, pp. 1097–1105, 2012.
[4] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size,” arXiv preprint arXiv:1602.07360, 2016.
[5] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[6] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “MobileNets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017.
[7] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255, IEEE, 2009.
[8] A. Krizhevsky and G. Hinton, “Convolutional deep belief networks on CIFAR-10,” Unpublished manuscript, vol. 40, no. 7, pp. 1–9, 2010.
[9] C.-C. Yang, Y.-R. Chen, H.-H. Liao, Y.-M. Chang, and J.-K. Lee, “Auto-tuning fixed-point precision with TVM on RISC-V packed SIMD extension,” ACM Transactions on Design Automation of Electronic Systems, 2022.
[10] “RISC-V.” https://riscv.org/. Accessed: 2021-08-01.
[11] “RISC-V packed SIMD extension.” https://github.com/riscv/riscv-p-spe. Accessed: 2021-08-01.
[12] T. Chen, T. Moreau, Z. Jiang, L. Zheng, E. Yan, H. Shen, M. Cowan, L. Wang, Y. Hu, L. Ceze, et al., “TVM: An automated end-to-end optimizing compiler for deep learning,” in 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pp. 578–594, 2018.
[13] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., “PyTorch: An imperative style, high-performance deep learning library,” Advances in Neural Information Processing Systems, vol. 32, pp. 8026–8037, 2019.
[14] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al., “TensorFlow: A system for large-scale machine learning,” in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265–283, 2016.
[15] T. Chen, M. Li, Y. Li, M. Lin, N. Wang, M. Wang, T. Xiao, B. Xu, C. Zhang, and Z. Zhang, “MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems,” arXiv preprint arXiv:1512.01274, 2015.
[16] C. Lattner and V. Adve, “LLVM: A compilation framework for lifelong program analysis & transformation,” in International Symposium on Code Generation and Optimization (CGO 2004), pp. 75–86, IEEE, 2004.
[17] “High-performance hardware for machine learning.” https://media.nips.cc/Conferences/2015/tutorialslides/Dally-NIPS-Tutorial-2015.pdf. Accessed: 2021-08-01.
[18] “OpenCL 2.0.” https://registry.khronos.org/OpenCL/specs/opencl-2.0.pdf. Accessed: 2021-08-01.
[19] C.-C. Yang, S.-C. Wang, M.-Y. Hsu, Y.-M. Chang, Y.-S. Hwang, and J.-K. Lee, “Support OpenCL 2.0 compiler on LLVM for PTX simulators,” Journal of Signal Processing Systems, vol. 91, no. 3, pp. 261–271, 2019.
[20] “NVPTX.” https://llvm.org/docs/NVPTXUsage.html. Accessed: 2021-08-01.
[21] “CUDA.” https://developer.nvidia.com/cuda-zone. Accessed: 2021-08-01.
[22] “Spike, a RISC-V ISA simulator.” https://github.com/riscv-software-src/riscv-isa-sim. Accessed: 2021-08-01.
[23] “The Microsoft Cognitive Toolkit (CNTK).” https://github.com/microsoft/CNTK. Accessed: 2021-08-01.
[24] “Core ML.” https://developer.apple.com/documentation/coreml. Accessed: 2021-08-01.
[25] “Keras.” https://keras.io/. Accessed: 2021-08-01.
[26] “ONNX.” https://onnx.ai/. Accessed: 2021-08-01.
[27] “The LLVM Compiler Infrastructure.” http://llvm.org/.
[28] “The Khronos Group Inc.” https://www.khronos.org/. Accessed: 2021-08-01.
[29] M. Horowitz, “1.1 Computing’s energy problem (and what we can do about it),” in 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pp. 10–14, IEEE, 2014.
[30] “Fixed-point real numbers.” http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0037r5.html. Accessed: 2021-08-01.
[31] “Xilinx Vivado Design Suite user guide.” https://docs.xilinx.com/v/u/2016.4-English/ug902-vivado-high-level-synthesis. Accessed: 2021-08-01.
[32] “SPIR-V.” https://www.khronos.org/spir/. Accessed: 2021-08-01.
[33] “Vulkan.” https://www.vulkan.org/. Accessed: 2021-08-01.
[34] “Andes Technology.” http://www.andestech.com/en/homepage/. Accessed: 2021-08-01.
[35] W.-L. Shih, Y.-P. You, C.-W. Huang, and J. K. Lee, “Compiler optimization for reducing leakage power in multithread BSP programs,” ACM Transactions on Design Automation of Electronic Systems (TODAES), vol. 20, no. 1, pp. 1–34, 2014.
[36] “Andes has donated RISC-V P-extension draft.” http://www.andestech.com/en/2019/12/31/a-look-back-at-the-achievements-andes-made-in-2019/. Accessed: 2021-08-01.
[37] C.-B. Kuan and J. K. Lee, “Compiler supports for VLIW DSP processors with SIMD intrinsics,” Concurrency and Computation: Practice and Experience, vol. 24, no. 5, pp. 517–532, 2012.
[38] S.-C. Wang, L.-C. Kan, C.-L. Lee, Y.-S. Hwang, and J.-K. Lee, “Architecture and compiler support for GPUs using energy-efficient affine register files,” ACM Transactions on Design Automation of Electronic Systems (TODAES), vol. 23, no. 2, pp. 1–25, 2017.
[39] “KMADA in RISC-V P extension proposal.” https://github.com/riscv/riscv-p-spec/blob/master/P-ext-proposal.adoc#kmada-kmaxda. Accessed: 2021-08-01.
[40] D. Sharlet, A. Kunze, S. Junkins, and D. Joshi, “Shevlin Park: Implementing C++ AMP with Clang/LLVM and OpenCL,” in General Meeting of LLVM Developers and Users, 2012.
[41] A. Bakhoda, G. L. Yuan, W. W. Fung, H. Wong, and T. M. Aamodt, “Analyzing CUDA workloads using a detailed GPU simulator,” in 2009 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 163–174, IEEE, 2009.
[42] “AMD OpenCL Accelerated Parallel Processing (APP).” http://developer.amd.com/tools-and-sdks/.
[43] “Seven OpenCL benchmarks for Heterogeneous System Architecture evaluation.” http://mtkntu.ntu.edu.tw/upload/edmfs150404031052772.pdf.
[44] J. Power, J. Hestness, M. S. Orr, M. D. Hill, and D. A. Wood, “gem5-gpu: A heterogeneous CPU-GPU simulator,” IEEE Computer Architecture Letters, vol. 14, no. 1, pp. 34–36, 2015.
[45] “NumPy.” https://numpy.org/. Accessed: 2021-08-01.
[46] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” CoRR, vol. abs/1512.03385, 2015.
[47] S. Hochreiter, Y. Bengio, P. Frasconi, J. Schmidhuber, et al., “Gradient flow in recurrent nets: the difficulty of learning long-term dependencies,” 2001.
[48] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[49] “TensorFlow Lite 8-bit quantization specification.” https://www.tensorflow.org/lite/performance/quantization_spec. Accessed: 2021-08-01.
[50] B. Jacob, S. Kligys, B. Chen, M. Zhu, M. Tang, A. Howard, H. Adam, and D. Kalenichenko, “Quantization and training of neural networks for efficient integer-arithmetic-only inference,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2704–2713, 2018.
[51] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, “Quantized neural networks: Training neural networks with low precision weights and activations,” The Journal of Machine Learning Research, vol. 18, no. 1, pp. 6869–6898, 2017.
[52] M.-Y. H. Chao-Lin Lee, Jenq-Kuen Lee, “Case study: Devise quantized schedule primitives in Halide to support Darknet computation,” CTHPC, 2021.
[53] “8-bit inference with TensorRT.” https://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf. Accessed: 2021-08-01.
[54] D. Miyashita, E. H. Lee, and B. Murmann, “Convolutional neural networks using logarithmic data representation,” arXiv preprint arXiv:1603.01025, 2016.
[55] H.-T. Kung, B. McDanel, and S. Q. Zhang, “Term quantization: Furthering quantization at run time,” in SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–16, IEEE, 2020.
[56] C.-L. Lee, M.-Y. Hsu, B.-S. Lu, M.-Y. Hung, and J.-K. Lee, “Experiment and enabled flow for GPGPU-Sim simulators with fixed-point instructions,” Journal of Systems Architecture, vol. 111, p. 101783, 2020.
[57] “Support TVM QNN flow on RISC-V with SIMD computation.” https://discuss.tvm.apache.org/t/rfc-enable-tvm-qnn-on-risc-v-with-subword-simd-computation/7967. Accessed: 2021-08-01.
[58] Y.-R. Chen, H.-H. Liao, C.-H. Chang, C.-C. Lin, C.-L. Lee, Y.-M. Chang, C.-C. Yang, and J.-K. Lee, “Experiments and optimizations for TVM on RISC-V architectures with P extension,” in 2020 International Symposium on VLSI Design, Automation and Test (VLSI-DAT), pp. 1–4, IEEE, 2020.