[1] T. Chen et al., “TVM: An Automated End-to-End Optimizing Compiler for Deep Learning.” [Online]. Available: https://arxiv.org/abs/1802.04799
[2] MLC team, “MLC-LLM.” [Online]. Available: https://github.com/mlc-ai/mlc-llm
[3] R. Lai et al., “Relax: Composable Abstractions for End-to-End Dynamic Machine Learning,” 2023.
[4] S. Feng et al., “TensorIR: An Abstraction for Automatic Tensorized Program Optimization,” in Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, 2023, pp. 804–817.
[5] Google Inc., “Android NDK | Android Developers.” Accessed: Jun. 23, 2024. [Online]. Available: https://developer.android.com/ndk
[6] S.-Y. Cheng, C.-P. Chung, R. Lai, and J.-K. Lee, “Application Showcases for TVM with NeuroPilot on Mobile Devices,” in Workshop Proceedings of the 51st International Conference on Parallel Processing, 2022, pp. 1–8.
[7] M.-Y. Lai, C.-Y. Sung, J.-K. Lee, and M.-Y. Hung, “Enabling Android NNAPI Flow for TVM Runtime,” in Workshop Proceedings of the 49th International Conference on Parallel Processing (ICPP Workshops '20), Edmonton, AB, Canada: Association for Computing Machinery, 2020. doi: 10.1145/3409390.3409393.
[8] Z. Chen et al., “Bring Your Own Codegen to Deep Learning Compiler,” 2021.
[9] J. Roesch et al., “Relay: A New IR for Machine Learning Frameworks,” in Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages (MAPL 2018), Philadelphia, PA, USA: Association for Computing Machinery, 2018, pp. 58–68. doi: 10.1145/3211346.3211348.
[10] J. Bai, F. Lu, K. Zhang, et al., “ONNX: Open Neural Network Exchange.” GitHub, 2019.
[11] S. Elfwing, E. Uchibe, and K. Doya, “Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning,” 2017.
[12] B. Zhang and R. Sennrich, “Root Mean Square Layer Normalization,” 2019.
[13] A. Q. Jiang et al., “Mistral 7B.” [Online]. Available: https://arxiv.org/abs/2310.06825
[14] J. Bai et al., “Qwen Technical Report,” arXiv preprint arXiv:2309.16609, 2023.
[15] P. Zhang, G. Zeng, T. Wang, and W. Lu, “TinyLlama: An Open Source Small Language Model,” 2024.
[16] H. Touvron et al., “Llama 2: Open Foundation and Fine-Tuned Chat Models.” [Online]. Available: https://arxiv.org/abs/2307.09288
[17] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “MobileNetV2: Inverted Residuals and Linear Bottlenecks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4510–4520.
[18] A. Howard et al., “Searching for MobileNetV3,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 1314–1324.
[19] M. Tan and Q. V. Le, “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.” [Online]. Available: https://arxiv.org/abs/1905.11946
[20] TorchVision maintainers and contributors, “TorchVision: PyTorch's Computer Vision Library.” GitHub, 2016.