|
[1] Ross Girshick et al. “Rich feature hierarchies for accurate object detection and semantic segmentation”. Proceedings of the IEEE conference on computer vision and pattern recognition. 2014, pp. 580–587. [2] Jeff Donahue et al. “Decaf: A deep convolutional activation feature for generic visual recognition”. International conference on machine learning. PMLR. 2014, pp. 647–655. [3] Jacob Devlin et al. “Bert: Pre-training of deep bidirectional transformers for language understanding”. arXiv preprint arXiv:1810.04805 (2018). [4] Zhenzhong Lan et al. “Albert: A lite bert for self-supervised learning of language representations”. arXiv preprint arXiv:1909.11942 (2019). [5] Yinhan Liu et al. “Roberta: A robustly optimized bert pretraining approach”. arXiv preprint arXiv:1907.11692 (2019). [6] Tong Qin and Shaojie Shen. “Online temporal calibration for monocular visualinertial systems”. 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE. 2018, pp. 3662–3669. [7] Jiyang Gao et al. “Vectornet: Encoding hd maps and agent dynamics from vectorized representation”. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020, pp. 11525–11533. [8] Chengxi Li et al. “Learning 3d-aware egocentric spatial-temporal interaction via graph convolutional networks”. 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE. 2020, pp. 8418–8424. [9] Mu Li et al. “Scaling distributed machine learning with the parameter server”. 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14). 2014, pp. 583–598. [10] Mu Li et al. “Communication efficient distributed machine learning with the parameter server”. Advances in Neural Information Processing Systems 27 (2014), pp. 19–27. [11] Andrew Gibiansky. “Bringing HPC Techniques to Deep Learning”. 2017. url: https://andrew.gibiansky.com/blog/machine-learning/baidu-allreduce/. [12] Priya Goyal et al. “Accurate, large minibatch sgd: Training imagenet in 1 hour”. arXiv preprint arXiv:1706.02677 (2017). [13] “Kubernetes”. url: https://kubernetes.io/. [14] Yidi Wu et al. “Elastic Deep Learning in Multi-Tenant GPU Clusters”. IEEE Transactions on Parallel and Distributed Systems (2021), pp. 1–1. doi: 10 . 1109/TPDS.2021.3064966. 53 [15] Vaibhav Saxena et al. “Effective Elastic Scaling of Deep Learning Workloads”. 2020 28th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS). 2020, pp. 1–8. doi: 10.1109/MASCOTS50786.2020.9285954. [16] Changho Hwang et al. “Elastic Resource Sharing for Distributed Deep Learning”. 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21). USENIX Association, Apr. 2021, pp. 721–739. isbn: 978-1-939133- 21-2. url: https://www.usenix.org/conference/nsdi21/presentation/ hwang. [17] Myeongjae Jeon et al. “Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads”. 2019 USENIX Annual Technical Conference (USENIX ATC 19). Renton, WA: USENIX Association, July 2019, pp. 947– 960. isbn: 978-1-939133-03-8. url: https://www.usenix.org/conference/ atc19/presentation/jeon. [18] Travis Addair, Xu Ning, and Richard Liaw. “Elastic Deep Learning with Horovod on Ray”. 2021. url: https://eng.uber.com/horovod-ray/. [19] Yanghua Peng et al. “Optimus: an efficient dynamic resource scheduler for deep learning clusters”. Proceedings of the Thirteenth EuroSys Conference. 2018, pp. 1–14. [20] Jingoo Han et al. “Marble: A multi-gpu aware job scheduler for deep learning on hpc systems”. 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID). IEEE. 2020, pp. 272–281. [21] Kshiteej Mahajan et al. “Themis: Fair and efficient GPU cluster scheduling”. 17th USENIX Symposium on Networked Systems Design and Implementation NSDI 20). 2020, pp. 289–304. [22] Shaoqi Wang et al. “An Efficient and Non-Intrusive GPU Scheduling Framework for Deep Learning Training Systems”. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. 2020, pp. 1–13. doi: 10.1109/SC41405.2020.00094. [23] Juncheng Gu et al. “Tiresias: A GPU Cluster Manager for Distributed Deep Learning”. 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19). Boston, MA: USENIX Association, Feb. 2019, pp. 485– 500. isbn: 978-1-931971-49-2. url: https://www.usenix.org/conference/ nsdi19/presentation/gu. [24] KR Jayaram et al. “Ffdl: A flexible multi-tenant deep learning platform”. Proceedings of the 20th International Middleware Conference. 2019, pp. 82–95. [25] Wencong Xiao et al. “Gandiva: Introspective Cluster Scheduling for Deep Learning”. 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). Carlsbad, CA: USENIX Association, Oct. 2018, pp. 595– 610. isbn: 978-1-939133-08-3. url: https://www.usenix.org/conference/ osdi18/presentation/xiao. [26] Marcelo Amaral et al. “Topology-aware gpu scheduling for learning workloads in cloud environments”. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 2017, pp. 1–12. 54 [27] Alexander Sergeev and Mike Del Balso. “Horovod: fast and easy distributed deep learning in TensorFlow”. arXiv preprint arXiv:1802.05799 (2018). [28] Martín Abadi et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Software available from tensorflow.org. 2015. url: https : / / www.tensorflow.org/. [29] Francois Chollet et al. Keras. 2015. url: https://github.com/fchollet/ keras. [30] Adam Paszke et al. “PyTorch: An Imperative Style, High-Performance Deep Learning Library”. Advances in Neural Information Processing Systems 32. Ed. by H. Wallach et al. Curran Associates, Inc., 2019, pp. 8024–8035. url: http: //papers.neurips.cc/paper/9015-pytorch-an-imperative-style-highperformance-deep-learning-library.pdf. [31] Tianqi Chen et al. “Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems”. arXiv preprint arXiv:1512.01274 (2015). [32] “Kubeflow: The Machine Learning Toolkit for Kubernetes”. 2021. url: https: //www.kubeflow.org/. [33] Ou Rong et al. “MPI Operator in Kubeflow”. 2020. url: https://github.com/ kubeflow/mpi-operator. [34] Yixin Bao et al. “Online job scheduling in distributed machine learning clusters”. IEEE INFOCOM 2018-IEEE Conference on Computer Communications. IEEE. 2018, pp. 495–503. [35] Chan-Yi Lin, Ting-An Yeh, and Jerry Chou. “DRAGON: A Dynamic Scheduling and Scaling Controller for Managing Distributed Deep Learning Jobs in Kubernetes Cluster.” CLOSER. 2019, pp. 569–577. [36] Philipp Moritz et al. “Ray: A distributed framework for emerging AI applications”. 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). 2018, pp. 561–577. [37] “Open Platform for AI (OpenPAI)”. url: https://github.com/microsoft/ pai. [38] Mourad Mourafiq. Polyaxon: Cloud native machine learning automation platform. Web page. 2017. url: https://github.com/polyaxon/polyaxon. [39] “Apache Submarine: Cloud Native Machine Learning Platform”. url: https: //submarine.apache.org/. [40] Yanjun Ma et al. “PaddlePaddle: An open-source deep learning platform from industrial practice”. Frontiers of Data and Domputing 1.1 (2019), pp. 105–115. [41] Michael Mui et al. “Elastic Distributed Training with XGBoost on Ray”. 2021. url: https://eng.uber.com/elastic-xgboost-ray/. [42] Richard Liaw et al. “Hypersched: Dynamic resource reallocation for model development on a deadline”. Proceedings of the ACM Symposium on Cloud Computing. 2019, pp. 61–73. [43] Johnu George et al. A Scalable and Cloud-Native Hyperparameter Tuning System. 2020. arXiv: 2006.02085 [cs.DC]. 55 [44] Hanyu Zhao et al. “Hived: sharing a GPU cluster for deep learning with guarantees”. 14th USENIX symposium on operating systems design and implementation (OSDI 20). 2020, pp. 515–532. [45] “Apache Hadoop”. url: https://hadoop.apache.org/. [46] “Amazon Elastic Block Store: Easy to use, high performance block storage at any scale”. url: https://aws.amazon.com/ebs/. [47] “Azure Disk Storage: High-performance, highly durable block storage for Azure Virtual Machines”. url: https://azure.microsoft.com/en-us/services/ storage/disks/. [48] “Google Compute Engine (GCE) Persistent Disk: Reliable, high-performance block storage for virtual machine instances”. url: https://cloud.google. com/persistent-disk. [49] “Prometheus”. url: https://prometheus.io/. [50] Zhaoyun Chen et al. “Deep Learning Research and Development Platform: Characterizing and Scheduling with QoS Guarantees on GPU Clusters”. IEEE Transactions on Parallel and Distributed Systems 31.1 (2019), pp. 34–50. [51] Isabelle Guyon et al. “Analysis of the AutoML Challenge series 2015-2018”. AutoML. Springer series on Challenges in Machine Learning. 2019. url: https: //www.automl.org/wp-content/uploads/2018/09/chapter10-challenge. pdf. [52] Harold W Kuhn. “The Hungarian method for the assignment problem”. Naval research logistics quarterly 2.1-2 (1955), pp. 83–97. [53] James Munkres. “Algorithms for the assignment and transportation problems”. Journal of the society for industrial and applied mathematics 5.1 (1957), pp. 32– 38. [54] Haibin Zhu, MengChu Zhou, and Rob Alkins. “Group role assignment via a Kuhn–Munkres algorithm-based solution”. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans 42.3 (2011), pp. 739–750. [55] “MongoDB”. url: https://www.mongodb.com/. [56] David Kirk. “NVIDIA cuda software and gpu parallel computing architecture”. Proceedings of the 6th International Symposium on Memory Management, ISMM 2007, Montreal, Quebec, Canada, October 21-22, 2007. Ed. by Greg Morrisett and Mooly Sagiv. ACM, 2007, pp. 103–104. doi: 10.1145/1296907.1296909. url: https://doi.org/10.1145/1296907.1296909. [57] Sharan Chetlur et al. “cuDNN: Efficient primitives for deep learning”. arXiv preprint arXiv:1410.0759 (2014). [58] Ammar Ahmad Awan et al. “Efficient large message broadcast using NCCL and CUDA-aware MPI for deep learning”. Proceedings of the 23rd European MPI Users’ Group Meeting. 2016, pp. 15–22. [59] Li Deng. “The mnist database of handwritten digit images for machine learning research”. IEEE Signal Processing Magazine 29.6 (2012), pp. 141–142. [60] Karen Simonyan and Andrew Zisserman. “Very deep convolutional networks for large-scale image recognition”. arXiv preprint arXiv:1409.1556 (2014). 56 [61] Alex Krizhevsky. Learning multiple layers of features from tiny images. Tech. rep. 2009. [62] Kaiming He et al. “Deep residual learning for image recognition”. Proceedings of the IEEE conference on computer vision and pattern recognition. 2016, pp. 770– 778. [63] Christian Szegedy et al. “Rethinking the inception architecture for computer vision”. Proceedings of the IEEE conference on computer vision and pattern recognition. 2016, pp. 2818–2826. [64] Ashish Vaswani et al. “Attention is all you need”. Advances in neural information processing systems. 2017, pp. 5998–6008. [65] “Tab-delimited Bilingual Sentence Pairs”. url: https : / / www . manythings . org/anki/.
|