[1] S. Han, H. Mao, and W. Dally, "Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding," in ICLR, 2016.
[2] Y. Wang, "Deep learning in real time – inference acceleration and continuous training," 2017.
[3] G. Hinton, O. Vinyals, and J. Dean, "Distilling the knowledge in a neural network," in NIPS Deep Learning and Representation Learning Workshop, 2015.
[4] R. S. Sutton, D. Precup, and S. Singh, "Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning," Artif. Intell., vol. 112, pp. 181–211, Aug. 1999.
[5] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, "OpenAI Gym," 2016.
[6] Y. Tassa, Y. Doron, A. Muldal, T. Erez, Y. Li, D. de Las Casas, D. Budden, A. Abdolmaleki, J. Merel, A. Lefrancq, T. Lillicrap, and M. Riedmiller, "DeepMind Control Suite," tech. rep., DeepMind, Jan. 2018.
[7] C. Buciluǎ, R. Caruana, and A. Niculescu-Mizil, "Model compression," in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '06, (New York, NY, USA), pp. 535–541, Association for Computing Machinery, 2006.
[8] J. Ho and S. Ermon, "Generative adversarial imitation learning," 2016.
[9] P. Abbeel and A. Y. Ng, "Apprenticeship learning via inverse reinforcement learning," in Proceedings of the Twenty-First International Conference on Machine Learning, ICML '04, (New York, NY, USA), p. 1, Association for Computing Machinery, 2004.
[10] C. Finn, S. Levine, and P. Abbeel, "Guided cost learning: Deep inverse optimal control via policy optimization," in Proceedings of the 33rd International Conference on Machine Learning - Volume 48, ICML'16, pp. 49–58, JMLR.org, 2016.
[11] J. Fu, K. Luo, and S. Levine, "Learning robust rewards with adversarial inverse reinforcement learning," 2017.
[12] F. Codevilla, E. Santana, A. Lopez, and A. Gaidon, "Exploring the limitations of behavior cloning for autonomous driving," in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 9328–9337, 2019.
[13] C. Florensa, Y. Duan, and P. Abbeel, "Stochastic neural networks for hierarchical reinforcement learning," in ICLR, 2017.
[14] P.-L. Bacon, J. Harb, and D. Precup, "The option-critic architecture," 2016.
[15] O. Nachum, S. Gu, H. Lee, and S. Levine, "Data-efficient hierarchical reinforcement learning," 2018.
[16] A. S. Vezhnevets, S. Osindero, T. Schaul, N. Heess, M. Jaderberg, D. Silver, and K. Kavukcuoglu, "FeUdal networks for hierarchical reinforcement learning," in Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML'17, pp. 3540–3549, JMLR.org, 2017.
[17] C. Tessler, S. Givony, T. Zahavy, D. J. Mankowitz, and S. Mannor, "A deep hierarchical approach to lifelong learning in Minecraft," in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, AAAI'17, pp. 1553–1561, AAAI Press, 2017.
[18] J. Andreas, D. Klein, and S. Levine, "Modular multitask reinforcement learning with policy sketches," in Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML'17, pp. 166–175, JMLR.org, 2017.
[19] J. Harb, P.-L. Bacon, M. Klissarov, and D. Precup, "When waiting is not an option: Learning options with a deliberation cost," in AAAI, 2018.
[20] M. Hessel, H. Soyer, L. Espeholt, W. Czarnecki, S. Schmitt, and H. van Hasselt, "Multi-task deep reinforcement learning with PopArt," tech. rep., DeepMind, 2019.
[21] A. Li, C. Florensa, I. Clavera, and P. Abbeel, "Sub-policy adaptation for hierarchical reinforcement learning," in ICLR, 2020.
[22] V. Mnih, K. Kavukcuoglu, D. Silver, et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[23] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor," ArXiv, vol. abs/1801.01290, 2018.
[24] H. V. Hasselt, "Double Q-learning," in Advances in Neural Information Processing Systems 23 (J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, eds.), pp. 2613–2621, Curran Associates, Inc., 2010.
[25] M. Andrychowicz, F. Wolski, A. Ray, J. Schneider, R. Fong, P. Welinder, B. McGrew, J. Tobin, P. Abbeel, and W. Zaremba, "Hindsight experience replay," in Advances in Neural Information Processing Systems 30 (I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, eds.), pp. 5048–5058, Curran Associates, Inc., 2017.
[26] N. Cesa-Bianchi, C. Gentile, G. Lugosi, and G. Neu, "Boltzmann exploration done right," in Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17, (Red Hook, NY, USA), pp. 6287–6296, Curran Associates Inc., 2017.
[27] E. Todorov, T. Erez, and Y. Tassa, "MuJoCo: A physics engine for model-based control," in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033, Oct. 2012.
[28] A. Hill, A. Raffin, M. Ernestus, A. Gleave, A. Kanervisto, R. Traore, P. Dhariwal, C. Hesse, O. Klimov, A. Nichol, M. Plappert, A. Radford, J. Schulman, S. Sidor, and Y. Wu, "Stable Baselines." https://github.com/hill-a/stable-baselines, 2018.
[29] A. Raffin, "RL Baselines Zoo." https://github.com/araffin/rl-baselines-zoo, 2018.