|
[1] L. Pinto, M. Andrychowicz, P. Welinder, W. Zaremba, and P. Abbeel, “Asymmetric actor critic for image-based robot learning,” arXiv preprint arXiv:1710.06542, 2017. [2] D. Ha and J. Schmidhuber, “World models,” arXiv preprint arXiv:1803.10122, 2018. [3] D. Hafner, T. P. Lillicrap, I. Fischer, R. Villegas, D. Ha, H. Lee, and J. Davidson, “Learning latent dynamics for planning from pixels,” in Proceedings of the 36th International Conference on Machine Learning, ICML, 2019. [4] T. Yu, G. Thomas, L. Yu, S. Ermon, J. Y. Zou, S. Levine, C. Finn, and T. Ma, “Mopo: Model-based offline policy optimization,” in Advances in Neural Information Processing Systems 33, NeurIPS, 2020. [5] P. Shyam, W. Jaskowski, and F. Gomez, “Model-based active exploration,” in Proceedings of the 36th International Conference on Machine Learning, ICML, 2019. [6] A. Byravan, J. T. Springenberg, A. Abdolmaleki, R. Hafner, M. Neunert, T. Lampe, N. Y. Siegel, N. Heess, and M. A. Riedmiller, “Imagined value gradients: Model-based policy optimization with transferable latent dynamics models,” arXiv preprint arXiv:1910.04142, 2019. [7] R. Sekar, O. Rybkin, K. Daniilidis, P. Abbeel, D. Hafner, and D. Pathak, “Planning to explore via self-supervised world models,” in Proceedings of the 37th International Conference on Machine Learning, ICML, 2020. [8] M. Watter, J. T. Springenberg, J. Boedecker, and M. A. Riedmiller, “Embed to control: A locally linear latent dynamics model for control from raw images,” in Advances in Neural Information Processing Systems 28, NeurIPS, 2015. [9] J. Oh, S. Singh, and H. Lee, “Value prediction network,” in Advances in Neural Information Processing Systems 30, NeurIPS, 2017. [10] J. Schrittwieser, I. Antonoglou, T. Hubert, K. Simonyan, L. Sifre, S. Schmitt, A. Guez, E. Lockhart, D. Hassabis, T. Graepel, T. P. Lillicrap, and D. Silver, “Mastering atari, go, chess and shogi by planning with a learned model,” arXiv preprint arXiv:1911.08265, 2019. [11] I. Antonoglou, J. Schrittwieser, S. Ozair, T. K. Hubert, and D. Silver, “Planning in stochastic environments with a learned model,” in International Conference on Learning Representations, ICLR, 2022. [12] D. Hafner, T. P. Lillicrap, J. Ba, and M. Norouzi, “Dream to control: Learning behaviors by latent imagination,” in International Conference on Learning Representations, ICLR, 2020. [13] D. Hafner, T. P. Lillicrap, M. Norouzi, and J. Ba, “Mastering atari with discrete world models,” in International Conference on Learning Representations, ICLR, 2021. [14] N. A. Hansen, H. Su, and X. Wang, “Temporal difference learning for model predictive control,” in Proceedings of the 39th International Conference on Machine Learning, ICML, 2022. [15] Y. Mu, Y. Zhuang, B. Wang, G. Zhu, W. Liu, J. Chen, P. Luo, S. Li, C. Zhang, and J. Hao, “Model-based reinforcement learning via imagination with derived memory,” in Advances in Neural Information Processing Systems 34, NeurIPS, 2021. [16] T. Wang, S. Du, A. Torralba, P. Isola, A. Zhang, and Y. Tian, “Denoised mdps: Learning world models better than the world itself,” in Proceedings of the 39th International Conference on Machine Learning, ICML, 2022. [17] C. Yu, D. Li, J. Hao, J. Wang, and N. Burgess, “Learning state representations via retracing in reinforcement learning,” in International Conference on Learning Representations, ICLR, 2022. [18] L. P. Fröhlich, M. Lefarov, M. N. Zeilinger, and F. Berkenkamp, “On-policy model errors in reinforcement learning,” in International Conference on Learning Representations, ICLR, 2022. [19] Y. Oh, J. Shin, E. Yang, and S. J. Hwang, “Model-augmented prioritized experience replay,” in International Conference on Learning Representations, ICLR, 2022. [20] A. Byravan, L. Hasenclever, P. Trochim, M. Mirza, A. D. Ialongo, Y. Tassa, J. T. Springenberg, A. Abdolmaleki, N. Heess, J. Merel, and M. A. Riedmiller, “Evaluating model-based planning and planner amortization for continuous control,” in International Confer- ence on Learning Representations, ICLR, 2022. [21] R. J. Williams and D. Zipser, “A learning algorithm for continually running fully recurrent neural networks,” Neural Comput., 1989. [22] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, 1997. [23] R. Pascanu, T. Mikolov, and Y. Bengio, “On the difficulty of training recurrent neural networks,” in Proceedings of the 30th International Conference on Machine Learning, ICML, 2013. [24] J. Zhang, T. He, S. Sra, and A. Jadbabaie, “Why gradient clipping accelerates training: A theoretical justification for adaptivity,” in International Conference on Learning Representations, ICLR, 2020. [25] M. Okada and T. Taniguchi, “Dreaming: Model-based reinforcement learning by latent imagination without reconstruction,” in IEEE International Conference on Robotics and Automation, ICRA, 2021. [26] M. Okada and T. Taniguchi, “Dreamingv2: Reinforcement learning with discrete world models without reconstruction,” arXiv preprint arXiv:2203.00494, 2022. [27] F. Deng, I. Jang, and S. Ahn, “Dreamerpro: Reconstruction-free model-based reinforcement learning with prototypical representations,” in International Conference on Machine Learning, ICML, 2022. [28] T. Nguyen, R. Shu, T. Pham, H. Bui, and S. Ermon, “Temporal predictive coding for model-based planning in latent space,” in Proceedings of the 38th International Conference on Machine Learning, ICML, 2021. [29] Y. Tassa, Y. Doron, A. Muldal, T. Erez, Y. Li, D. de Las Casas, D. Budden, A. Abdolmaleki, J. Merel, A. Lefrancq, T. Lillicrap, and M. Riedmiller, “Deepmind control suite,” arXiv preprint arXiv:1801.00690, 2018. [30] R. Coulom, “Efficient selectivity and backup operators in monte-carlo tree search,” in International Conference on Computers and Games, 2006. [31] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. Chen, T. P. Lillicrap, F. Hui, L. Sifre, G. van den Driessche, T. Graepel, and D. Hassabis, “Mastering the game of go without human knowledge,” Nature, vol. 550, no. 7676, 2017. [32] T. Wang and J. Ba, “Exploring model-based planning with policy networks,” in International Conference on Learning Representations, 2020. [33] Z. I. Botev, D. P. Kroese, R. Y. Rubinstein, and P. L'Ecuyer, “The cross-entropy method for optimization,” in Handbook of statistics, vol. 31, pp. 35–59, Elsevier, 2013. [34] P. d. Boer, D. P. Kroese, S. Mannor, and R. Y. Rubinstein, “A tutorial on the cross-entropy method,” Annals of Operations Research, vol. 134, no. 1, pp. 19–67, 2005. [35] J. Y. Koh, H. Lee, Y. Yang, J. Baldridge, and P. Anderson, “Pathdreamer: A world model for indoor navigation,” in 2021 IEEE/CVF International Conference on Computer Vision, ICCV, 2021. [36] L. Kaiser, M. Babaeizadeh, P. Milos, B. Osinski, R. H. Campbell, K. Czechowski, D. Erhan, C. Finn, P. Kozakowski, S. Levine, R. Sepassi, G. Tucker, and H. Michalewski, “Model-based reinforcement learning for atari,” arXiv preprint arXiv:1903.00374, 2019. [37] M. Caron, I. Misra, J. Mairal, P. Goyal, P. Bojanowski, and A. Joulin, “Unsupervised learning of visual features by contrasting cluster assignments,” in Advances in Neural Information Processing Systems 33, NeurIPS, 2020. [38] A. v. d. Oord, Y. Li, and O. Vinyals, “Representation learning with contrastive predictive coding,” arXiv preprint arXiv:1807.03748, 2018. [39] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 2018. [40] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. A. Riedmiller, A. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, 2015. |