Bibliography

[1] H. Liu and P. Abbeel, “APS: Active pretraining with successor features,” in International Conference on Machine Learning, pp. 6736–6747, PMLR, 2021.
[2] H. Liu and P. Abbeel, “Behavior from the void: Unsupervised active pretraining,” Advances in Neural Information Processing Systems, vol. 34, pp. 18459–18473, 2021.
[3] S. Hansen, W. Dabney, A. Barreto, T. Van de Wiele, D. Warde-Farley, and V. Mnih, “Fast task inference with variational intrinsic successor features,” arXiv preprint arXiv:1906.05030, 2019.
[4] M. Laskin, D. Yarats, H. Liu, K. Lee, A. Zhan, K. Lu, C. Cang, L. Pinto, and P. Abbeel, “URLB: Unsupervised reinforcement learning benchmark,” arXiv preprint arXiv:2110.15191, 2021.
[5] A. E. Sallab, M. Abdou, E. Perot, and S. Yogamani, “Deep reinforcement learning framework for autonomous driving,” arXiv preprint arXiv:1704.02532, 2017.
[6] X. B. Peng, P. Abbeel, S. Levine, and M. Van de Panne, “DeepMimic: Example-guided deep reinforcement learning of physics-based character skills,” ACM Transactions on Graphics (TOG), vol. 37, no. 4, pp. 1–14, 2018.
[7] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing Atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602, 2013.
[8] D. Pathak, P. Agrawal, A. A. Efros, and T. Darrell, “Curiosity-driven exploration by self-supervised prediction,” in International Conference on Machine Learning, pp. 2778–2787, PMLR, 2017.
[9] S. Park, J. Choi, J. Kim, H. Lee, and G. Kim, “Lipschitz-constrained unsupervised skill discovery,” in International Conference on Learning Representations, 2021.
[10] M. Laskin, H. Liu, X. B. Peng, D. Yarats, A. Rajeswaran, and P. Abbeel, “Unsupervised reinforcement learning with contrastive intrinsic control,” Advances in Neural Information Processing Systems, vol. 35, pp. 34478–34491, 2022.
[11] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, 2014.
[12] D. Barber and F. Agakov, “The IM algorithm: A variational approach to information maximization,” Advances in Neural Information Processing Systems, vol. 16, no. 320, p. 201, 2004.
[13] K. Gregor, D. J. Rezende, and D. Wierstra, “Variational intrinsic control,” arXiv preprint arXiv:1611.07507, 2016.
[14] J. Achiam, H. Edwards, D. Amodei, and P. Abbeel, “Variational option discovery algorithms,” arXiv preprint arXiv:1807.10299, 2018.
[15] B. Eysenbach, A. Gupta, J. Ibarz, and S. Levine, “Diversity is all you need: Learning skills without a reward function,” arXiv preprint arXiv:1802.06070, 2018.
[16] K. Zeng, Q. Zhang, B. Chen, B. Liang, and J. Yang, “APD: Learning diverse behaviors for reinforcement learning through unsupervised active pretraining,” IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 12251–12258, 2022.
[17] V. Campos, A. Trott, C. Xiong, R. Socher, X. Giró-i-Nieto, and J. Torres, “Explore, discover and learn: Unsupervised discovery of state-covering skills,” in International Conference on Machine Learning, pp. 1317–1327, PMLR, 2020.
[18] R. Yang, C. Bai, H. Guo, S. Li, B. Zhao, Z. Wang, P. Liu, and X. Li, “Behavior contrastive learning for unsupervised skill discovery,” arXiv preprint arXiv:2305.04477, 2023.
[19] Y. Yuan, J. Hao, F. Ni, Y. Mu, Y. Zheng, Y. Hu, J. Liu, Y. Chen, and C. Fan, “EUCLID: Towards efficient unsupervised reinforcement learning with multi-choice dynamics model,” arXiv preprint arXiv:2210.00498, 2022.
[20] P. Dayan, “Improving generalization for temporal difference learning: The successor representation,” Neural Computation, vol. 5, no. 4, pp. 613–624, 1993.
[21] T. D. Kulkarni, A. Saeedi, S. Gautam, and S. J. Gershman, “Deep successor reinforcement learning,” arXiv preprint arXiv:1606.02396, 2016.
[22] A. Barreto, W. Dabney, R. Munos, J. J. Hunt, T. Schaul, H. P. van Hasselt, and D. Silver, “Successor features for transfer in reinforcement learning,” Advances in Neural Information Processing Systems, vol. 30, 2017.
[23] L. Lehnert and M. L. Littman, “Successor features combine elements of model-free and model-based reinforcement learning,” The Journal of Machine Learning Research, vol. 21, no. 1, pp. 8030–8082, 2020.
[24] D. Borsa, A. Barreto, J. Quan, D. Mankowitz, R. Munos, H. Van Hasselt, D. Silver, and T. Schaul, “Universal successor features approximators,” arXiv preprint arXiv:1812.07626, 2018.
[25] C. Hoang, S. Sohn, J. Choi, W. Carvalho, and H. Lee, “Successor feature landmarks for long-horizon goal-conditioned reinforcement learning,” Advances in Neural Information Processing Systems, vol. 34, pp. 26963–26975, 2021.
[26] R. Ramesh, M. Tomar, and B. Ravindran, “Successor options: An option discovery framework for reinforcement learning,” arXiv preprint arXiv:1905.05731, 2019.
[27] M. Mozifian, D. Fox, D. Meger, F. Ramos, and A. Garg, “Generalizing successor features to continuous domains for multi-task learning,” 2021.
[28] A. van den Oord, Y. Li, and O. Vinyals, “Representation learning with contrastive predictive coding,” arXiv preprint arXiv:1807.03748, 2018.
[29] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” arXiv preprint arXiv:1509.02971, 2015.
[30] J. Kim, S. Park, and G. Kim, “Unsupervised skill discovery with bottleneck option learning,” arXiv preprint arXiv:2106.14305, 2021.
[31] Y. Burda, H. Edwards, A. Storkey, and O. Klimov, “Exploration by random network distillation,” arXiv preprint arXiv:1810.12894, 2018.
[32] J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel, “High-dimensional continuous control using generalized advantage estimation,” arXiv preprint arXiv:1506.02438, 2015.
[33] T. Rajapakshe, R. Rana, S. Latif, S. Khalifa, and B. W. Schuller, “Pre-training in deep reinforcement learning for automatic speech recognition,” arXiv preprint arXiv:1910.11256, 2019.
[34] A. Touati, J. Rapin, and Y. Ollivier, “Does zero-shot reinforcement learning exist?,” arXiv preprint arXiv:2209.14935, 2022.
[35] A. Touati and Y. Ollivier, “Learning one representation to optimize all rewards,” Advances in Neural Information Processing Systems, vol. 34, pp. 13–23, 2021.