[1] Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.
[2] Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., and Zaremba, W. (2016). OpenAI Gym. arXiv preprint arXiv:1606.01540.
[3] DiCiccio, T. J. and Efron, B. (1996). Bootstrap confidence intervals. Statistical Science, 11(3):189–228.
[4] Fang, M., Li, Y., and Cohn, T. (2017). Learning how to active learn: A deep reinforcement learning approach. arXiv preprint arXiv:1708.02383.
[5] Ho, J. and Ermon, S. (2016). Generative adversarial imitation learning. Advances in Neural Information Processing Systems (NeurIPS), 29.
[6] Hong, Z.-W., Fu, T.-J., Shann, T.-Y., and Lee, C.-Y. (2020). Adversarial active exploration for inverse dynamics model learning. In Kaelbling, L. P., Kragic, D., and Sugiura, K., editors, Proceedings of the Conference on Robot Learning (CoRL), volume 100 of Proceedings of Machine Learning Research, pages 552–565. PMLR.
[7] Kidambi, R., Chang, J., and Sun, W. (2021). MobILE: Model-based imitation learning from observation alone. Advances in Neural Information Processing Systems (NeurIPS), 34:28598–28611.
[8] Kim, K., Sano, M., De Freitas, J., Haber, N., and Yamins, D. (2020). Active world model learning with progress curiosity. In Proceedings of the 37th International Conference on Machine Learning (ICML), pages 5306–5315. PMLR.
[9] Kingma, D. P. and Ba, J. (2017). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[10] Lewis, D. D. and Gale, W. A. (1994). A sequential algorithm for training text classifiers. In SIGIR '94, pages 3–12. Springer.
[11] Osa, T., Pajarinen, J., Neumann, G., Bagnell, J. A., Abbeel, P., and Peters, J. (2018). An algorithmic perspective on imitation learning. Foundations and Trends® in Robotics, 7(1-2):1–179.
[12] Pomerleau, D. A. (1988). ALVINN: An autonomous land vehicle in a neural network. Advances in Neural Information Processing Systems (NeurIPS), 1.
[13] Prakash, A., Behl, A., Ohn-Bar, E., Chitta, K., and Geiger, A. (2020). Exploring data aggregation in policy learning for vision-based urban autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11763–11773.
[14] Schaul, T., Quan, J., Antonoglou, I., and Silver, D. (2015). Prioritized experience replay. arXiv preprint arXiv:1511.05952.
[15] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
[16] Settles, B. (2009). Active learning literature survey. Technical report, University of Wisconsin-Madison Department of Computer Sciences.
[17] Sutton, R. S. and Barto, A. G. (1998). Reinforcement learning: An introduction. MIT Press.
[18] Sutton, R. S., McAllester, D., Singh, S., and Mansour, Y. (1999). Policy gradient methods for reinforcement learning with function approximation. Advances in Neural Information Processing Systems (NeurIPS), 12.
[19] Todorov, E., Erez, T., and Tassa, Y. (2012). MuJoCo: A physics engine for model-based control. In Proceedings of the International Conference on Intelligent Robots and Systems (IROS), pages 5026–5033.
[20] Torabi, F., Warnell, G., and Stone, P. (2018a). Behavioral cloning from observation. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI), pages 4950–4957. AAAI Press.
[21] Torabi, F., Warnell, G., and Stone, P. (2018b). Generative adversarial imitation from observation. arXiv preprint arXiv:1807.06158.
[22] Yang, C., Ma, X., Huang, W., Sun, F., Liu, H., Huang, J., and Gan, C. (2019). Imitation learning from observations by minimizing inverse dynamics disagreement. Advances in Neural Information Processing Systems (NeurIPS), 32.
[23] Yang, Y., Ma, Z., Nie, F., Chang, X., and Hauptmann, A. G. (2015). Multi-class active learning by uncertainty sampling with diversity maximization. International Journal of Computer Vision, 113(2):113–127.
[24] Zhu, Z., Lin, K., Dai, B., and Zhou, J. (2020). Off-policy imitation learning from observations. Advances in Neural Information Processing Systems (NeurIPS), 33:12402–12413.