|
[1] R. Stern, N. R. Sturtevant, A. Felner, S. Koenig, H. Ma, T. T. Walker, J. Li, D. Atzmon, L. Cohen, T. S. Kumar, et al., “Multi-agent pathfinding: Definitions, variants, and benchmarks,” in Twelfth Annual Symposium on Combinatorial Search, 2019. [2] S. Kumar and S. Chakravorty, “Multi-agent generalized robabilistic roadmaps: Magprm,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3747–3753, IEEE, 2012. [3] M. Čáp, P. Novák, J. Vokrínek, and M. Pěchouček, “Multi-agent rrt: sampling-based cooperative pathfinding,” in Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems, pp. 1263–1264, 2013. [4] E. Prianto, M. Kim, J.-H. Park, J.-H. Bae, and J.-S. Kim, “Path planning for multi-arm manipulators using deep reinforcement learning: Soft actor–critic with hindsight experience replay,” Sensors, vol. 20, no. 20, p. 5911, 2020. [5] A. Ghadirzadeh, X. Chen, W. Yin, Z. Yi, M. Björkman, and D. Kragic, “Human-centered collaborative robots with deep reinforcement learning,” IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 566–571, 2020. [6] S. Huang and S. Ontañón, “Action guidance: Getting the best of sparse rewards and shaped rewards for real-time strategy games,” arXiv preprint arXiv:2010.03956, 2020. [7] K. Zhang, Z. Yang, and T. Başar, “Multi-agent reinforcement learning: A selective overview of theories and algorithms,” Handbook of Reinforcement Learning and Control, pp. 321–384, 2021. [8] S. H. Semnani, H. Liu, M. Everett, A. De Ruiter, and J. P. How, “Multi-agent motion planning for dense and dynamic environments via deep reinforcement learning,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 3221–3226, 2020. [9] H. Ha, J. Xu, and S. Song, “Learning a decentralized multi-arm motion planner,” in Proceedings of the 2020 Conference on Robot Learning, 2020. [10] J. B. Martín, R. Chekroun, and F. Moutarde, “Learning from demonstrations with sacr2: Soft actor-critic with reward relabeling,” in Deep RL Workshop NeurIPS 2021, 2021. [11] G. Zuo, J. Lu, and T. Pan, “Sparse reward based manipulator motion planning by using high speed learning from demonstrations,” in 2018 IEEE International Conference on Robotics and Biomimetics (ROBIO), pp. 518–523, IEEE, 2018. [12] A. Nair, B. McGrew, M. Andrychowicz, W. Zaremba, and P. Abbeel, “Overcoming exploration in reinforcement learning with demonstrations,” in 2018 IEEE international conference on robotics and automation (ICRA), pp. 6292–6299, IEEE, 2018. [13] J. J. Kuffner and S. M. LaValle, “Rrt-connect: An efficient approach to single-query path planning,” in Proceedings 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation. Symposia Proceedings (Cat. No. 00CH37065), vol. 2, pp. 995–1001, IEEE, 2000. [14] E. Coumans and Y. Bai, “Pybullet, a python module for physics simulation for games, robotics and machine learning,” 2016. [15] E. A. Hansen, D. S. Bernstein, and S. Zilberstein, “Dynamic programming for partially observable stochastic games,” in AAAI, vol. 4, pp. 709–715, 2004. [16] A. A. Neto, D. G. Macharet, and M. F. M Campos, “Multi-agent rapidly-exploring pseudorandom tree,” Journal of Intelligent & Robotic Systems, vol. 89, no. 1, pp. 69–85, 2018. [17] V. R. Desaraju and J. P. How, “Decentralized path planning for multi-agent teams in complex environments using rapidly-exploring random trees,” in 2011 IEEE International Conference on Robotics and Automation, pp. 4956–4961, IEEE, 2011. [18] H. Lee, J. Hong, and J. Jeong, “Marl-based dual reward model on segmented actions for multiple mobile robots in automated warehouse environment,” Applied Sciences, vol. 12, no. 9, p. 4703, 2022. [19] X. Lyu, Y. Xiao, B. Daley, and C. Amato, “Contrasting centralized and decentralized critics in multi-agent reinforcement learning,” in Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, pp. 844–852, 2021. [20] M. Andrychowicz, F. Wolski, A. Ray, J. Schneider, R. Fong, P. Welinder, B. McGrew, J. Tobin, O. Pieter Abbeel, and W. Zaremba, “Hindsight experience replay,” Advances in neural information processing systems, vol. 30, 2017. [21] T. Haarnoja, A. Zhou, K. Hartikainen, G. Tucker, S. Ha, J. Tan, V. Kumar, H. Zhu, A. Gupta, P. Abbeel, et al., “Soft actor-critic algorithms and applications,” arXiv preprint arXiv:1812.05905, 2018. [22] A. Graves, “Long short-term memory,” Supervised sequence labelling with recurrent neural networks, pp. 37–45, 2012. |