[1] M. Tan. Multi-agent reinforcement learning: Independent versus cooperative agents. In Proc. Int. Conf. on Machine Learning (ICML), pages 330–337, Jun. 1993.
[2] F. A. Oliehoek and C. Amato. A Concise Introduction to Decentralized POMDPs. Springer, 2016.
[3] P. Sunehag et al. Value-decomposition networks for cooperative multi-agent learning based on team reward. In Proc. Int. Conf. on Autonomous Agents and MultiAgent Systems (AAMAS), pages 2085–2087, May 2018.
[4] T. Rashid et al. QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. In Proc. Int. Conf. on Machine Learning (ICML), pages 4295–4304, Jul. 2018.
[5] K. Son, D. Kim, W. J. Kang, D. E. Hostallero, and Y. Yi. QTRAN: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. In Proc. Int. Conf. on Machine Learning (ICML), pages 5887–5896, Jul. 2019.
[6] M. Samvelyan et al. The StarCraft multi-agent challenge. In Proc. Int. Conf. on Autonomous Agents and MultiAgent Systems (AAMAS), pages 2186–2188, May 2019.
[7] C. Guestrin, D. Koller, and R. Parr. Multiagent planning with factored MDPs. In Advances in Neural Information Processing Systems (NIPS), 2001.
[8] C. Lyle, M. G. Bellemare, and P. S. Castro. A comparative analysis of expected and distributional reinforcement learning. In Proc. AAAI Conf. on Artificial Intelligence (AAAI), pages 4504–4511, Feb. 2019.
[9] M. G. Bellemare, W. Dabney, and R. Munos. A distributional perspective on reinforcement learning. In Proc. Int. Conf. on Machine Learning (ICML), pages 449–458, Jul. 2017.
[10] W. Dabney, M. Rowland, M. G. Bellemare, and R. Munos. Distributional reinforcement learning with quantile regression. In Proc. AAAI Conf. on Artificial Intelligence (AAAI), pages 2892–2901, Feb. 2018.
[11] W. Dabney, G. Ostrovski, D. Silver, and R. Munos. Implicit quantile networks for distributional reinforcement learning. In Proc. Int. Conf. on Machine Learning (ICML), pages 1096–1105, Jul. 2018.
[12] M. Rowland et al. Statistics and samples in distributional reinforcement learning. In Proc. Int. Conf. on Machine Learning (ICML), pages 5528–5536, Jul. 2019.
[13] D. Yang et al. Fully parameterized quantile function for distributional reinforcement learning. In Advances in Neural Information Processing Systems (NeurIPS), pages 6190–6199, Dec. 2019.
[14] J. Wang, Z. Ren, T. Liu, Y. Yu, and C. Zhang. QPLEX: Duplex dueling multi-agent Q-Learning. In Proc. Int. Conf. on Learning Representations (ICLR), May 2021.
[15] W. F. Sun, C. K. Lee, and C. Y. Lee. DFAC framework: Factorizing the value function via quantile mixture for multi-agent distributional Q-Learning. In Proc. Int. Conf. on Machine Learning (ICML), pages 9945–9954, Jul. 2021.
[16] F. L. Da Silva, A. H. R. Costa, and P. Stone. Distributional reinforcement learning applied to robot soccer simulation. In Workshop on Adaptive Learning Agents (ALA) at AAMAS, Jun. 2019.
[17] X. Lyu and C. Amato. Likelihood quantile networks for coordinating multi-agent reinforcement learning. In Proc. Int. Conf. on Autonomous Agents and MultiAgent Systems (AAMAS), pages 798–806, May 2020.
[18] L. Matignon, G. Laurent, and N. Le Fort-Piat. Hysteretic Q-Learning: an algorithm for decentralized reinforcement learning in cooperative multi-agent teams. In Proc. Int. Conf. on Intelligent Robots and Systems (IROS), pages 64–69, Dec. 2007.
[19] S. Omidshafiei, J. Pazis, C. Amato, J. P. How, and J. Vian. Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In Proc. Int. Conf. on Machine Learning (ICML), pages 2681–2690, Aug. 2017.
[20] M. Rowland et al. Temporal difference and return optimism in cooperative multi-agent reinforcement learning. In Workshop on Adaptive Learning Agents (ALA) at AAMAS, May 2021.
[21] T. Rashid, G. Farquhar, B. Peng, and S. Whiteson. Weighted QMIX: Expanding monotonic value function factorisation. In Advances in Neural Information Processing Systems (NeurIPS), pages 10199–10210, Dec. 2020.
[22] Y. Yang, J. Hao, B. Liao, K. Shao, G. Chen, W. Liu, and H. Tang. Qatten: A general framework for cooperative multiagent reinforcement learning. arXiv preprint arXiv:2002.03939, 2020.
[23] Y. Du et al. LIIR: Learning individual intrinsic reward in multi-agent reinforcement learning. In Advances in Neural Information Processing Systems (NeurIPS), pages 4405–4416, Dec. 2019.
[24] S. Q. Zhang, Q. Zhang, and J. Lin. Efficient communication in multi-agent reinforcement learning via variance based control. In Advances in Neural Information Processing Systems (NeurIPS), pages 3230–3239, Dec. 2019.
[25] T. Wang, J. Wang, C. Zheng, and C. Zhang. Learning nearly decomposable value functions via communication minimization. In Proc. Int. Conf. on Learning Representations (ICLR), Apr. 2020.
[26] C. S. de Witt et al. Multi-agent common knowledge reinforcement learning. In Advances in Neural Information Processing Systems (NeurIPS), pages 9924–9935, Dec. 2019.
[27] T. Wang, H. Dong, V. Lesser, and C. Zhang. Multi-agent reinforcement learning with emergent roles. In Proc. Int. Conf. on Machine Learning (ICML), pages 9876–9886, Jul. 2020.
[28] W. Wang et al. Action semantics network: Considering the effects of actions in multiagent systems. In Proc. Int. Conf. on Learning Representations (ICLR), Apr. 2020.
[29] R. Lowe et al. Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems (NIPS), pages 6379–6390, Dec. 2017.
[30] J. N. Foerster, G. Farquhar, T. Afouras, N. Nardelli, and S. Whiteson. Counterfactual multi-agent policy gradients. In Proc. AAAI Conf. on Artificial Intelligence (AAAI), Feb. 2018.
[31] S. Iqbal and F. Sha. Actor-attention-critic for multi-agent reinforcement learning. In Proc. Int. Conf. on Machine Learning (ICML), pages 2961–2970, Jul. 2019.
[32] B. Peng et al. FACMAC: Factored multi-agent centralised policy gradients. In Advances in Neural Information Processing Systems (NeurIPS), pages 12208–12221, Dec. 2021.
[33] J. Hu, S. A. Harding, H. Wu, and S. W. Liao. QR-MIX: Distributional value function factorisation for cooperative multi-agent reinforcement learning. arXiv preprint arXiv:2009.04197, 2020.
[34] W. Qiu et al. RMIX: Learning risk-sensitive policies for cooperative reinforcement learning agents. In Advances in Neural Information Processing Systems (NeurIPS), pages 23049–23062, Dec. 2021.
[35] V. Mnih et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, Feb. 2015.
[36] J. Wang, Z. Ren, B. Han, J. Ye, and C. Zhang. Towards understanding cooperative multi-agent Q-Learning with value factorization. In Advances in Neural Information Processing Systems (NeurIPS), Dec. 2021.
[37] M. G. Bellemare, W. Dabney, and M. Rowland. Distributional Reinforcement Learning. MIT Press, 2022. http://www.distributional-rl.org.
[38] M. G. Bellemare, N. Le Roux, P. S. Castro, and S. Moitra. Distributional reinforcement learning with linear function approximation. In Proc. Int. Conf. on Artificial Intelligence and Statistics (AISTATS), pages 2203–2211, Apr. 2019.
[39] T. T. Nguyen, S. Gupta, and S. Venkatesh. Distributional reinforcement learning via moment matching. In Proc. AAAI Conf. on Artificial Intelligence (AAAI), Feb. 2021.
[40] N. Nikolov, J. Kirschner, F. Berkenkamp, and A. Krause. Information-directed exploration for deep reinforcement learning. In Proc. Int. Conf. on Learning Representations (ICLR), May 2019.
[41] S. Zhang and H. Yao. QUOTA: The quantile option architecture for reinforcement learning. In Proc. AAAI Conf. on Artificial Intelligence (AAAI), pages 5797–5804, Feb. 2019.
[42] B. Mavrin, H. Yao, L. Kong, K. Wu, and Y. Yu. Distributional reinforcement learning for efficient exploration. In Proc. Int. Conf. on Machine Learning (ICML), pages 4424–4434, Jul. 2019.
[43] L. Xia. Risk-sensitive Markov decision processes with combined metrics of mean and variance. Production and Operations Management, 29(12):2808–2827, 2020.
[44] T. Schaul, D. Horgan, K. Gregor, and D. Silver. Universal value function approximators. In Proc. Int. Conf. on Machine Learning (ICML), pages 1312–1320, Jul. 2015.
[45] N. Rahaman et al. On the spectral bias of neural networks. In Proc. Int. Conf. on Machine Learning (ICML), pages 5301–5310, Jul. 2019.
[46] J. Karvanen. Estimation of quantile mixtures via L-moments and trimmed L-moments. Computational Statistics & Data Analysis, pages 947–959, Nov. 2006.
[47] Z. Lin et al. Distributional reward decomposition for reinforcement learning. In Advances in Neural Information Processing Systems (NeurIPS), pages 6212–6221, Dec. 2019.