|
[1] Peter Fletcher, Hughes Hoyle, and C. Wayne Patty. Foundations of Discrete Mathematics. PWS-KENT Publishing Company, 1991.
[2] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks. Advances in Neural Information Processing Systems, 27, 2014.
[3] David Ha, Andrew M. Dai, and Quoc V. Le. Hypernetworks. CoRR, abs/1609.09106, 2016.
[4] Landon Kraemer and Bikramjit Banerjee. Multi-agent reinforcement learning as a rehearsal for decentralized planning. Neurocomputing, 190:82–94, 2016.
[5] Michael L. Littman. Friend-or-foe Q-learning in general sum games. In ICML'01, pages 322–328, 2001.
[6] Ryan Lowe, Yi I Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments. Advances in Neural Information Processing Systems, 30, 2017.
[7] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin A. Riedmiller. Playing Atari with deep reinforcement learning. CoRR, abs/1312.5602, 2013.
[8] Frans A. Oliehoek, Christopher Amato, et al. A concise introduction to decentralized POMDPs, volume 1. Springer, 2016.
[9] Frans A. Oliehoek, Matthijs T. J. Spaan, and Nikos Vlassis. Optimal and approximate Q-value functions for decentralized POMDPs. Journal of Artificial Intelligence Research, 32:289–353, 2008.
[10] Tabish Rashid, Mikayel Samvelyan, Christian Schröder de Witt, Gregory Farquhar, Jakob N. Foerster, and Shimon Whiteson. QMIX: monotonic value function factorisation for deep multi-agent reinforcement learning. CoRR, abs/1803.11485, 2018.
[11] G. Rummery and Mahesan Niranjan. On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166, November 1994.
[12] Ahmad EL Sallab, Mohammed Abdou, Etienne Perot, and Senthil Yogamani. Deep reinforcement learning framework for autonomous driving. Electronic Imaging, 29(19):70–76, January 2017.
[13] Mikayel Samvelyan, Tabish Rashid, Christian Schröder de Witt, Gregory Farquhar, Nantas Nardelli, Tim G. J. Rudner, Chia-Man Hung, Philip H. S. Torr, Jakob N. Foerster, and Shimon Whiteson. The StarCraft multi-agent challenge. CoRR, abs/1902.04043, 2019.
[14] Yoav Shoham and Kevin Leyton-Brown. Multiagent systems: algorithmic, game-theoretic, and logical foundations. Cambridge University Press, New York City, United States, December 2008.
[15] David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, January 2016.
[16] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis. Mastering the game of Go without human knowledge. Nature, 550(7676):354–359, October 2017.
[17] Kyunghwan Son, Daewoo Kim, Wan Ju Kang, David Hostallero, and Yung Yi. QTRAN: learning to factorize with transformation for cooperative multi-agent reinforcement learning. CoRR, abs/1905.05408, 2019.
[18] Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinícius Flores Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, and Thore Graepel. Value-decomposition networks for cooperative multi-agent learning. CoRR, abs/1706.05296, 2017.
[19] Richard S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3(1):9–44, August 1988.
[20] Ming Tan. Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the Tenth International Conference on Machine Learning, pages 330–337, 1993.
[21] J. v. Neumann. Zur Theorie der Gesellschaftsspiele. Mathematische Annalen, 100(1):295–320, December 1928.
[22] Christopher J. C. H. Watkins and Peter Dayan. Q-learning. Machine Learning, 8(3):279–292, May 1992.
[23] Hao Ye, Geoffrey Ye Li, and Biing-Hwang Fred Juang. Deep reinforcement learning based resource allocation for V2V communications. IEEE Transactions on Vehicular Technology, 68(4):3163–3173, 2019.
|