References
[1] Abbas Abdolmaleki, Jost Tobias Springenberg, Yuval Tassa, Rémi Munos, Nicolas Heess, and Martin Riedmiller. Maximum a Posteriori Policy Optimisation. In Proceedings of the International Conference on Learning Representations (ICLR), 2018.
[2] Jimmy Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer Normalization. ArXiv, abs/1607.06450, 2016.
[3] Chen-Hao Chao, Wei-Fang Sun, Yen-Chang Hsu, Zsolt Kira, and Chun-Yi Lee. Training Energy-Based Normalizing Flow with Score-Matching Objectives. In Proceedings of the International Conference on Neural Information Processing Systems (NeurIPS), 2023.
[4] Junior Costa de Jesus, Victor Augusto Kich, Alisson Henrique Kolling, Ricardo Bedin Grando, Marco Antonio de Souza Leite Cuadros, and Daniel Fernando Tello Gamarra. Soft Actor-Critic for Navigation of Mobile Robots. Journal of Intelligent and Robotic Systems, 2021.
[5] Michel Dekking. A Modern Introduction to Probability and Statistics. Springer, 2007.
[6] Laurent Dinh, David Krueger, and Yoshua Bengio. NICE: Non-linear Independent Components Estimation. In Workshop Track of the International Conference on Learning Representations (ICLR), 2015.
[7] Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density Estimation using Real NVP. In Proceedings of the International Conference on Learning Representations (ICLR), 2017.
[8] Conor Durkan, Artur Bekasov, Iain Murray, and George Papamakarios. Neural Spline Flows. In Proceedings of the International Conference on Neural Information Processing Systems (NeurIPS), 2019.
[9] Benjamin Eysenbach and Sergey Levine. Maximum Entropy RL (Provably) Solves Some Robust RL Problems. In Proceedings of the International Conference on Learning Representations (ICLR), 2022.
[10] Roy Fox, Ari Pakman, and Naftali Tishby. Taming the Noise in Reinforcement Learning via Soft Updates. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 2016.
[11] Scott Fujimoto, Herke van Hoof, and David Meger. Addressing Function Approximation Error in Actor-Critic Methods. In Proceedings of the International Conference on Machine Learning (ICML), 2018.
[12] Mathieu Germain, Karol Gregor, Iain Murray, and Hugo Larochelle. MADE: Masked Autoencoder for Distribution Estimation. In Proceedings of the International Conference on Machine Learning (ICML), 2015.
[13] Michael B. Giles. Multilevel Monte Carlo Methods. Acta Numerica, 24:259–328, 2015.
[14] Will Grathwohl, Kuan-Chieh Wang, Jörn-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, and Kevin Swersky. Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One. In Proceedings of the International Conference on Learning Representations (ICLR), 2020.
[15] Will Grathwohl, Kuan-Chieh Wang, Jörn-Henrik Jacobsen, David Duvenaud, and Richard S. Zemel. Learning the Stein Discrepancy for Training and Evaluating Energy-Based Models without Sampling. In Proceedings of the International Conference on Machine Learning (ICML), 2020.
[16] Luigi Gresele, Giancarlo Fissore, Adrián Javaloy, Bernhard Schölkopf, and Aapo Hyvärinen. Relative Gradient Optimization of the Jacobian Term in Unsupervised Deep Learning. In Proceedings of the International Conference on Neural Information Processing Systems (NeurIPS), 2020.
[17] Tuomas Haarnoja, Kristian Hartikainen, Pieter Abbeel, and Sergey Levine. Latent Space Policies for Hierarchical Reinforcement Learning. In Proceedings of the International Conference on Machine Learning (ICML), 2018.
[18] Tuomas Haarnoja, Haoran Tang, Pieter Abbeel, and Sergey Levine. Reinforcement Learning with Deep Energy-Based Policies. In Proceedings of the International Conference on Machine Learning (ICML), 2017.
[19] Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. In Proceedings of the International Conference on Machine Learning (ICML), 2018.
[20] Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, and Sergey Levine. Soft Actor-Critic Algorithms and Applications. ArXiv, abs/1812.05905, 2018.
[21] Moritz Hardt, Benjamin Recht, and Yoram Singer. Train Faster, Generalize Better: Stability of Stochastic Gradient Descent. In Proceedings of the International Conference on Machine Learning (ICML), 2016.
[22] Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Improving Neural Networks by Preventing Co-adaptation of Feature Detectors. ArXiv, abs/1207.0580, 2012.
[23] Emiel Hoogeboom, Rianne van den Berg, and Max Welling. Emerging Convolutions for Generative Normalizing Flows. In Proceedings of the International Conference on Machine Learning (ICML), 2019.
[24] Shengyi Huang, Rousslan Fernand Julien Dossa, Chang Ye, Jeff Braga, Dipam Chakraborty, Kinal Mehta, and João G. M. Araújo. CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms. Journal of Machine Learning Research (JMLR), 23(274):1–18, 2022.
[25] Aapo Hyvärinen. Estimation of Non-Normalized Statistical Models by Score Matching. Journal of Machine Learning Research (JMLR), 6:695–709, 2005.
[26] Aapo Hyvärinen and Erkki Oja. Independent Component Analysis: Algorithms and Applications. Neural Networks, 13(4–5):411–430, 2000.
[27] Malvin H. Kalos and Paula A. Whitlock. Monte Carlo Methods. Vol. 1: Basics. Wiley-Interscience, 1986.
[28] Hilbert J. Kappen. Path Integrals and Symmetry Breaking for Optimal Control Theory. Journal of Statistical Mechanics: Theory and Experiment, 2005.
[29] T. Anderson Keller, Jorn W. T. Peters, Priyank Jaini, Emiel Hoogeboom, Patrick Forré, and Max Welling. Self Normalizing Flows. In Proceedings of the International Conference on Machine Learning (ICML), 2021.
[30] Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. In Proceedings of the International Conference on Learning Representations (ICLR), 2015.
[31] Diederik P. Kingma and Prafulla Dhariwal. Glow: Generative Flow with Invertible 1x1 Convolutions. In Proceedings of the International Conference on Neural Information Processing Systems (NeurIPS), 2018.
[32] Diederik P. Kingma, Tim Salimans, and Max Welling. Improved Variational Inference with Inverse Autoregressive Flow. In Proceedings of the International Conference on Neural Information Processing Systems (NeurIPS), 2016.
[33] Diederik P. Kingma and Max Welling. Auto-Encoding Variational Bayes. In Proceedings of the International Conference on Learning Representations (ICLR), 2014.
[34] Yann LeCun, Sumit Chopra, Raia Hadsell, Marc'Aurelio Ranzato, and Fu Jie Huang. A Tutorial on Energy-Based Learning. In Predicting Structured Data. MIT Press, 2006.
[35] Kyungjae Lee, Sungyub Kim, Sungbin Lim, Sungjoon Choi, and Songhwai Oh. Tsallis Reinforcement Learning: A Unified Framework for Maximum Entropy Reinforcement Learning. ArXiv, abs/1902.00137, 2019.
[36] Xiang Li, Wenhao Yang, Jiadong Liang, Zhihua Zhang, and Michael I. Jordan. A Statistical Analysis of Polyak-Ruppert Averaged Q-Learning. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2023.
[37] Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous Control with Deep Reinforcement Learning. In Proceedings of the International Conference on Learning Representations (ICLR), 2016.
[38] Qiang Liu and Dilin Wang. Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm. In Proceedings of the International Conference on Neural Information Processing Systems (NeurIPS), 2016.
[39] You Lu and Bert Huang. Woodbury Transformations for Deep Generative Flows. In Proceedings of the International Conference on Neural Information Processing Systems (NeurIPS), 2020.
[40] Xuezhe Ma and Eduard H. Hovy. MaCow: Masked Convolutional Generative Flow. In Proceedings of the International Conference on Neural Information Processing Systems (NeurIPS), 2019.
[41] Viktor Makoviychuk, Lukasz Wawrzyniak, Yunrong Guo, Michelle Lu, Kier Storey, Miles Macklin, David Hoeller, Nikita Rudin, Arthur Allshire, Ankur Handa, and Gavriel State. Isaac Gym: High Performance GPU-Based Physics Simulation for Robot Learning. In Proceedings of the International Conference on Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track, 2021.
[42] Bogdan Mazoure, Thang Doan, Audrey Durand, R Devon Hjelm, and Joelle Pineau. Leveraging Exploration in Off-policy Algorithms via Normalizing Flows. In Proceedings of the Conference on Robot Learning (CoRL), 2019.
[43] Chenlin Meng, Linqi Zhou, Kristy Choi, Tri Dao, and Stefano Ermon. ButterflyFlow: Building Invertible Layers with Butterfly Matrices. In Proceedings of the International Conference on Machine Learning (ICML), 2022.
[44] Safa Messaoud, Billel Mokeddem, Zhenghai Xue, Linsey Pang, Bo An, Haipeng Chen, and Sanjay Chawla. S2AC: Energy-Based Reinforcement Learning with Stein Soft Actor Critic. In Proceedings of the International Conference on Learning Representations (ICLR), 2024.
[45] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin A. Riedmiller, Andreas Kirkeby Fidjeland, Georg Ostrovski, Stig Petersen, Charlie Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level Control through Deep Reinforcement Learning. Nature, 518:529–533, 2015.
[46] Thomas Müller, Brian McWilliams, Fabrice Rousselle, Markus H. Gross, and Jan Novák. Neural Importance Sampling. ACM Transactions on Graphics (TOG), 2019.
[47] Iman Nematollahi, Erick Rosete-Beas, Adrian Roefer, Tim Welschehold, Abhinav Valada, and Wolfram Burgard. Robot Skill Adaptation via Soft Actor-Critic Gaussian Mixture Models. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2022.
[48] Brendan O'Donoghue, Rémi Munos, Koray Kavukcuoglu, and Volodymyr Mnih. PGQ: Combining Policy Gradient and Q-learning. In Proceedings of the International Conference on Learning Representations (ICLR), 2017.
[49] George Papamakarios, Theo Pavlakou, and Iain Murray. Masked Autoregressive Flow for Density Estimation. In Proceedings of the International Conference on Neural Information Processing Systems (NeurIPS), 2017.
[50] George Papamakarios, Eric Nalisnick, Danilo Jimenez Rezende, Shakir Mohamed, and Balaji Lakshminarayanan. Normalizing Flows for Probabilistic Modeling and Inference. Journal of Machine Learning Research (JMLR), 22(57):1–64, 2021.
[51] Kwan-Woo Park, MyeongSeop Kim, Jung-Su Kim, and Jae-Han Park. Path Planning for Multi-Arm Manipulators Using Soft Actor-Critic Algorithm with Position Prediction of Moving Obstacles via LSTM. Applied Sciences, 2022.
[52] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Proceedings of the International Conference on Neural Information Processing Systems (NeurIPS), 2019.
[53] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[54] Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, and Noah Dormann. Stable-Baselines3: Reliable Reinforcement Learning Implementations. Journal of Machine Learning Research (JMLR), 22(268):1–8, 2021.
[55] Prajit Ramachandran, Barret Zoph, and Quoc V. Le. Searching for Activation Functions. ArXiv, abs/1710.05941, 2017.
[56] Konrad Rawlik, Marc Toussaint, and Sethu Vijayakumar. On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2012.
[57] Gareth O. Roberts and Jeffrey S. Rosenthal. Optimal Scaling of Discrete Approximations to Langevin Diffusions. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 1998.
[58] Gareth O. Roberts and Richard L. Tweedie. Exponential Convergence of Langevin Distributions and Their Discrete Approximations. Bernoulli, 1996.
[59] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal Policy Optimization Algorithms. ArXiv, abs/1707.06347, 2017.
[60] Antonio Serrano-Muñoz, Dimitrios Chrysostomou, Simon Bøgh, and Nestor Arana-Arexolaleiba. skrl: Modular and Flexible Library for Reinforcement Learning. Journal of Machine Learning Research (JMLR), 24(254):1–9, 2023.
[61] Wenjie Shi, Shiji Song, and Cheng Wu. Soft Policy Gradient Method for Maximum Entropy Deep Reinforcement Learning. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2019.
[62] Vincent Stimper, David Liu, Andrew Campbell, Vincent Berenz, Lukas Ryll, Bernhard Schölkopf, and José Miguel Hernández-Lobato. normflows: A PyTorch Package for Normalizing Flows. Journal of Open Source Software, 8(86):5361, 2023.
[63] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. A Bradford Book, 2018.
[64] Emanuel Todorov, Tom Erez, and Yuval Tassa. MuJoCo: A Physics Engine for Model-Based Control. In Proceedings of the International Conference on Intelligent Robots and Systems (IROS), 2012.
[65] Surya T. Tokdar and Robert E. Kass. Importance Sampling: A Review. Wiley Interdisciplinary Reviews: Computational Statistics, 2, 2010.
[66] Marc Toussaint. Robot Trajectory Optimization using Approximate Inference. In Proceedings of the International Conference on Machine Learning (ICML), 2009.
[67] Mark Towers, Jordan K. Terry, Ariel Kwiatkowski, John U. Balis, Gianluca de Cola, Tristan Deleu, Manuel Goulão, Andreas Kallinteris, Arjun KG, Markus Krimmel, Rodrigo Perez-Vicente, Andrea Pierré, Sander Schulhoff, Jun Jet Tai, Andrew Tan Jin Shen, and Omar G. Younis. Gymnasium, 2023.
[68] Dilin Wang and Qiang Liu. Learning to Draw Samples: With Application to Amortized MLE for Generative Adversarial Learning. In Proceedings of the International Conference on Learning Representations (ICLR), 2016.
[69] Patrick Nadeem Ward, Ariella Smofsky, and A. Bose. Improving Exploration in Soft-Actor-Critic with Normalizing Flows Policies. ArXiv, 2019.
[70] Max Welling and Yee Whye Teh. Bayesian Learning via Stochastic Gradient Langevin Dynamics. In Proceedings of the International Conference on Machine Learning (ICML), 2011.
[71] Denis Yarats, Amy Zhang, Ilya Kostrikov, Brandon Amos, Joelle Pineau, and Rob Fergus. Improving Sample Efficiency in Model-Free Reinforcement Learning from Images. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2021.
[72] Dinghuai Zhang, Aaron Courville, Yoshua Bengio, Qinqing Zheng, Amy Zhang, and Ricky T. Q. Chen. Latent State Marginalization as a Low-cost Approach to Improving Exploration. In Proceedings of the International Conference on Learning Representations (ICLR), 2023.
[73] Brian Ziebart, Andrew Maas, J. Andrew Bagnell, and Anind Dey. Maximum Entropy Inverse Reinforcement Learning. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2008.
[74] Brian D. Ziebart. Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy. PhD thesis, Carnegie Mellon University, 2010.