[1] Tuomas Haarnoja, Sehoon Ha, Aurick Zhou, Jie Tan, George Tucker, Sergey Levine, “Learning to Walk via Deep Reinforcement Learning,” in RSS, 2018.
[2] Shirin Joshi, Sulabh Kumra, Ferat Sahin, “Robotic Grasping using Deep Reinforcement Learning,” in CASE, 2020.
[3] Zihan Fang, Yanxu Hou, Jun Li, “A Pick-and-Throw Method for Enhancing Robotic Sorting Ability via Deep Reinforcement Learning,” in YAC, 2021.
[4] Sangduck Lee, Woonchul Ham, “Self Stabilizing Strategy in Tracking Control of Unmanned Electric Bicycle with Mass Balance,” in IROS, 2002.
[5] 林子傑, “Sensor Fusion and Path Tracking of a Self-Balancing Electric Bicycle,” Master’s thesis, National Tsing Hua University, 2021.
[6] Wenhao Deng, Skyler Moore, Jonathan Bush, Miles Mabey, Wenlong Zhang, “Towards Automated Bicycles: Achieving Self-Balance Using Steering Control,” in DSCC, 2018.
[7] Mark W. Spong, Peter Corke, Rogelio Lozano, “Nonlinear control of the Reaction Wheel Pendulum,” Automatica, 2001.
[8] Oscar Danilo Montoya, Walter Gil-González, “Nonlinear analysis and control of a reaction wheel pendulum: Lyapunov-based approach,” Engineering Science and Technology, an International Journal, 2019.
[9] Daulet Baimukashev, Nazerke Sandibay, Bexultan Rakhim, Huseyin Atakan Varol, Matteo Rubagotti, “Deep Learning-Based Approximate Optimal Control of a Reaction-Wheel-Actuated Spherical Inverted Pendulum,” in ASME, 2020.
[10] Yunki Kim, Hyunwoo Kim, Jangmyung Lee, “Stable control of the bicycle robot on a curved path by using a reaction wheel,” Journal of Mechanical Science and Technology, 2015.
[11] K. Kanjanawanishkul, “LQR and MPC controller design and comparison for a stationary self-balancing bicycle robot with a reaction wheel,” Kybernetika, 2015.
[12] Ngoc Kien Vu, Hong Quang Nguyen, “Design Low-Order Robust Controller for Self-Balancing Two-Wheel Vehicle,” Mathematical Problems in Engineering, 2021.
[13] Vanessa Tan, John Leur Labrador, Marc Caesar Talampas, “MATA-RL: Continuous Reaction Wheel Attitude Control Using the MATA Simulation Software and Reinforcement Learning,” in SSC, 2021.
[14] A. Sharp, Bicycles & Tricycles: A Classic Treatise on Their Design and Construction, Dover Publications, 2011.
[15] F. J. W. Whipple, “The Stability of Motion of a Bicycle,” The Quarterly Journal of Pure and Applied Mathematics, pp. 312-348, 1899.
[16] E. Carvallo, “Théorie du mouvement du monocycle et de la bicyclette,” Journal de l’École Polytechnique, vol. 5, pp. 119-188, 1900.
[17] K. J. Åström, R. E. Klein, Anders Lennartsson, “Bicycle dynamics and control: Adapted bicycles for education and research,” IEEE Control Systems Magazine, 2005.
[18] J. P. Meijaard, Jim Papadopoulos, Andy Ruina, Arend L. Schwab, “Linearized dynamics equations for the balance and steer of a bicycle: A benchmark and review,” Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, pp. 1955-1982, 2007.
[19] 賴大渭, “Theory of Bicycle Riding Dynamics and Safety Analysis,” Ph.D. dissertation, National Tsing Hua University, 2009.
[20] 魯浩天, “Self-Balancing and Straight-Line Tracking Control of an Unmanned Bicycle,” Master’s thesis, National Tsing Hua University, 2018.
[21] M. Minsky, Theory of neural-analog reinforcement systems and its application to the brain-model problem, 1954.
[22] R. Bellman, A Markovian Decision Process, 1957.
[23] R. A. Howard, Dynamic Programming and Markov Processes, 1960.
[24] R. S. Sutton, “Learning to predict by the methods of temporal differences,” Machine Learning, vol. 3, pp. 9-44, 1988.
[25] C. Watkins, Learning From Delayed Rewards, 1989.
[26] G. A. Rummery, Mahesan Niranjan, On-Line Q-Learning Using Connectionist Systems, 1994.
[27] David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, Martin Riedmiller, “Deterministic Policy Gradient Algorithms,” in ICML, 2014.
[28] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller, “Playing Atari with Deep Reinforcement Learning,” arXiv, 2013.
[29] Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra, “Continuous control with deep reinforcement learning,” arXiv, 2015.
[30] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov, “Proximal Policy Optimization Algorithms,” arXiv, 2017.
[31] Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine, “Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor,” in ICML, 2018.
[32] Scott Fujimoto, Herke van Hoof, David Meger, “Addressing Function Approximation Error in Actor-Critic Methods,” in ICML, 2018.
[33] Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, Sergey Levine, “Soft Actor-Critic Algorithms and Applications,” arXiv, 2019.
[34] Gonzalo Belascuen, Nahuel Aguilar, “Design, Modeling and Control of a Reaction Wheel Balanced Inverted Pendulum,” in ARGENCON, 2018.
[35] Cheng-shuo Ying, Andy H. F. Chow, Kwai-Sang Chin, “An actor-critic deep reinforcement learning approach for metro train scheduling with rolling stock circulation under stochastic demand,” Transportation Research Part B: Methodological, vol. 140, pp. 210-235, 2020.
[36] Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine, “Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor,” arXiv, 2018.