Author (Chinese): 陳慶鴻
Author (English): Chen, Ching-Hung
Title (Chinese): 應用深度強化學習於反作用輪自行車之平衡控制
Title (English): Balancing Control of a Reaction Wheel Bicycle Using Deep Reinforcement Learning
Advisors (Chinese): 宋震國、胡敏君
Advisors (English): Sung, Cheng-Kuo; Hu, Min-Chun
Committee Members (Chinese): 董必正、邱昱仁
Committee Members (English): Tung, Pi-Cheng; Chiu, Yu-Jen
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Power Mechanical Engineering
Student ID: 109030501
Publication Year (ROC calendar): 111 (2022)
Graduation Academic Year: 110
Language: Chinese
Number of Pages: 65
Keywords (Chinese): 自駕自行車、反作用輪倒單擺、深度強化學習、ROS
Keywords (English): Self-Driving Bicycle, Reaction Wheel Inverted Pendulum, Deep Reinforcement Learning, ROS
With self-driving cars becoming widespread, self-driving bicycles have also been studied extensively. Besides convenient and rapid goods delivery, a self-driving bicycle can offer an on-demand, summon-and-dismiss service for shared bicycles, and can follow a rider, adding convenience to daily life. In addition, ever since AlphaGo defeated the world's top Go players, deep reinforcement learning has been pushed steadily toward problems better suited to physical, continuous control, such as grasping and throwing with robotic arms, crawling with legged robots, and walking with bipedal robots.
This study applies deep reinforcement learning to the control of a full-sized self-driving bicycle. Its advantage is that the learning algorithm does not rely on a physical model, so parameter identification and the subsequent tuning process can be skipped; the model-free property also sidesteps the nonlinearity of the bicycle model. Furthermore, most reaction wheel bicycle balancing experiments use small-scale models, whereas this study also investigates the problems faced by a full-sized bicycle, namely its more demanding balancing conditions.
Because a bicycle is an underactuated system, it must be actively controlled to remain upright. This work adopts a reaction wheel control approach: the bicycle is treated as an inverted pendulum and balanced by the reaction torque of the wheel. Training is carried out in simulation with the PyTorch deep learning framework and an inverted pendulum environment designed with reference to OpenAI Gym; the training performance of different algorithms and hyperparameter settings is compared, and the algorithm and parameters best suited to physical training are applied to the real bicycle. On the hardware side, LQR simulation is used to find the physical parameters most amenable to control, which are then realized on the full-sized bicycle. For signal communication, the ROS framework under Linux handles transmission between the main computer, the IMU, and the motor.
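The abstract above only names the tools (PyTorch, an OpenAI Gym-style environment); the environment code itself is not part of this record. As a rough illustration of what such a Gym-style reaction wheel inverted pendulum environment can look like, here is a minimal sketch in which every constant (mass, inertias, torque limit, reward weights) is an illustrative assumption, not a value from the thesis:

```python
import numpy as np
import gym
from gym import spaces

class ReactionWheelPendulumEnv(gym.Env):
    """Minimal reaction wheel inverted pendulum environment (illustrative values only).

    State:  [theta, theta_dot, wheel_dot]  (roll angle, roll rate, wheel speed)
    Action: reaction wheel motor torque (continuous, N*m)
    """

    def __init__(self):
        # Placeholder physical constants; the thesis chooses its own via LQR-based design.
        self.g = 9.81            # gravity [m/s^2]
        self.m = 30.0            # assumed total mass [kg]
        self.l = 0.5             # assumed center-of-mass height [m]
        self.I_b = self.m * self.l ** 2   # body inertia about the ground contact line [kg*m^2]
        self.I_w = 0.05          # assumed reaction wheel inertia [kg*m^2]
        self.dt = 0.01           # integration step [s]
        self.max_torque = 20.0   # assumed motor torque limit [N*m]
        self.max_angle = 0.35    # terminate beyond roughly 20 degrees of roll [rad]

        self.action_space = spaces.Box(-self.max_torque, self.max_torque,
                                       shape=(1,), dtype=np.float32)
        high = np.array([self.max_angle, 10.0, 300.0], dtype=np.float32)
        self.observation_space = spaces.Box(-high, high, dtype=np.float32)
        self.state = None

    def reset(self):
        # Start near upright with a small random tilt.
        self.state = np.array([np.random.uniform(-0.05, 0.05), 0.0, 0.0], dtype=np.float32)
        return self.state.copy()

    def step(self, action):
        tau = float(np.clip(np.asarray(action), -self.max_torque, self.max_torque).item())
        theta, theta_dot, wheel_dot = self.state

        # Simplified reaction wheel pendulum dynamics:
        #   body:  I_b * theta_ddot = m*g*l*sin(theta) - tau
        #   wheel: I_w * wheel_ddot = tau
        theta_ddot = (self.m * self.g * self.l * np.sin(theta) - tau) / self.I_b
        wheel_ddot = tau / self.I_w

        theta_dot += theta_ddot * self.dt
        theta += theta_dot * self.dt
        wheel_dot += wheel_ddot * self.dt
        self.state = np.array([theta, theta_dot, wheel_dot], dtype=np.float32)

        done = abs(theta) > self.max_angle
        # Reward staying upright while penalizing torque and wheel speed build-up.
        reward = 1.0 - 0.1 * theta ** 2 - 0.001 * tau ** 2 - 1e-5 * wheel_dot ** 2
        return self.state.copy(), reward, done, {}
```

An off-policy agent such as SAC would then interact with this environment through the usual reset/step loop, which is also how the simulated training described above could be wired up.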
Self-driving cars are flourishing nowadays, and self-driving bicycles are also under active research. A self-driving bicycle can provide efficient delivery service, an “on-call” service for shared bicycles, and the ability to follow a rider. On the other hand, ever since AlphaGo beat human Go masters, deep reinforcement learning has been improved to tackle real-world continuous control problems, such as robotic arm grasping and tossing, and legged robot crawling and walking.
This research aims to develop a full-sized self-driving bicycle based on deep reinforcement learning. The advantage of the learning method is that it is model-free: it saves the time of parameter identification and tuning, and it avoids the non-linearity of the bicycle model. Besides, most reaction wheel bicycle projects use miniatures, while this research focuses on a full-sized bicycle, which faces a more difficult balancing condition.
A bicycle is an underactuated system, so it must be balanced by active control. This research regards the bicycle as an inverted pendulum and controls it with the reaction torque created by a reaction wheel. We use the PyTorch deep learning framework and design a reaction wheel inverted pendulum environment, following OpenAI Gym, to carry out training in simulation, and we compare the results of different RL algorithms and hyperparameters. Finally, we apply the best simulation result to a real-world bicycle, whose physical parameters are designed with the help of LQR simulation. The communication among the RL algorithm, the IMU, and the motor is handled by ROS (Robot Operating System).
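The abstract states that the bicycle's physical parameters were chosen with the help of LQR simulation on the linearized pendulum model. The sketch below shows one standard way to compute an LQR gain for such a linearized reaction wheel pendulum with SciPy; the state-space matrices and weights use placeholder numbers and a simplified model, not the parameters identified in the thesis:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Linearized reaction wheel inverted pendulum about the upright equilibrium.
# State x = [theta, theta_dot, wheel_dot], input u = reaction wheel torque.
# Placeholder parameters (the thesis derives its own from the physical design).
m, g, l = 30.0, 9.81, 0.5      # mass [kg], gravity [m/s^2], CoM height [m]
I_b = m * l**2                 # body inertia about the contact line [kg*m^2]
I_w = 0.05                     # reaction wheel inertia [kg*m^2]

A = np.array([[0.0,             1.0, 0.0],
              [m * g * l / I_b, 0.0, 0.0],
              [0.0,             0.0, 0.0]])
B = np.array([[0.0],
              [-1.0 / I_b],
              [ 1.0 / I_w]])

# Weight the roll angle most heavily; keep control effort cheap.
Q = np.diag([100.0, 10.0, 0.01])
R = np.array([[1.0]])

# Solve the continuous-time algebraic Riccati equation and form K = R^-1 B^T P.
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)
print("LQR gain K =", K)       # full-state feedback: u = -K x
```

Sweeping the assumed masses and inertias in such a script and checking the resulting gains and closed-loop response is one plausible way the LQR-based parameter design mentioned above could be carried out.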
Abstract (Chinese) I
Abstract II
Acknowledgements III
List of Figures VII
List of Tables X
Chapter 1 Introduction 1
1.1 Motivation and Objectives 1
1.2 Literature Review 2
1.3 Thesis Organization 5
Chapter 2 Theoretical Background 6
2.1 Dynamic Model 6
2.1.1 Angular Momentum 6
2.1.2 Angular Momentum Theorem 8
2.1.3 Reaction Wheel Inverted Pendulum Model (Intuitive Derivation) 8
2.1.4 Reaction Wheel Inverted Pendulum Model (Free-Body Diagram) 11
2.1.5 Reaction Wheel Inverted Pendulum Model (Lagrangian Mechanics) 13
2.1.6 Model Verification 14
2.2 Deep Reinforcement Learning 16
2.2.1 Policy Gradient 16
2.2.2 Q-Learning 17
2.2.3 Advantage Actor-Critic (A2C) 19
2.2.4 Entropy Regularization 20
2.2.5 Soft Actor-Critic (SAC) 21
2.3 Linear-Quadratic Regulator 24
2.4 Communication Protocols 25
2.4.1 Serial Peripheral Interface (SPI) 25
2.4.2 Controller Area Network (CAN bus) 26
Chapter 3 Simulation 28
3.1 Parameter Design 28
3.2 Reinforcement Learning Simulation Environment 32
3.2.1 Algorithm Architecture 32
3.2.2 Environment Parameters 33
3.2.3 Hyperparameter Tuning 34
3.3 Simulation Results 37
Chapter 4 Experiments 44
4.1 Hardware Architecture 44
4.1.1 Inertial Measurement Unit (IMU) 47
4.1.2 Controller 48
4.1.3 Reaction Wheel Motor 50
4.2 Experimental Setup 51
4.3 Experimental Results 52
4.3.1 Applying the Simulation Results 52
4.3.2 Training on the Physical Bicycle 54
Chapter 5 Conclusions and Future Work 58
5.1 Conclusions 58
5.2 Future Work 59
References 60
Appendix 64