Thesis title (English): KeyState: Improving Image-based Reinforcement Learning with Keypoint Information for Robot Control
Advisor (English name): King, Chung-Ta
Oral examination committee (English names): Jiang, Jehn-Ruey; Hsu, Chiou-Ting
Keywords (English): Reinforcement Learning; Robot Control; Image-based Reinforcement Learning
Learning from high-dimensional images is essential for reinforcement learning (RL) to train autonomous agents that interact directly with the environment through visual observations. A common strategy is to extract task-relevant information from the images and use it to learn a representation that characterizes the system states. Although existing methods can efficiently learn good representations of static states, they still take a long time to learn the system dynamics, which are critical for applications such as robot control. In robot control, the most important system dynamics are the motions of the robot itself, so image-based RL can be more efficient if the state of the robot is known. In this thesis, we propose extracting the keypoints of the robot as auxiliary information to improve the data efficiency of RL from image pixels. The proposed method, called KeyState, is evaluated on the DeepMind Control Suite, a common benchmark for evaluating the data efficiency and performance of RL agents. Experimental results show that KeyState achieves, on average, 1.65 times the performance of prior pixel-based methods.
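The abstract does not spell out how keypoints become auxiliary information. The following is only a minimal sketch under stated assumptions, not the thesis's implementation: it assumes a pre-trained keypoint detector has already returned 2D keypoints for two consecutive frames, ordered along the robot's kinematic chain, and it builds an auxiliary vector from keypoint positions, finite-difference velocities, and segment angles (encoded as sine/cosine), then concatenates that vector with an image encoder's latent. The function names `segment_angles`, `keypoint_features`, and `augment_state` and the `dt` parameter are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def segment_angles(keypoints: Sequence[Point]) -> List[float]:
    """Angle (radians) of each segment joining consecutive keypoints.

    Assumption: keypoints are ordered along the kinematic chain; a real
    keypoint discovery model does not necessarily guarantee this ordering.
    """
    return [
        math.atan2(y1 - y0, x1 - x0)
        for (x0, y0), (x1, y1) in zip(keypoints, keypoints[1:])
    ]


def keypoint_features(
    prev_kp: Sequence[Point], curr_kp: Sequence[Point], dt: float = 1.0
) -> List[float]:
    """Hypothetical auxiliary vector built from two consecutive frames."""
    if len(prev_kp) != len(curr_kp):
        raise ValueError("both frames must have the same number of keypoints")
    feats: List[float] = []
    # Current keypoint positions (static information).
    for x, y in curr_kp:
        feats.extend([x, y])
    # Finite-difference velocities: a crude proxy for the robot's motion.
    for (x0, y0), (x1, y1) in zip(prev_kp, curr_kp):
        feats.extend([(x1 - x0) / dt, (y1 - y0) / dt])
    # Segment angles as (sin, cos) so the encoding has no jump at +-pi.
    for a in segment_angles(curr_kp):
        feats.extend([math.sin(a), math.cos(a)])
    return feats


def augment_state(
    image_latent: Sequence[float],
    prev_kp: Sequence[Point],
    curr_kp: Sequence[Point],
    dt: float = 1.0,
) -> List[float]:
    """Concatenate the image encoder output with the keypoint features."""
    return list(image_latent) + keypoint_features(prev_kp, curr_kp, dt)


if __name__ == "__main__":
    prev = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    curr = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
    print(len(augment_state([0.1, -0.2, 0.3], prev, curr)))
```

With three keypoints and a 3-dimensional latent, the augmented state has 3 + 6 + 6 + 4 = 19 entries. Encoding angles as sine and cosine is one common design choice for feeding periodic quantities to a network; whether KeyState uses the same encoding is not stated in this record.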
Abstract (Chinese)
Abstract
1 Introduction
2 Related Work
3 Method
3.1 Pre-trained Keypoint Discovery Model
3.2 Keypoint-based Auxiliary Information
4 Experiments
4.1 Experimental Setup
4.2 Experimental Results in Rich Reward Environments
4.3 Experimental Results in Sparse Reward Environments
4.4 Experimental Results in Complex Environments
4.5 Correlation between Physical States and Keypoint-based Information
4.6 Physical States Ablations
4.7 The Necessity of Approximating Angles with Keypoints
5 Conclusion and Future Work
References