
Detailed Record

Author (Chinese): 柯維碩
Author (English): Ko, Wei-Shuo
Title (Chinese): 針對機器人控制通過學習空間相關性改善基於圖像的強化學習的關鍵點分佈
Title (English): Improving Keypoint Distribution of Image-based Reinforcement Learning for Robot Control by Learning Spatial Correlation
Advisor (Chinese): 金仲達
Advisor (English): King, Chung-Ta
Committee Members (Chinese): 莊仁輝、朱宏國
Committee Members (English): Chuang, Jen-Hui; Chu, Hung-Kuo
Degree: Master's
University: National Tsing Hua University (國立清華大學)
Department: Computer Science (資訊工程學系)
Student ID: 110062656
Year of Publication (ROC calendar): 112 (2023)
Academic Year of Graduation: 112
Language: English
Number of Pages: 26
Keywords (Chinese): 強化學習、關鍵點、機器人控制
Keywords (English): reinforcement learning, keypoint, robot control
Abstract (Chinese):
使用高維度圖像作為強化學習(RL)的輸入來處理機器人控制等連續任務已成為一個重要的研究主題。透過從高維影像中自動提取狀態信息,此類技術允許在不需要提供額外狀態資訊的情況下訓練代理。問題是他們經常遇到樣本效率低下的問題。對於機器人控制,緩解該問題的一種策略是首先從高維影像中提取機器人的關鍵點,而不是直接從原始像素中學習。關鍵點是輸入影像中機器人關鍵部位的座標,代表機器人的具體狀態資訊。儘管最先進的基於關鍵點的強化學習方法(例如 FPAC)在某些情況下提供了良好的樣本效率,但在其他情況下卻表現不佳。仔細檢查發現,他們忽略的一個領域是關鍵點之間的空間關係。本文採用基於關鍵點之間空間關係的輔助損失函數來提高基於關鍵點的強化學習方法的樣本效率。此外,研究表明,需要一種特殊的輸入影像分割方法來補充引入的損失函數,以獲得更均勻的性能改進。我們所提出的方法在 DeepMind Control Suite 上進行了評估。與 FPAC 相比,所提出的方法平均可獲得 1.77 倍的效能提升。
Abstract (English):
Using high-dimensional images as input for reinforcement learning (RL) to handle continuous tasks such as robot control has become an important research topic. By extracting state information automatically from high-dimensional images, such techniques allow an agent to be trained without the need to supply extra state information. The problem is that they often suffer from poor sample efficiency. For robot control, one strategy to alleviate this problem is to first extract the keypoints of the robot from the high-dimensional images, instead of learning directly from the raw pixels. Keypoints are the coordinates of the key parts of the robot in the input image, representing specific state information of the robot. Although state-of-the-art keypoint-based RL methods, such as FPAC, provide good sample efficiency in certain cases, they fall short in others. A closer examination shows that one area they overlook is the spatial relationship between the keypoints. In this thesis, an auxiliary loss based on the spatial relationship between keypoints is adopted to improve the sample efficiency of keypoint-based RL methods. Furthermore, it is shown that a special way of segmenting the input images is needed to complement the introduced loss in order to obtain more uniform performance improvements. The proposed method is evaluated on the DeepMind Control Suite. Compared with FPAC, the proposed method achieves on average 1.77 times better performance.
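The abstract describes an auxiliary loss built from the spatial relationships between keypoints, but this record does not include its formula. The sketch below shows one plausible form of such a loss: penalizing deviations of the distances between adjacent keypoints from a target spacing. The function name `contiguity_loss`, the `target_dist` parameter, and the assumption that keypoints are ordered along the robot's body are illustrative choices, not taken from the thesis.

```python
import numpy as np

def contiguity_loss(keypoints, target_dist=0.1):
    """Hypothetical auxiliary loss on a (K, 2) array of keypoint
    coordinates: encourage consecutive keypoints (e.g., joints along
    a robot limb) to sit roughly target_dist apart in the image."""
    diffs = keypoints[1:] - keypoints[:-1]   # (K-1, 2) offsets between neighbors
    dists = np.linalg.norm(diffs, axis=1)    # Euclidean distance per pair
    return float(np.mean((dists - target_dist) ** 2))

# Three keypoints already spaced exactly 0.1 apart incur zero penalty.
kps = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0]])
print(contiguity_loss(kps))  # prints 0.0
```

In an actual keypoint-based RL pipeline, a term like this would be added to the usual actor-critic objective, so gradients push the keypoint extractor toward spatially coherent detections rather than keypoints scattered over the background.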
Abstract (Chinese) I
Abstract II
Contents III
List of Figures V
List of Tables VI
1 Introduction 1
2 Related Work 5
2.1 Reinforcement Learning from Pixels . . . . . . . . . . . . . . . . . . 5
2.2 Keypoint Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3 Method 8
3.1 Feature Point Actor Critic (FPAC) . . . . . . . . . . . . . . . . . . 9
3.2 FPAC with Contiguity Loss . . . . . . . . . . . . . . . . . . . . . . 10
3.3 Our Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4 Experiments 14
4.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
4.2 Overall Performance Comparison . . . . . . . . . . . . . . . . . . . 16
4.3 Effects of Image Segmentation . . . . . . . . . . . . . . . . . . . . . 16
4.4 Effects of Segmentation Size . . . . . . . . . . . . . . . . . . . . . . 19
4.5 Effects on Feature Point Distribution . . . . . . . . . . . . . . . . . 19
5 Conclusion and Future Work 22
Bibliography 24
[1] R. Boney, A. Ilin, and J. Kannala, “Learning of feature points without additional supervision improves reinforcement learning from images,” arXiv preprint arXiv:2106.07995, 2021.
[2] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor,” in International Conference on Machine Learning, pp. 1861–1870, PMLR, 2018.
[3] D. Yarats, A. Zhang, I. Kostrikov, B. Amos, J. Pineau, and R. Fergus, “Improving sample efficiency in model-free reinforcement learning from images,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 10674–10681, 2021.
[4] M. Laskin, A. Srinivas, and P. Abbeel, “Curl: Contrastive unsupervised representations for reinforcement learning,” in International Conference on Machine Learning, pp. 5639–5650, PMLR, 2020.
[5] I. Kostrikov, D. Yarats, and R. Fergus, “Image augmentation is all you need: Regularizing deep reinforcement learning from pixels,” arXiv preprint arXiv:2004.13649, 2020.
[6] T. D. Kulkarni, A. Gupta, C. Ionescu, S. Borgeaud, M. Reynolds, A. Zisserman, and V. Mnih, “Unsupervised learning of object keypoints for perception and control,” Advances in Neural Information Processing Systems, vol. 32, 2019.
[7] T. Anciukevičius, P. Henderson, and H. Bilen, “Learning to predict keypoints and structure of articulated objects without supervision,” in 2022 26th International Conference on Pattern Recognition (ICPR), pp. 3383–3390, IEEE, 2022.
[8] K. O’Shea and R. Nash, “An introduction to convolutional neural networks,” arXiv preprint arXiv:1511.08458, 2015.
[9] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602, 2013.
[10] Y. Tassa, Y. Doron, A. Muldal, T. Erez, Y. Li, D. d. L. Casas, D. Budden, A. Abdolmaleki, J. Merel, A. Lefrancq, et al., “Deepmind control suite,” arXiv preprint arXiv:1801.00690, 2018.
[11] M. Laskin, K. Lee, A. Stooke, L. Pinto, P. Abbeel, and A. Srinivas, “Reinforcement learning with augmented data,” Advances in Neural Information Processing Systems, vol. 33, pp. 19884–19895, 2020.
[12] M. Schwarzer, A. Anand, R. Goel, R. D. Hjelm, A. Courville, and P. Bachman, “Data-efficient reinforcement learning with self-predictive representations,” arXiv preprint arXiv:2007.05929, 2020.
[13] M. Schwarzer, N. Rajkumar, M. Noukhovitch, A. Anand, L. Charlin, R. D. Hjelm, P. Bachman, and A. C. Courville, “Pretraining representations for data-efficient reinforcement learning,” Advances in Neural Information Processing Systems, vol. 34, pp. 12686–12699, 2021.
[14] T. Jakab, A. Gupta, H. Bilen, and A. Vedaldi, “Unsupervised learning of object landmarks through conditional image generation,” Advances in Neural Information Processing Systems, vol. 31, 2018.
[15] Y. Zhang, Y. Guo, Y. Jin, Y. Luo, Z. He, and H. Lee, “Unsupervised discovery of object landmarks as structural representations,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2694–2703, 2018.