
Detailed Record

Author (Chinese): 陳令臻
Author (English): CHEN, LING-CHEN
Thesis Title (Chinese): 針對機器人控制以關鍵點資訊改善基於畫面的強化學習
Thesis Title (English): KeyState: Improving Image-based Reinforcement Learning with Keypoint Information for Robot Control
Advisor (Chinese): 金仲達
Advisor (English): King, Chung-Ta
Committee Members (Chinese): 江振瑞、許秋婷
Committee Members (English): Jiang, Jehn-Ruey; Hsu, Chiou-Ting
Degree: Master's
University: National Tsing Hua University
Department: Department of Computer Science
Student ID: 109062704
Year of Publication (ROC calendar): 111 (2022)
Graduation Academic Year: 111
Language: English
Number of Pages: 25
Keywords (Chinese): 強化學習、機器人控制、基於畫面的強化學習
Keywords (English): Reinforcement Learning, Robot Control, Image-based Reinforcement Learning
Abstract:
Learning from high-dimensional images is essential for reinforcement learning (RL) to train autonomous agents that interact directly with the environment using visual observations. A general strategy is to extract task-relevant information from the images and learn a representation that characterizes the system states. Although existing works can efficiently find good representations for static states, they still take a long time to learn the system dynamics, which are critical for applications such as robot control. For robot control, since the most important system dynamics are the motions of the robot itself, image-based RL can be more efficient if the state of the robot is known. In this thesis, we propose to extract keypoints of the robot as auxiliary information to improve the data efficiency of RL from image pixels. The proposed method, called KeyState, is evaluated on the DeepMind Control Suite, a common benchmark for evaluating the data efficiency and performance of RL agents. The experimental results show that the performance of KeyState is, on average, 1.65 times better than that of prior pixel-based methods.
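As a rough illustration of the idea described in the abstract, the sketch below renders pixel observations from the DeepMind Control Suite and attaches 2D keypoint coordinates as auxiliary information alongside the image. This is a minimal sketch under stated assumptions, not the thesis's actual KeyState implementation: the KeypointDetector class, the observe helper, and the choice of the cartpole swingup task are hypothetical placeholders; only dm_control's suite.load and physics.render calls are real APIs.

    # Minimal sketch (hypothetical, not the thesis's implementation): augment a
    # DeepMind Control Suite pixel observation with 2D keypoint coordinates so an
    # image-based RL agent can also condition on the robot's configuration.
    import numpy as np
    from dm_control import suite


    class KeypointDetector:
        """Hypothetical stand-in for a pre-trained keypoint discovery model."""

        def __init__(self, num_keypoints: int = 4):
            self.num_keypoints = num_keypoints

        def __call__(self, image: np.ndarray) -> np.ndarray:
            # Placeholder: a real model would regress (x, y) keypoints from the image.
            return np.zeros((self.num_keypoints, 2), dtype=np.float32)


    def observe(env, detector, height=84, width=84):
        """Render a pixel observation and attach keypoint-based auxiliary info."""
        pixels = env.physics.render(height=height, width=width, camera_id=0)
        keypoints = detector(pixels)  # shape (K, 2)
        return {"pixels": pixels, "keypoints": keypoints.flatten()}


    if __name__ == "__main__":
        env = suite.load(domain_name="cartpole", task_name="swingup")
        detector = KeypointDetector(num_keypoints=4)
        time_step = env.reset()
        obs = observe(env, detector)
        print(obs["pixels"].shape, obs["keypoints"].shape)  # (84, 84, 3) (8,)

In this spirit, an RL agent could encode the image with a CNN and concatenate the flattened keypoint vector into its state representation, which reflects the general idea of using keypoints as auxiliary information rather than any specific architecture from the thesis.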
Acknowledgements
Abstract (Chinese) i
Abstract ii
1 Introduction 1
2 Related Work 5
3 Method 7
3.1 Pre-trained Keypoint Discovery Model . . . . . . . . . . . . . . . . . . . . . . 7
3.2 Keypoint-based Auxiliary Information . . . . . . . . . . . . . . . . . . . . . . 9
4 Experiments 11
4.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.2 Experimental Results in Rich Reward Environments . . . . . . . . . . . . . . 13
4.3 Experimental Results in Sparse Reward Environments . . . . . . . . . . . . . 13
4.4 Experimental Results in Complex Environments . . . . . . . . . . . . . . . . . 14
4.5 Correlation between Physical States and Keypoint-based Information . . . . . 16
4.6 Physical States Ablations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4.7 The Necessity of Approximating Angles with Keypoints . . . . . . . . . . . . 18
5 Conclusion and Future Work 21
References 23
 
 
 
 