
Detailed Record

Author (Chinese): 林煜峰
Author (English): Lin, Yu-Fong
Title (Chinese): 透過目標限定的任務編碼改善元模仿學習
Title (English): Improving Meta-Imitation Learning with Focused Task Embedding
Advisor (Chinese): 金仲達
Advisor (English): King, Chung-Ta
Committee members (Chinese): 江振瑞、朱宏國
Committee members (English): Jiang, Jehn-Ruey; Chu, Hung-Kuo
Degree: Master's
Institution: 國立清華大學 (National Tsing Hua University)
Department: 資訊工程學系 (Department of Computer Science)
Student ID: 108062511
Year of publication (ROC): 111 (2022)
Graduation academic year: 110
Language: English
Pages: 24
Keywords (Chinese): 元學習、模仿學習、單樣本學習、機器人控制
Keywords (English): Meta-Learning; Imitation Learning; One-Shot Learning; Robot Control
Meta-imitation learning has been applied to enable robots to quickly generalize the tasks they have learned to new tasks. The basic idea is to encode tasks into meaningful embeddings and to perform the task specified by a given embedding. When the robot is asked to perform a new, unseen task, it is given one or just a few demonstrations, from which a new task embedding is obtained by generalizing from the existing embeddings. Task encoding is therefore key to the generalizability of meta-imitation learning. The problem is that most meta-imitation learning methods encode the whole task directly, which brings too many irrelevant details into the task embedding and results in poor generalization. In this work, we propose to encode a task separately according to its different features, e.g., the required skills and the target objects. As a result, the robot gains a clearer understanding of the task and performs new tasks better. We compare the proposed method with typical meta-imitation learning methods on a set of robot tasks and test their performance in adapting to new, unseen tasks. The experimental results indicate that the proposed method encodes tasks into more meaningful embeddings and thus achieves better generalization.
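The core idea of the abstract — encoding a demonstration with separate object and skill encoders and conditioning a control policy on the combined embedding — can be sketched as follows. This is a minimal illustrative sketch, not the thesis's actual architecture: the feature dimensions, the single-layer random encoders, and the mean-pooling over demonstration frames are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(in_dim, out_dim):
    # A single random linear layer with tanh, standing in for a trained network.
    W = rng.normal(0.0, 0.1, (in_dim, out_dim))
    return lambda x: np.tanh(x @ W)

# Hypothetical dimensions: demo frames are 32-d features, each embedding is 8-d.
object_encoder = mlp(32, 8)       # encodes which object the demo manipulates
skill_encoder = mlp(32, 8)        # encodes which skill (e.g., push, pick) is shown
policy = mlp(32 + 16, 4)          # maps (state, task embedding) to a 4-d action

demo = rng.normal(size=(10, 32))  # one demonstration of 10 frames

# Encode the same demonstration through both encoders, pooling over time,
# then concatenate the two focused embeddings into one task embedding.
z_obj = object_encoder(demo).mean(axis=0)
z_skill = skill_encoder(demo).mean(axis=0)
z_task = np.concatenate([z_obj, z_skill])   # shape (16,)

# The policy acts on the current observation conditioned on the task embedding.
state = rng.normal(size=32)
action = policy(np.concatenate([state, z_task]))   # shape (4,)
```

The point of splitting the encoder is that each branch can ignore features irrelevant to its factor of the task, so a new object/skill combination only requires generalizing each branch within its own, smaller domain.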
Acknowledgements
摘要 i
Abstract ii
1 Introduction 1
2 Related Work 5
3 Preliminaries 7
4 Method 9
4.1 Object Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.2 Skill Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.3 Control Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
4.4 Task Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
5 Experiments 13
5.1 Data Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
5.2 Learning Object Domain in Multi-skill Setting . . . . . . . . . . . . . . . . . . 14
5.2.1 Robot interacts with one object . . . . . . . . . . . . . . . . . . . . . . 14
5.2.2 Robot interacts with multiple objects . . . . . . . . . . . . . . . . . . 16
5.3 Learning Skill Domain in Multi-object Setting . . . . . . . . . . . . . . . . . . 18
6 Conclusions 21
References 23
A Experimental Details 25
A.1 Learning to adapt the object domain in a multi-skill setting . . . . . . . . . . 26
A.2 Learning to adapt the skill domain in a multi-object setting . . . . . . . . . . 26