Author (Chinese): 楊宛蒨
Author (English): Yang, Wan-Chien
Title (Chinese): 論目標條件模仿學習的泛化
Title (English): On Generalization of Goal-Conditioned Imitation Learning
Advisor (Chinese): 金仲達
Advisor (English): King, Chung-Ta
Committee Members (Chinese): 朱宏國, 邱瀞德
Committee Members (English): Chu, Hung-Kuo; Chiu, Ching-Te
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 106062534
Year of Publication (ROC): 108 (2019)
Graduation Academic Year: 108
Language: Chinese
Pages: 29
Keywords (Chinese): 目標條件模仿學習, 泛化
Keywords (English): Goal-Conditioned Imitation Learning, Generalization
Abstract:
Goal-conditioned machine learning attempts to build machine models that produce different outputs for the same inputs, depending on the given goal condition. It is especially useful for robotics, where robots perform the same task with different parameters. Current conditional models focus mainly on discrete goal conditions, which serve primarily as criteria for classification. It is therefore interesting to investigate how well discrete goal conditions generalize to continuous values. In this thesis, we use imitation learning to train neural models with discrete goal conditions and then test the models with condition values that lie between the trained conditions. We conduct experiments in a simulated environment with a 5-DoF robotic arm on three simple robotic tasks, each using one or two joints of the arm. The experimental results show that the trained robot agent can interpolate between the trained condition values to complete the tasks, achieving an effect similar to linear regression. Extrapolation of the condition values, however, succeeds only within a limited range.
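The setup the abstract describes — imitation learning on discrete goal conditions, then testing at condition values between them — can be sketched with a toy linear stand-in for the neural policy. The task, the "expert," and every name below are illustrative assumptions, not the thesis's actual model or environment:

```python
import numpy as np

# Hypothetical sketch of goal-conditioned behavioral cloning.
# Demonstrations pair a state with a goal value drawn from a small
# discrete set; the policy is then queried at an unseen, in-between goal.

rng = np.random.default_rng(0)

discrete_goals = [0.2, 0.5, 0.8]            # goal conditions seen in training
states = rng.uniform(-1.0, 1.0, size=(300, 2))
goals = rng.choice(discrete_goals, size=300)

def expert_action(state, goal):
    # Toy expert: the joint command depends on both the state and the goal.
    return 0.5 * state[0] - 0.3 * state[1] + 1.2 * goal

actions = np.array([expert_action(s, g) for s, g in zip(states, goals)])

# Concatenate the goal to the state (the goal-conditioned input) and fit
# a linear policy by least squares -- standing in for the neural model.
X = np.column_stack([states, goals, np.ones(len(goals))])
w, *_ = np.linalg.lstsq(X, actions, rcond=None)

def policy(state, goal):
    return np.concatenate([state, [goal, 1.0]]) @ w

# Interpolation test: a goal value strictly between the trained conditions.
s = np.array([0.1, -0.2])
err = abs(policy(s, 0.35) - expert_action(s, 0.35))
print(f"interpolation error at g=0.35: {err:.6f}")
```

Because the toy expert is linear in the goal, interpolation (and here even extrapolation) is exact; the thesis's finding is the more interesting one, namely that a neural policy trained on discrete goals interpolates well but extrapolates only within a limited range.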
Chinese Abstract................ⅰ
Abstract................ⅱ
Acknowledgement................ⅲ
Table of Contents................ⅳ
Chapter 1 Introduction................1
Chapter 2 Related Work................4
2.1 Imitation Learning................4
2.2 Conditional Model................5
2.3 Conditional Imitation Learning................6
Chapter 3 Method................8
3.1 Goal-Conditioned Imitation Learning................8
3.2 Network Detail................10
Chapter 4 Experiments................12
4.1 System Setup................12
4.2 Evaluation Method................15
4.3 Evaluation Result and Analysis................18
Chapter 5 Conclusions and Future Work................24
Chapter 6 References................25