Detailed Record

Author (Chinese): 陳昱名
Author (English): Chen, Yu-Ming
Thesis Title (Chinese): 建構協同巨集行動策略集於強化學習代理
Thesis Title (English): Compose Synergistic Macro Actions for Reinforcement Learning Agents
Advisor (Chinese): 李濬屹
Advisor (English): Lee, Chun-Yi
Committee Members (Chinese): 黃稚存, 李正匡
Committee Members (English): Huang, Chih-Tsun; Lee, Cheng-Kuang
Degree: Master
University: National Tsing Hua University
Department: Computer Science
Student ID: 107062574
Year of Publication (ROC): 110 (2021)
Academic Year of Graduation: 109
Language: English
Number of Pages: 22
Keywords (Chinese): 協同作用, 巨集行動策略集, 強化學習
Keywords (English): Synergism, Macro Action Ensemble, Reinforcement Learning
Abstract:
Macro actions have been demonstrated to be beneficial for the learning processes of an agent, and have encouraged a variety of techniques to be developed for constructing more effective ones. However, previous techniques usually fail to provide an approach for combining macro actions into a synergistic macro action ensemble, in which synergism arises when the constituent macro actions are favorable to be used jointly by an agent during evaluation. Such a synergistic macro action ensemble may potentially allow an agent to perform even better than with any individual macro action within it. Motivated by recent advances in neural architecture search, in this thesis we formulate the construction of a synergistic macro action ensemble as a sequential decision problem, and evaluate the constructed macro action ensemble in a task as a whole. This formulation enables synergism to be taken into account by the proposed evaluation procedure. Our experiments show that the proposed framework is able to discover synergistic macro action ensembles, and we highlight the benefits of these ensembles through a set of analytical cases.
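
The construction-and-evaluation loop described in the abstract can be illustrated with a minimal sketch. This is only an illustration of the general idea, not the thesis's implementation: the helpers propose_macro and train_and_evaluate, together with all constants below, are hypothetical placeholders standing in for the learned controller and for actually training an agent whose action space is augmented with the whole ensemble.

    import random
    from typing import List, Tuple

    PRIMITIVES = [0, 1, 2, 3]   # primitive action indices (assumed)
    MACRO_LEN = 3               # length of each macro action (assumed)
    ENSEMBLE_SIZE = 4           # number of macro actions per ensemble (assumed)

    def propose_macro(partial_ensemble: List[Tuple[int, ...]]) -> Tuple[int, ...]:
        # Stand-in for a learned controller: choose the next macro action,
        # conditioned on the macro actions already placed in the ensemble.
        return tuple(random.choice(PRIMITIVES) for _ in range(MACRO_LEN))

    def train_and_evaluate(ensemble: List[Tuple[int, ...]]) -> float:
        # Stand-in for training an RL agent that may use any macro action in
        # the ensemble, then returning its evaluation score. Scoring the
        # ensemble jointly is what allows synergism to be rewarded.
        return random.random()

    def search(iterations: int = 10) -> List[Tuple[int, ...]]:
        best_score, best_ensemble = float("-inf"), []
        for _ in range(iterations):
            # Construction phase: build the ensemble one macro action at a
            # time, i.e. a sequential decision problem over macro choices.
            ensemble: List[Tuple[int, ...]] = []
            for _ in range(ENSEMBLE_SIZE):
                ensemble.append(propose_macro(ensemble))
            # Evaluation phase: score the constructed ensemble as a whole.
            score = train_and_evaluate(ensemble)
            if score > best_score:
                best_score, best_ensemble = score, ensemble
        return best_ensemble

In the actual framework, the joint evaluation score would presumably be fed back to update the controller that proposes macro actions, so that synergism among the chosen macro actions is rewarded rather than each macro action being scored in isolation.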
1 Introduction . . . 1
2 Background Material . . . 3
2.1 Markov Decision Process and Reinforcement Learning . . . 3
2.2 Deep Q-network and Proximal Policy Optimization . . . 3
2.3 Macro Action and Macro Action Ensemble . . . 4
3 Related Work . . . 5
4 Methodology . . . 7
4.1 Formulation of the Macro Ensemble Construction Process . . . 8
4.2 Construction Phase . . . 9
4.3 Evaluation Phase . . . 10
5 Experimental Results . . . 12
5.1 Experimental Setup . . . 12
5.2 Motivational Case of the Synergism Property . . . 14
5.3 Comparison of Our Method and the IEB Baseline . . . 15
5.4 Analysis of the Synergism Property . . . 15
6 Conclusions . . . 19
Bibliography . . . 20