
Detailed Record

Author (Chinese): 楊炫恭
Author (English): Yang, Hsuan-Kung
Title (Chinese): 基於光流產生內在獎勵的探索方式
Title (English): Exploration via Flow-based Intrinsic Rewards
Advisor (Chinese): 李濬屹
Advisor (English): Lee, Chun-Yi
Committee Members (Chinese): 胡敏君, 周志遠
Committee Members (English): Hu, Min-Chun; Chou, Jerry
Degree: Master's
Institution: National Tsing Hua University
Department: Computer Science
Student ID: 107062518
Publication Year (ROC): 109 (2020)
Graduation Academic Year: 108
Language: English
Number of Pages: 31
Keywords (Chinese): Reinforcement Learning, Optical Flow Estimation, Exploration, Curiosity-Driven
Keywords (English): Reinforcement Learning, Optical Flow, Exploration, Curiosity, Intrinsic Rewards
Abstract (Chinese): Prediction-based novelty estimation methods that generate intrinsic rewards to drive environment exploration have been widely applied to deep reinforcement learning tasks. Such methods effectively accelerate exploration and enable a trained agent to complete tasks even in environments with sparse rewards. However, existing reward-generation schemes do not take the importance of motion features into account. We argue that motion features can serve as a strong criterion for judging novelty, and therefore propose a new intrinsic reward module, the Flow-based Intrinsic Curiosity Module (FICM). FICM uses the prediction errors of optical flow estimation as intrinsic rewards for exploration, evaluating the novelty of the current observation from the motion features captured between consecutive observations. FICM encourages the trained agent to explore unfamiliar motions and scenes, thereby promoting exploration of the environment. We compare FICM with existing methods on multiple benchmark environments, including Atari, Super Mario Bros., and ViZDoom, and demonstrate that FICM performs favorably in environments characterized by motion features. We also analyze FICM's computational efficiency and comprehensively discuss its applicable domains.
Abstract (English): In this paper, we focus on a prediction-based novelty estimation strategy upon the deep reinforcement learning (DRL) framework, and present a flow-based intrinsic curiosity module (FICM) to exploit the prediction errors from optical flow estimation as exploration bonuses. We propose the concept of leveraging motion features captured between consecutive observations to evaluate the novelty of observations in an environment. FICM encourages a DRL agent to explore observations with unfamiliar motion features, and requires only two consecutive frames to obtain sufficient information when estimating the novelty. We evaluate our method and compare it with a number of existing methods on multiple benchmark environments, including Atari games, Super Mario Bros., and ViZDoom. We demonstrate that FICM is favorable to tasks or environments featuring moving objects, which allow FICM to utilize the motion features between consecutive observations. We further ablatively analyze the encoding efficiency of FICM, and discuss its applicable domains comprehensively.
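To make the mechanism described in the abstract concrete, below is a minimal PyTorch sketch of the core idea: a flow predictor estimates optical flow between two consecutive observations, and the prediction (reconstruction) error serves as the intrinsic reward. The toy network, the warping-based error, and the scaling factor beta are illustrative assumptions for this sketch only, not the exact FICM architecture or hyperparameters used in the thesis.

# Sketch of a flow-prediction-error curiosity bonus (illustrative, not the
# thesis's exact FICM design). Assumed names: TinyFlowPredictor, warp,
# intrinsic_reward, beta.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyFlowPredictor(nn.Module):
    """Predicts a dense 2-channel flow field from two stacked RGB frames."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),  # (dx, dy) per pixel
        )

    def forward(self, frame_t, frame_tp1):
        return self.net(torch.cat([frame_t, frame_tp1], dim=1))


def warp(frame, flow):
    """Backward-warp `frame` by `flow` using bilinear grid sampling."""
    n, _, h, w = frame.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=frame.dtype),
        torch.arange(w, dtype=frame.dtype),
        indexing="ij",
    )
    grid = torch.stack([xs, ys], dim=0).unsqueeze(0).to(frame.device)  # (1,2,h,w)
    coords = grid + flow
    # Normalize pixel coordinates to [-1, 1] as required by grid_sample.
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    sample_grid = torch.stack([coords_x, coords_y], dim=-1)  # (n,h,w,2)
    return F.grid_sample(frame, sample_grid, align_corners=True)


def intrinsic_reward(flow_net, frame_t, frame_tp1, beta=0.01):
    """Flow prediction error between consecutive frames as a curiosity bonus."""
    flow = flow_net(frame_t, frame_tp1)
    reconstructed_tp1 = warp(frame_t, flow)
    # Per-sample mean squared reconstruction error: unfamiliar motion between
    # the two observations yields a larger error and thus a larger bonus.
    error = F.mse_loss(reconstructed_tp1, frame_tp1, reduction="none")
    return beta * error.mean(dim=(1, 2, 3))


if __name__ == "__main__":
    net = TinyFlowPredictor()
    obs_t = torch.rand(4, 3, 84, 84)    # batch of previous observations
    obs_tp1 = torch.rand(4, 3, 84, 84)  # batch of current observations
    print(intrinsic_reward(net, obs_t, obs_tp1))  # one bonus per transition

In practice, such a bonus would be added to the extrinsic reward at each environment step, and the flow predictor would be trained online on the agent's own observations so that familiar motions eventually yield small bonuses while novel motions keep producing large ones.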
Table of Contents:
1 Introduction 1
2 Background 5
2.1 Curiosity-Driven Exploration Methodologies 5
2.1.1 Next-frame Prediction Strategy 6
2.1.2 Self-frame Prediction Strategy 6
2.2 Optical Flow Estimation 7
3 Methodology 8
3.1 Flow-Based Curiosity-Driven Exploration 9
3.2 Flow-Based Intrinsic Curiosity Module 10
3.3 Implementations of FICM 12
4 Experiments 14
4.1 Experiments on FICM’s Capability of Exploiting Motion Features for Exploration 15
4.2 Experiments on Exploration with Sparse Extrinsic Rewards 17
4.3 Experiments on Hard Exploration Games 19
4.4 Ablation Analysis 20
4.4.1 Stacked versus Non-stacked Frames 20
4.4.2 RGB versus Gray-scale Frames 21
4.5 Discussions on the Impacts of FICM 23
5 Conclusion 27
References 28