Detailed Record

Author (Chinese): 蕭興暐
Author (English): Hsiao, Hsin-Wei
Title (Chinese): 超參數最佳化分離式光流特徵之速度泛化性與遷移學習加速適應之研究
Title (English): Hyper-Parameter Optimization for Speed Generalization and Transfer Learning using Factorized Optical Flow Features
Advisor (Chinese): 李濬屹
Advisor (English): Lee, Chun-Yi
Committee Members (Chinese): 陳聿廣、陳勇志
Committee Members (English): Chen, Yu-Guang; Chen, Yung-Chih
Degree: Master's
University: National Tsing Hua University
Department: Computer Science
Student ID: 107062704
Year of Publication (ROC calendar): 111
Graduation Academic Year: 111
Language: Chinese
Number of Pages: 44
Keywords (Chinese): 深度強化學習、從模擬到現實、分離式光流、超參數最佳化
Keywords (English): Deep Reinforcement Learning, Sim-to-Real, Factorized Optical Flows, Hyper-Parameter Optimization
Abstract (Chinese): In recent years, the rapid development of deep reinforcement learning has opened up new prospects for autonomous driving. However, current deep reinforcement learning frameworks can hardly bear the cost and risk of training in the real world, so sim-to-real transfer has become a safer and more efficient direction. Among the available approaches, linking the simulated training environment to the real world through domain-invariant mid-level representations is the most suitable. In this thesis, we adopt a novel mid-level representation, factorized optical flow, as the bridge from simulation to reality. Unlike raw optical flow and semantic segmentation, factorized optical flow carries clearer physical meaning: it effectively separates the motion of objects within the camera view from the motion of the camera itself, making it a promising mid-level representation for autonomous driving research. However, existing results for factorized optical flow in deep reinforcement learning have been unsatisfactory. Through hyper-parameter optimization and transfer learning speed-up, this thesis improves both the intelligent agent's success rate in reaching its destination and its training efficiency. To verify whether the agent can adapt to complex road scenes, it is trained in four different simulated road scenarios and evaluated in each scenario over seven pedestrian speed ranges. According to the experimental results, hyper-parameter optimization improves training efficiency by 30%-60% and raises the average success rate under the various evaluation conditions by 15% over the baseline. Transfer learning speed-up achieves success rates above 60% on all evaluations in some simulated scenarios while shortening training time by 27%-40%.
Abstract (English): In recent years, the thriving development of deep reinforcement learning has provided autonomous cars with promising opportunities. However, the cost and risk of training a deep reinforcement learning agent in the real world are prohibitively high. Therefore, sim-to-real transfer learning has become a safer and more effective alternative. Connecting simulated training environments to the real world remains a challenge, however, and mid-level representations offer a promising way to achieve this objective. In this thesis, we leverage a new mid-level representation called “factorized optical flows” to bridge the perception and control modules, allowing the trained agents to transfer to the real world. Different from raw optical flow, factorized optical flows bear clearer physical meaning, enabling the motions of moving objects to be distinguished from the motion of the camera viewpoint. As a result, factorized optical flows can be regarded as a promising mid-level representation for autonomous driving. Unfortunately, the current state-of-the-art research fails to utilize them and optimize the deep reinforcement learning agents properly. To address this issue, we adopt hyper-parameter optimization to enhance the training efficiency of our intelligent agents as well as their success rates in reaching the endpoints of various evaluation environments. In addition, we investigate transfer learning to accelerate adaptation to different pedestrian speed ranges in those environments. According to the experimental results, our hyper-parameter optimization improves the training efficiency by more than 50%, and the success rates under different evaluation conditions are improved by 15% compared to the baseline. With regard to transfer learning, the success rates are improved in certain evaluation scenarios, and the training time is reduced by 27%-40%.
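
As a rough illustration of the hyper-parameter search summarized above (the replay-buffer capacity and batch size studied in Sections 4.1.1-4.1.2), the sketch below pairs Optuna with a Stable-Baselines3 SAC agent. It is not the thesis implementation: the driving simulator and the factorized-optical-flow observations are not packaged here, so a standard Gym control task stands in for them, and the search space, time-step budget, and reward-based objective are illustrative assumptions only.

```python
# Minimal sketch, assuming Optuna + Stable-Baselines3; NOT the authors' code.
# Pendulum-v1 is a stand-in for the simulated road scenarios used in the thesis.
import gymnasium as gym
import optuna
from stable_baselines3 import SAC
from stable_baselines3.common.evaluation import evaluate_policy


def objective(trial: optuna.Trial) -> float:
    # The two hyper-parameters highlighted in the thesis: replay-buffer
    # capacity and mini-batch size.  The candidate values are placeholders.
    buffer_size = trial.suggest_categorical("buffer_size", [10_000, 100_000, 1_000_000])
    batch_size = trial.suggest_categorical("batch_size", [64, 128, 256])

    env = gym.make("Pendulum-v1")  # placeholder environment
    model = SAC("MlpPolicy", env, buffer_size=buffer_size,
                batch_size=batch_size, verbose=0)
    model.learn(total_timesteps=20_000)  # far shorter than a real training run

    # The thesis reports success rates; mean episodic reward stands in here.
    mean_reward, _ = evaluate_policy(model, env, n_eval_episodes=10)
    return mean_reward


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print("Best hyper-parameters:", study.best_params)
```

The transfer-learning speed-up described in the abstracts could be sketched in the same hedged way: instead of training from scratch on a new pedestrian speed range, reload the previously optimized agent and fine-tune it, as below (the file name and target environment are placeholders).

```python
# Minimal transfer-learning sketch, assuming Stable-Baselines3; NOT the authors' code.
import gymnasium as gym
from stable_baselines3 import SAC

target_env = gym.make("Pendulum-v1")                 # stand-in for the target speed range
model = SAC.load("sac_source_speed_range", env=target_env)       # weights from the source range
model.learn(total_timesteps=10_000, reset_num_timesteps=False)   # short fine-tuning run
model.save("sac_target_speed_range")
```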

中文摘要(Chinese Abstract) I
英文摘要(English Abstract) II
誌謝(Acknowledgements) III
目次(Table of Contents) IV
圖目錄(List of Figures) VII
表目錄(List of Tables) VIII
1 緒論(Introduction) 1
1.1 研究背景(Background) . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 文獻回顧(Literature Review) . . . . . . . . . . . . . . . . . . . . . 2
1.3 研究動機(Motivation) . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 論文貢獻(Contribution) . . . . . . . . . . . . . . . . . . . . . . . . 4
1.5 論文架構(Thesis Organization) . . . . . . . . . . . . . . . . . . . . 4
2 相關研究(Related Work) 6
2.1 深度強化學習(Deep Reinforcement Learning) . . . . . . . . . . . . 6
2.2 Soft Actor-Critic 演算法. . . . . . . . . . . . . . . . . . . . . . . . 8
2.3 遷移學習(Transfer Learning) . . . . . . . . . . . . . . . . . . . . . 9
3 背景知識(Background Material) 11
3.1 Soft Actor-Critic 演算法. . . . . . . . . . . . . . . . . . . . . . . . 11
3.2 中介特徵(Mid-Level Representations) . . . . . . . . . . . . . . . . 12
3.2.1 語義分割(Semantic Segmentation) . . . . . . . . . . . . . . 13
3.2.2 深度圖(Depth Map) . . . . . . . . . . . . . . . . . . . . . . 13
3.2.3 光流估計(Optical Flow Estimation) . . . . . . . . . . . . . 13
3.3 分離式光流(Optical Flow Factorization) . . . . . . . . . . . . . . . 14
4 方法(Methodology) 17
4.1 超參數調整(Hyper-Parameter Tuning) . . . . . . . . . . . . . . . 17
4.1.1 經驗回放緩衝區(Experience Replay Buffer) . . . . . . . . . 18
4.1.2 批量容量(Batch Size) . . . . . . . . . . . . . . . . . . . . . 18
4.2 遷移學習加速(Transfer Learning Speed-Up) . . . . . . . . . . . . . 19
5 實驗結果(Experimental Results) 21
5.1 實驗設置(Experimental Setups) . . . . . . . . . . . . . . . . . . . . 21
5.1.1 模擬環境(Simulation Environment) . . . . . . . . . . . . . 21
5.1.2 強化學習智能代理設置(Reinforcement Learning Agent Setups) . . . . . . . . 23
5.2 超參數最佳化實驗(Experiments of Hyper-Parameter Optimization) 24
5.2.1 實驗目的(Experimental Purpose) . . . . . . . . . . . . . . . 24
5.2.2 實驗步驟(Experimental Steps) . . . . . . . . . . . . . . . . 25
5.2.3 實驗結果(Experimental Results) . . . . . . . . . . . . . . . 27
5.3 遷移學習加速實驗(Experiments of Transfer Learning Speed-Up) . 32
5.3.1 實驗目的(Experimental Purpose) . . . . . . . . . . . . . . . 32
5.3.2 實驗步驟(Experimental Steps) . . . . . . . . . . . . . . . . 32
5.3.3 實驗結果(Experimental Results) . . . . . . . . . . . . . . . 32
6 結論(Conclusions) 36
參考文獻(Bibliography) 37