Author: Chang, Chin-Jui
Thesis Title: Computational Cost-Aware Control Using Hierarchical Reinforcement Learning
Advisor: Lee, Chun-Yi
Oral Defense Committee: Chou, Jerry; Hu, Min-Chun
Deep reinforcement learning (DRL) has shown promising results in a wide range of challenging decision-making and control tasks. More challenging tasks typically require more complex DRL policies, which usually means larger deep neural network (DNN) models. However, as model size increases, the required computational cost grows dramatically, raising non-negligible energy concerns for battery-limited mobile robots. To reduce the overall computational cost of completing such tasks, in this thesis we propose a cost-aware strategy based on the observation that a control task can usually be decomposed into segments requiring different levels of control complexity. Segments requiring higher control complexity can be handled by a larger DNN, while those requiring lower complexity can be handled by a smaller one. To realize this strategy, we propose a hierarchical RL (HRL) framework consisting of a master policy and two sub-policies of different sizes. The master policy is trained to take into account the cost of each sub-policy, measured in floating-point operations (FLOPs). Based on its observation of the environment, it periodically selects a sub-policy capable of handling the current task segment while minimizing the overall cost of the entire task. Extensive experiments demonstrate that the proposed cost-aware strategy reduces the overall computational cost across a variety of robotic control tasks.
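The abstract describes the mechanism only in prose. The following Python sketch illustrates, under assumptions that are not taken from the thesis, how its three ingredients could fit together: estimating each sub-policy's cost in FLOPs from its layer sizes, a master reward that penalizes the cost of the selected sub-policy, and Boltzmann (softmax) selection over the master's value estimates. The layer sizes, the cost normalization, the per-step penalty of the form λ·c_ω, and every function name (`mlp_flops`, `normalized_costs`, `segment_reward`, `boltzmann_probs`, `select_sub_policy`) are hypothetical and chosen for illustration only.

```python
import math
import random
from typing import List, Sequence


def mlp_flops(layer_sizes: Sequence[int]) -> int:
    """Approximate FLOPs for one forward pass of a fully connected network.

    Each dense layer costs 2 * n_in * n_out (one multiply and one add per weight)
    plus n_out bias additions; activation functions are ignored.
    """
    flops = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        flops += 2 * n_in * n_out + n_out
    return flops


def normalized_costs(flops: Sequence[int]) -> List[float]:
    """Hypothetical normalization: the most expensive sub-policy gets cost 1.0."""
    largest = max(flops)
    return [f / largest for f in flops]


def segment_reward(env_rewards: Sequence[float], cost: float, lam: float) -> float:
    """Hypothetical master reward for one segment.

    Task reward collected while the chosen sub-policy ran, minus lam * cost for
    every step on which that sub-policy was evaluated.
    """
    return sum(env_rewards) - lam * cost * len(env_rewards)


def boltzmann_probs(q_values: Sequence[float], temperature: float) -> List[float]:
    """Softmax over the master's Q-values (Boltzmann exploration)."""
    q_max = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def select_sub_policy(q_values: Sequence[float], temperature: float,
                      rng: random.Random) -> int:
    """Sample the index of the sub-policy to run for the next segment."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical networks: 10-dim observation, 4-dim action.
    small = mlp_flops([10, 16, 16, 4])
    large = mlp_flops([10, 64, 64, 4])
    print(small, large)  # 996 10116

    costs = normalized_costs([small, large])
    rewards = [1.0] * 5  # same task reward under either sub-policy
    lam = 0.5
    for name, c in zip(["small", "large"], costs):
        print(name, round(segment_reward(rewards, c, lam), 3))

    rng = random.Random(0)
    choice = select_sub_policy([0.2, 0.8], temperature=0.5, rng=rng)
    print("selected sub-policy:", choice)
```

In this sketch, when both sub-policies collect the same task reward over a segment, the smaller one yields a higher master reward, so a master trained on this signal would tend to call the larger network only where it earns enough extra task reward to offset its cost. Increasing `lam` strengthens that preference.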
Chinese Abstract
Abstract
1 Introduction
1.1 Motivation
1.2 Proposed Method
1.3 Thesis Organization
2 Related Work
2.1 Cost-Efficient Deep Reinforcement Learning
2.2 Hierarchical Reinforcement Learning
3 Background
3.1 Reinforcement Learning
3.2 Hierarchical Reinforcement Learning
3.3 RL Algorithm
3.3.1 Deep Q-Learning (DQN)
3.3.2 Soft Actor-Critic (SAC)
3.4 Auxiliary Methods Used in Experiments
3.4.1 Hindsight Experience Replay (HER)
3.4.2 Boltzmann Exploration for DQN
4 Proposed Methodology
4.1 Problem Formulation
4.2 Overview of the Cost-Aware Hierarchical Framework
4.3 Cost-Aware Training
4.4 The Detailed Pseudo-Code of the Proposed Algorithm
5 Experimental Setup
5.1 Experimental Setup
5.1.1 Environments
5.1.2 Network Structure
5.1.3 Hyperparameters
5.2 Selection of the Policy Cost c_ω and the Coefficient λ
5.3 Computing Infrastructure
6 Experimental Results
6.1 Qualitative Analysis of the Proposed Methodology
6.2 Statistics of the Performance and Cost for the Proposed Methodology
6.2.1 Comparison of the Performance to Baselines
6.2.2 Analysis of the Baselines with More Data Samples
6.3 Ablation Study
6.3.1 Effectiveness of the Cost Term
6.3.2 Sub-Policies w/ and w/o Separated Replay Buffers
7 Conclusion
7.1 Conclusion
References
