
Detailed Record

Author (Chinese): 李相宇
Author (English): Lee, Hsiang-Yu
Title (Chinese): 蜂巢式網路架構內利用序列到序列模型強化式學習實現多連線換手最佳化
Title (English): Multi-connectivity Handover Optimization Using Sequence To Sequence Reinforcement Learning in Cellular Network
Advisor (Chinese): 蔡明哲
Advisor (English): Tsai, Ming-Jer
Committee members (Chinese): 郭桐惟, 張仕穎, 郭建志
Committee members (English): Kuo, Tung-Wei; Chang, Shih-Ying; Kuo, Jian-Jhih
Degree: Master
University: National Tsing Hua University
Department: Institute of Information Systems and Applications
Student ID: 109065541
Year of publication (ROC calendar): 112 (2023)
Graduation academic year: 111
Language: English
Number of pages: 20
Keywords (Chinese): 換手, 蜂巢式網路, 強化式學習, 序列至序列模型, 深度學習
Keywords (English): Handover, Cellular Network, Reinforcement Learning, Sequence to Sequence Model, Deep Learning
Abstract (Chinese): In cellular network architectures, dual-connectivity or multi-connectivity is indispensable for coping with users' growing bandwidth demands and achieving faster, more stable connections. Under a multi-connectivity architecture, each time a user selects base stations for a handover, one or more base stations can be chosen to meet that user's needs. In such a dynamic environment, every handover by every user affects the network, so the problem to be solved is how to correctly choose handover targets in response to these changes, allocate bandwidth reasonably, and satisfy the needs of the majority of users. To address this problem, we propose a method that selects handover targets using a sequence-to-sequence model combined with deep reinforcement learning.
Our method improves on prior reinforcement-learning approaches in this field, which must enumerate all possible base-station combinations and are therefore difficult to scale up. By treating the handover problem as a multi-label classification problem in machine learning and exploiting the sequence output of a sequence-to-sequence model, we achieve flexible scalability without degrading performance. Through simulation experiments, we find that, thanks to a more advanced reinforcement-learning method, our approach obtains better results than earlier reinforcement-learning methods. We also experimentally validate the benefit of the sequence-to-sequence model, finding that it accelerates the training process and reaches good results sooner, increasing practicality.
Abstract (English): In the cellular network architecture, to cope with the increasing bandwidth demands of users and achieve faster and more stable connections, the use of a dual- or multi-connectivity architecture is indispensable. In a multi-connectivity architecture, each time a handover is triggered, one or more base stations can be selected to meet the user's needs. In such a dynamic environment, each handover by every user affects the network environment and is hard to predict. Therefore, the problem to be solved is how to correctly choose the handover target to adapt to these changes, allocate bandwidth reasonably, and meet the needs of the majority of users. To solve this problem, we propose a method that utilizes a sequence-to-sequence model combined with deep reinforcement learning to select the handover target.
Our proposed method addresses the drawback of previous reinforcement learning methods in this field, which required listing all possible base station combinations, making them difficult to scale up. By treating the handover problem as a multi-label classification problem in machine learning and utilizing the sequence-to-sequence model's sequential output, we achieve the goal of maintaining performance while flexibly scaling up. Through simulation experiments, we show that, thanks to more advanced reinforcement learning methods, our method achieves better results compared to previous reinforcement learning approaches. Additionally, we validate the benefits of using a sequence-to-sequence model, finding that it speeds up the training process and yields good results faster, thereby increasing practicality.
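The abstract's key idea is that emitting handover targets one at a time as a sequence avoids enumerating every base-station combination: for N candidates and up to k links there are O(N^k) combinations, but a sequence decoder only ever scores N+1 options per step. The thesis text here includes no code, so the following is a minimal illustrative sketch in plain NumPy, not the author's implementation; the function name `decode_handover_targets`, the greedy decoding, and the appended stop logit are our own simplifications (in the thesis the per-station scores would come from a trained Transformer policy optimized with PPO).

```python
import numpy as np

def decode_handover_targets(scores, max_links=3, stop_bias=0.0):
    """Greedily decode a variable-length set of handover targets.

    scores: 1-D array with one utility score per candidate base station
            (here fixed numbers; in the thesis's setting these would be
            produced step by step by a sequence-to-sequence policy).
    Returns the ordered list of selected base-station indices.
    """
    selected = []
    mask = np.zeros_like(scores, dtype=bool)  # stations already chosen
    for _ in range(max_links):
        # forbid re-selecting a station, as a seq2seq decoder would via masking
        logits = np.where(mask, -np.inf, scores)
        # append a "stop" logit so the sequence can end before max_links
        logits = np.append(logits, stop_bias)
        choice = int(np.argmax(logits))
        if choice == len(scores):  # stop token chosen: end the sequence
            break
        selected.append(choice)
        mask[choice] = True
    return selected

# Example: with four candidate stations, pick at most two links.
print(decode_handover_targets(np.array([0.9, 0.1, 0.7, -0.2]), max_links=2))
```

The multi-label framing shows up in the output: `[0, 2]` means stations 0 and 2 are both labeled as handover targets, and the same decoder handles any number of candidates without retraining a combination-indexed output layer.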
Acknowledgements
Abstract (Chinese) i
Abstract (English) ii
1 Introduction 1
2 Related Works 3
3 Preliminary 5
3.1 Reinforcement Learning (RL) . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.1.1 Background of RL . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.1.2 Proximal Policy Optimization (PPO) . . . . . . . . . . . . . . . . . . . 6
3.2 Sequence-to-Sequence Model (Seq2Seq) . . . . . . . . . . . . . . . . . . . . . 7
3.2.1 Background of Seq2Seq . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.2.2 Transformer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
4 The Proposed Method 9
4.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
4.2 Training Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
5 Simulation 13
5.1 Simulation setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
5.2 Performance metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
5.3 Comparison methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
5.4 Simulation results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
5.4.1 Comparison with other methods . . . . . . . . . . . . . . . . . . . . . . 14
5.4.2 With and without Transformer . . . . . . . . . . . . . . . . . . . . . . 16
6 Conclusion 17
References 19
[1] K. D. C. Silva, Z. Becvar, and C. R. L. Frances, “Adaptive hysteresis margin based on
fuzzy logic for handover in mobile networks with dense small cells,” IEEE Access, vol. 6,
pp. 17178–17179, 2018.
[2] K. C. Silva, Z. Becvar, E. H. S. Cardoso, and C. R. Francês, “Self-tuning handover algorithm based on fuzzy logic in mobile networks with dense small cells,” IEEE Wireless Communications and Networking Conference, pp. 1–6, 2018.
[3] Z.-H. Huang, Y.-L. Hsu, P.-K. Chang, and M.-J. Tsai, “Efficient handover algorithm in 5G networks using deep learning,” in GLOBECOM 2020 - 2020 IEEE Global Communications Conference, pp. 1–6, 2020.
[4] Y. Chen, X. Lin, T. Khan, and M. Mozaffari, “Efficient drone mobility support using
reinforcement learning,” in 2020 IEEE Wireless Communications and Networking Conference (WCNC), pp. 1–6, 2020.
[5] M. Sana, A. De Domenico, E. C. Strinati, and A. Clemente, “Multi-agent deep reinforcement learning for distributed handover management in dense mmwave networks,”
in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal
Processing (ICASSP), pp. 8976–8980, 2020.
[6] S. Kang, S. Choi, G. Lee, and S. Bahk, “A dual-connection based handover scheme for
ultra-dense millimeter-wave cellular networks,” in 2019 IEEE Global Communications
Conference (GLOBECOM), pp. 1–6, 2019.
[7] F. Zhao, H. Tian, G. Nie, and H. Wu, “Received signal strength prediction based multi-connectivity handover scheme for ultra-dense networks,” in 2018 24th Asia-Pacific Conference on Communications (APCC), pp. 233–238, 2018.
[8] V. Yajnanarayana, H. Rydén, and L. Hévizi, “5G handover using reinforcement learning,” in 2020 IEEE 3rd 5G World Forum (5GWF), pp. 349–354, 2020.
[9] J. J. Hernández-Carlón, J. Pérez-Romero, O. Sallent, I. Vilà, and F. Casadevall, “A deep Q-network-based algorithm for multi-connectivity optimization in heterogeneous cellular networks,” Sensors, vol. 22, no. 16, 2022.
[10] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA,
USA: A Bradford Book, 2018.
[11] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” 2017.
[12] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural
networks,” in Advances in Neural Information Processing Systems (Z. Ghahramani,
M. Welling, C. Cortes, N. Lawrence, and K. Weinberger, eds.), vol. 27, Curran Associates,
Inc., 2014.
[13] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and
I. Polosukhin, “Attention is all you need,” 2017.