
Detailed Record

Author (Chinese): 羅士銓
Author (English): Lo, Shih-Chuan
Title (Chinese): 應用離散事件模擬於強化學習之探討
Title (English): Investigation of Discrete Event Simulation and Reinforcement Learning
Advisor (Chinese): 林則孟
Advisor (English): Lin, James T.
Committee Members (Chinese): 陳盈彥
黃建中
邱俊智
Committee Members (English): Chen, Ying-Yan
Huang, Edward
Qiu, Jun-Zhi
Degree: Master's
University: National Tsing Hua University
Department: Department of Industrial Engineering and Engineering Management
Student ID: 106034532
Publication Year (ROC era): 109 (2020)
Graduation Academic Year: 108 (2019–2020)
Language: Chinese
Pages: 108
Keywords (Chinese): 強化學習、深度學習、離散事件模擬
Keywords (English): Reinforcement learning, Deep reinforcement learning, Discrete event simulation, SysML, SimPy
Abstract (Chinese):
This study proposes a method for combining reinforcement learning with discrete event simulation and applies the combined model to the vehicle dispatching problem of an FMS/AGV system. The study emphasizes linking discrete event simulation and reinforcement learning through learning events: during the simulation, the learning mechanism of reinforcement learning is invoked only when a learning event is triggered, at which point the MDP state transition and reward feedback take place and a training sample is generated.
This study also proposes using the SysML language to aid the construction of conceptual models and, based on the conceptual model, to further design the model and implement it as Python+SimPy simulation software. Building a discrete event simulation is divided into four phases: inception, analysis, design, and implementation. The main purpose of the inception phase is to define the problem, including the simulation objectives, inputs/outputs, and scope. The analysis phase uses SysML tools to analyze the system's object structure and system behavior. The design phase uses SysML tools to design the class architecture of the simulation program and the behavior of each class. Finally, the implementation phase maps Python and SimPy syntax onto the class-architecture and class-behavior designs produced in the design phase and implements them one by one, completing the simulation model.
This study proposes a deep reinforcement learning architecture based on discrete event simulation and applies it to the vehicle dispatching problem of an FMS/AGV system. It details the construction of the production-system "environment" and the design of the "agent" for reinforcement learning. The experimental results verify that, with appropriate reward, state, and action designs, the DQN method can outperform the best single dispatching rule in scenarios with uncertain order arrival rates.
Abstract (English):
In this research, a method for integrating reinforcement learning with discrete event simulation is proposed and applied to an AGV dispatching problem in an FMS/AGV system. The study links reinforcement learning and discrete event simulation through learning events: only when a learning event is triggered is the learning process executed, meaning that the MDP transitions to the next state, a reward is received from the environment, and a training sample is generated.
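As a rough, non-authoritative illustration of the learning-event idea, the sketch below uses SimPy to trigger a decision point (the learning event) whenever a machine must choose the next job, closing the previous MDP transition and storing a (state, action, reward, next state) sample. The single-machine setting, the queue-length state, the negative-queue-length reward, and the random stand-in policy are all illustrative assumptions, not the thesis's actual design:

import random
import simpy

class LearningEventModel:
    """Sketch: dispatch decisions on one machine act as learning events."""
    def __init__(self, env):
        self.env = env
        self.queue = []       # waiting jobs (processing times)
        self.samples = []     # collected (s, a, r, s') training samples
        self.pending = None   # (state, action, reward) awaiting its next state
        env.process(self.arrivals())
        env.process(self.machine())

    def state(self):
        return (len(self.queue),)  # toy state: queue length only

    def arrivals(self):
        while True:
            yield self.env.timeout(random.expovariate(1.0))
            self.queue.append(random.uniform(1, 3))

    def machine(self):
        while True:
            if not self.queue:
                yield self.env.timeout(0.1)  # idle poll (a simplification)
                continue
            s = self.state()  # --- learning event: a decision is needed ---
            a = random.randrange(len(self.queue))  # stand-in for the RL policy
            if self.pending is not None:
                ps, pa, pr = self.pending
                self.samples.append((ps, pa, pr, s))  # close the MDP transition
            yield self.env.timeout(self.queue.pop(a))  # process the chosen job
            self.pending = (s, a, -len(self.queue))    # reward: -queue length

env = simpy.Environment()
model = LearningEventModel(env)
env.run(until=100)
print(f"collected {len(model.samples)} training samples")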
This study also proposes a conceptual modeling process supported by SysML. With the conceptual model, one can further design and implement the simulation model in Python and SimPy. The process of building a simulation model can be separated into four phases: the inception phase, the analysis phase, the design phase, and the implementation phase. The goal of the inception phase is to define the problem, covering the simulation objectives, the inputs and outputs, and the system boundary. The analysis phase uses SysML to analyze the object structure and the system behavior. The design phase uses SysML to design the class structure of the Python simulation program and the behavior of each class. Finally, in the implementation phase, the simulation model is built by mapping Python and SimPy constructs onto the results of the design phase.
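To make the design-to-implementation mapping concrete, here is a minimal sketch, under invented names, of how a SysML block might be carried over to a Python/SimPy class: block attributes become instance attributes, an owned part becomes a SimPy Store, and the block's activity becomes a SimPy process method. The Machine block and its behavior are illustrative assumptions, not taken from the thesis:

import simpy

# SysML <<block>> Machine             -> Python class Machine
# block attributes (name, proc_time)  -> instance attributes
# owned part <<block>> Buffer         -> simpy.Store
# activity diagram "process part"     -> SimPy process method run()
class Machine:
    def __init__(self, env, name, proc_time):
        self.env = env
        self.name = name
        self.proc_time = proc_time
        self.input_buffer = simpy.Store(env)
        env.process(self.run())

    def run(self):
        while True:
            part = yield self.input_buffer.get()    # wait-for-part event
            yield self.env.timeout(self.proc_time)  # processing event
            print(f"{self.env.now:6.1f}  {self.name} finished {part}")

env = simpy.Environment()
m1 = Machine(env, "M1", proc_time=2.0)
for i in range(3):
    m1.input_buffer.put(f"part-{i}")
env.run(until=10)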
This study proposes a simulation-based deep reinforcement learning framework for an AGV dispatching problem in an FMS/AGV system. We demonstrate the process of building a production-system “Environment” and the design of the “Agent”. The experimental results show that, under uncertainty in the arrival rate of new jobs, DQN with a well-designed reward function can learn a policy that outperforms the best single dispatching rule.
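For concreteness, a minimal sketch of the agent side is given below, assuming PyTorch; the action set (a menu of dispatching rules), the network size, and all hyperparameters are illustrative assumptions rather than the thesis's settings:

import random
from collections import deque

import torch
import torch.nn as nn

DISPATCH_RULES = ["shortest_queue", "nearest_vehicle", "fifo"]  # hypothetical action set

class QNet(nn.Module):
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions))

    def forward(self, x):
        return self.net(x)

class DQNAgent:
    def __init__(self, state_dim, n_actions, eps=0.1):
        self.q = QNet(state_dim, n_actions)
        self.target = QNet(state_dim, n_actions)          # target network
        self.target.load_state_dict(self.q.state_dict())  # periodic re-sync omitted
        self.opt = torch.optim.Adam(self.q.parameters(), lr=1e-3)
        self.buffer = deque(maxlen=10000)  # experience replay memory
        self.eps = eps
        self.n_actions = n_actions

    def act(self, state):
        # epsilon-greedy choice among the dispatching rules
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        with torch.no_grad():
            return int(self.q(torch.tensor(state, dtype=torch.float32)).argmax())

    def learn(self, batch_size=32, gamma=0.99):
        if len(self.buffer) < batch_size:
            return
        s, a, r, s2 = zip(*random.sample(self.buffer, batch_size))
        s = torch.tensor(s, dtype=torch.float32)
        a = torch.tensor(a, dtype=torch.int64)
        r = torch.tensor(r, dtype=torch.float32)
        s2 = torch.tensor(s2, dtype=torch.float32)
        q = self.q(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s, a)
        with torch.no_grad():
            y = r + gamma * self.target(s2).max(1).values   # TD target
        loss = nn.functional.mse_loss(q, y)
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()

agent = DQNAgent(state_dim=4, n_actions=len(DISPATCH_RULES))  # hypothetical sizes

In a setup like this, each (s, a, r, s') sample produced at a learning event would be appended to agent.buffer, with learn() called periodically and the target network re-synchronized at fixed intervals as the simulation advances.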
Chinese Abstract i
Abstract ii
Acknowledgements iii
Table of Contents iv
List of Figures vi
List of Tables ix
Chapter 1 Introduction 1
1.1 Research Background and Motivation 1
1.2 Research Objectives 3
Chapter 2 Literature Review 4
2.1 Reinforcement Learning 4
2.1.1 Reinforcement Learning 4
2.1.2 Deep Reinforcement Learning 5
2.2 Conceptual Models in System Simulation 7
2.3 SysML 10
2.4 Reinforcement Learning Applications in Production Systems 11
Chapter 3 SysML-Aided Construction of Discrete Event Simulation Models 14
3.1 Discrete Event Simulation Model-Building Process 14
3.2 Inception Phase 17
3.2.1 Objectives 19
3.2.2 Outputs 21
3.2.3 Inputs 21
3.2.4 Scope 22
3.2.5 Level of Detail 24
3.2.6 Assumptions and Simplifications 26
3.2.7 Data Requirements 27
3.3 Analysis Phase 27
3.3.2 Object Structure 28
3.3.3 System Behavior 29
3.4 Design Phase 33
3.4.1 Introduction to SimPy 34
3.4.2 Defining Classes 36
3.4.3 Defining Attributes 36
3.4.4 Analyzing System Events 37
3.4.5 Analyzing Event Flows 38
3.4.6 Analyzing Control Behavior 40
3.5 Implementation Phase: Object-Oriented Simulation with SimPy 45
3.6 Summary of the Simulation Process 49
3.7 FMS Simulation Modeling Example 49
3.7.1 Problem Description 50
3.7.2 FMS: Inception Phase 54
3.7.3 FMS: Analysis Phase 60
3.7.4 FMS: Design Phase 63
3.7.5 FMS: Implementation Phase 68
Chapter 4 Integrating Reinforcement Learning with Discrete Event Simulation 72
4.1 The Role of Discrete Event Simulation in Reinforcement Learning 72
4.1.1 Agent, Environment, and Reward 72
4.1.2 Markov Decision Processes and Discrete Event Simulation 73
4.1.3 Learning Events 75
4.2 A Reinforcement Learning Model for Production Systems 77
4.2.1 Agent Decision Problems in Production Systems 78
4.2.2 Environment Models of Production Systems 80
4.3 RL for the Dispatching Problem in FMS/AGV Systems 80
4.3.1 RL Architecture for the FMS/AGV System 80
4.3.2 Design of RL Elements 84
4.3.3 DQN and Simulation 89
4.3.4 Exploration Strategy 91
4.4 Experimental Scenarios 92
4.4.1 DQN Training Performance versus Single Dispatching Rules 94
4.4.2 Effect of System Variability on Learning Performance 96
4.4.3 Effect of Reward Design on Learning Performance 100
Chapter 5 Conclusions and Recommendations 102
5.1 Conclusions 102
5.2 Recommendations 103
References 104
