
Detailed Record

Author (Chinese): 黃怡嘉
Author (English): Huang, Yi-Jia
Title (Chinese): 強化學習於混合式流線生產排程問題之探討
Title (English): Reinforcement Learning for Hybrid Flow Shop Scheduling Problem
Advisor (Chinese): 林則孟
Advisor (English): Lin, James T.
Committee Members (Chinese): 丁慶榮、陳勝一
Committee Members (English): Ting, Ching-Jung; Chen, Sheng-I
Degree: Master's
University: National Tsing Hua University
Department: Industrial Engineering and Engineering Management
Student ID: 107034534
Year of Publication (ROC calendar): 109
Academic Year of Graduation: 108
Language: Chinese
Number of Pages: 88
Keywords (Chinese): 混合式流線生產排程問題、強化學習、深度學習
Keywords (English): Hybrid flow shop scheduling problem; Reinforcement learning; Deep learning
This research investigates reinforcement learning for the hybrid flow shop scheduling problem. Considering the uncertain dynamic events of new job arrivals and jobs with differing degrees of urgency, together with machine-dependent processing times and sequence-dependent setup times, this study applies deep reinforcement learning (Deep Q-Network, DQN) to the job-sequencing and machine-dispatching decisions, with the objective of minimizing mean tardiness over a planning horizon.
This research proposes a simulation-based deep reinforcement learning framework and applies it to a hybrid flow shop system. It first defines the states, actions, rewards, and state transitions of the hybrid flow shop system for reinforcement learning. Second, because training samples play a central role in reinforcement learning and their generation depends on the system's state transitions, this study emphasizes the integration of discrete-event simulation with reinforcement learning: through the linkage of learning events, reinforcement learning is triggered only when a learning event occurs in the simulated environment. In place of otherwise unknown state transitions, the time-advance mechanism of the simulation tells the agent when the next learning event occurs and what the current state is at that moment.
The experimental results verify that the deep reinforcement learning method adjusts its decisions to the different dynamic scenarios in the environment. Under uncertain dynamics with new job arrivals and urgent jobs, the DQN agent learns a policy whose performance is close to that of the best single dispatching rule, demonstrating the advantage of reinforcement learning for dynamic decision problems. With regard to training efficiency, deep reinforcement learning can take the overall scenario into account and, across different scenarios, identify the dispatching rule with the best overall performance.
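The abstract does not spell out the DQN formulation, so the following is a minimal sketch of a DQN dispatching agent for this kind of problem, written in Python with PyTorch. The state dimension, the action set of candidate dispatching rules, the tardiness-based reward, and all hyperparameter values are illustrative assumptions, not the exact definitions used in the thesis.

# Minimal DQN dispatching-agent sketch for a hybrid flow shop.
# State features, action set, and hyperparameters are assumptions for
# illustration only, not the thesis's definitions.
import random
from collections import deque

import torch
import torch.nn as nn
import torch.optim as optim

STATE_DIM = 6   # e.g. per-stage queue length, mean slack, setup indicator (assumed)
N_ACTIONS = 4   # e.g. a menu of dispatching rules such as SPT, EDD, MST, SST (assumed)

class QNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS),
        )
    def forward(self, x):
        return self.net(x)

class DQNAgent:
    def __init__(self, gamma=0.95, eps=0.1, lr=1e-3, buffer_size=10_000):
        self.q, self.target_q = QNet(), QNet()
        self.target_q.load_state_dict(self.q.state_dict())
        self.opt = optim.Adam(self.q.parameters(), lr=lr)
        self.buffer = deque(maxlen=buffer_size)   # experience replay memory
        self.gamma, self.eps = gamma, eps

    def act(self, state):
        # Epsilon-greedy choice among the candidate dispatching rules.
        if random.random() < self.eps:
            return random.randrange(N_ACTIONS)
        with torch.no_grad():
            return int(self.q(torch.tensor(state).float()).argmax())

    def remember(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def train_step(self, batch_size=32):
        if len(self.buffer) < batch_size:
            return
        batch = random.sample(self.buffer, batch_size)
        s, a, r, s2, d = map(lambda x: torch.tensor(x).float(), zip(*batch))
        a = a.long()
        # Q(s,a) for the actions actually taken.
        q_sa = self.q(s).gather(1, a.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            # One-step TD target using the frozen target network.
            target = r + self.gamma * self.target_q(s2).max(1).values * (1 - d)
        loss = nn.functional.mse_loss(q_sa, target)
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()

    def sync_target(self):
        self.target_q.load_state_dict(self.q.state_dict())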
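The coupling of discrete-event simulation and reinforcement learning described above can be sketched as an event loop in which the agent is consulted only at learning events, while all other events merely advance simulated time. The event types, the callback interfaces (observe_state, apply_action, tardiness_penalty), and the reward shaping below are hypothetical, shown only to illustrate the triggering mechanism; they are not the thesis's actual interfaces.

# Sketch of triggering the agent only at learning events in a
# discrete-event simulation, assuming the DQNAgent above. Event types
# and callbacks are hypothetical placeholders.
import heapq
import itertools

LEARNING_EVENTS = {"machine_idle", "job_arrival"}   # assumed learning-event set

def run_episode(agent, initial_events, observe_state, apply_action,
                tardiness_penalty, horizon):
    """initial_events: iterable of (time, event_type, payload).
    observe_state / apply_action / tardiness_penalty are callbacks into a
    shop-floor model; their names and signatures are hypothetical."""
    seq = itertools.count()                      # tie-breaker for equal event times
    events = [(t, next(seq), e, p) for t, e, p in initial_events]
    heapq.heapify(events)
    clock, prev = 0.0, None                      # prev = (state, action) awaiting its reward
    while events and clock < horizon:
        clock, _, etype, payload = heapq.heappop(events)
        if etype not in LEARNING_EVENTS:
            continue                             # other events only advance the clock in this sketch
        state = observe_state(clock, payload)
        if prev is not None:
            # The reward of the previous decision becomes known only at the
            # next learning event (here: negative tardiness accrued so far).
            agent.remember(prev[0], prev[1], -tardiness_penalty(clock), state, False)
            agent.train_step()
        action = agent.act(state)                # choose a dispatching rule at this decision point
        for t, e, p in apply_action(clock, action, payload):
            heapq.heappush(events, (t, next(seq), e, p))   # dispatching creates future events
        prev = (state, action)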
Abstract (Chinese)
Abstract (English)
Chapter 1  Introduction ------------------------------------ 1
Chapter 2  Literature Review ------------------------------- 5
Chapter 3  Research Problem and Methodology ---------------- 23
Chapter 4  Simulation Model Construction ------------------- 45
Chapter 5  Experimental Design and Results Analysis -------- 65
Chapter 6  Conclusions and Recommendations ----------------- 84
References ------------------------------------------------- 86

 
 
 
 