Author (Chinese): 姜柏宏
Author (English): CHIANG, PO-HUNG
Title (Chinese): 應用改良式簡化群體演算法訓練卷積神經網路權重
Title (English): Convolution Neural Network Weight Optimization Using Improved Simplified Swarm Optimization
Advisor (Chinese): 葉維彰
Advisor (English): Yeh, Wei-Chang
Committee members (Chinese): 惠霖、陳以錚
Committee members (English): Hui, Lin; Chen, Yi-Cheng
Degree: Master's
University: National Tsing Hua University
Department: Industrial Engineering and Engineering Management
Student ID: 109034544
Year of publication (ROC calendar): 111 (2022)
Academic year of graduation: 110
Language: Chinese
Number of pages: 35
Keywords (Chinese): 卷積神經網路、改良式簡化群體演算法、梯度下降法、萬用啟發式演算法
Keywords (English): Convolutional Neural Network; Improved Simplified Swarm Optimization; Gradient Descent; Metaheuristic Algorithm
The first artificial neural network (ANN) was proposed in 1959, but because the computers of the day could not handle the model's computational demands, related research stalled for nearly 20 years. Once hardware advances dramatically increased computing speed, a series of successful neural network models were proposed and widely applied. One of these is the convolutional neural network (CNN), a feedforward network with multiple hidden layers built on the basic neural network framework. A CNN uses convolutional layers to extract informative features from the data and pooling layers to reduce image dimensionality, distilling useful information while compressing the data. CNNs also benefit from local neuron connectivity and weight sharing: different filters learn different features, which avoids overfitting while improving computational efficiency and accuracy. In image recognition, CNNs achieve better prediction and recognition performance than other neural networks.
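As a concrete point of reference for the architecture described above, the sketch below implements a LeNet-5-style CNN (the network family reviewed in Section 2.5) in PyTorch. This is an illustrative assumption rather than the thesis's exact configuration: the layer sizes follow the classic LeNet-5 and assume 28x28 grayscale inputs.

```python
# A minimal LeNet-5-style CNN sketch in PyTorch, illustrating the
# convolution + pooling structure described above. Layer sizes follow the
# classic LeNet-5; this is not the thesis's exact model configuration.
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, padding=2),  # convolution extracts local features
            nn.ReLU(),
            nn.AvgPool2d(2),                            # pooling halves spatial resolution
            nn.Conv2d(6, 16, kernel_size=5),
            nn.ReLU(),
            nn.AvgPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.ReLU(),
            nn.Linear(120, 84),
            nn.ReLU(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = LeNet5()
logits = model(torch.randn(1, 1, 28, 28))  # one 28x28 grayscale image
```

Because the same 5x5 kernels slide over the whole image, the two convolutional layers together hold only a few thousand weights; this is the weight-sharing economy the paragraph above refers to.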
The two dominant families of methods for training neural networks today are gradient descent (GD) and stochastic methods. GD, however, has three major drawbacks: it is prone to getting trapped in local optima, it converges slowly, and it depends heavily on the initial solution. Various improved gradient descent methods have been developed to address these problems, including adaptive learning rates, modified partial-derivative formulations, and adjusted activation functions, as well as metaheuristic algorithms that train and tune the model directly. Because metaheuristics are more effective than gradient descent at escaping local optima, researchers have combined gradient descent with particle swarm optimization (PSO) to train neural networks, showing that PSO can effectively aid model prediction and improve accuracy.
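To make the contrast with gradient descent concrete, here is a minimal NumPy sketch of the classic PSO update, applied to the Rastrigin function, a standard multimodal benchmark (an illustrative choice, not a problem from the thesis) on which plain gradient descent typically stalls in one of many local minima. The swarm size and the inertia/acceleration coefficients are common textbook defaults, not values taken from the thesis.

```python
# Minimal PSO sketch in NumPy: a population of particles pulled toward their
# personal bests and the global best, which helps escape local optima.
import numpy as np

def pso(fitness, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5.0, 5.0, (n_particles, dim))   # particle positions
    v = np.zeros_like(x)                             # particle velocities
    pbest = x.copy()                                 # personal best positions
    pbest_f = np.apply_along_axis(fitness, 1, x)
    gbest = pbest[np.argmin(pbest_f)].copy()         # global best position
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        f = np.apply_along_axis(fitness, 1, x)
        better = f < pbest_f                         # update personal bests
        pbest[better], pbest_f[better] = x[better], f[better]
        gbest = pbest[np.argmin(pbest_f)].copy()
    return gbest, pbest_f.min()

def rastrigin(z):
    # Many local minima: a case where pure gradient descent easily stalls.
    return 10 * len(z) + np.sum(z**2 - 10 * np.cos(2 * np.pi * z))

best_x, best_f = pso(rastrigin, dim=5)
```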
This study proposes a method that trains neural networks by combining Improved Simplified Swarm Optimization (iSSO) with gradient descent, and compares it against training with gradient descent alone. By pairing iSSO's strength on continuous optimization problems with the robustness of the Adam optimizer, we show that the proposed method effectively mitigates the tendency of backpropagation (BP) to fall into local optima, so that CNN accuracy improves stably across different test datasets.
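The sketch below illustrates one way such a hybrid loop could be structured, assuming PyTorch: alternate an Adam pass over the training batches with a basic simplified-swarm (SSO-style) move on the flattened weight vector, keeping the move only if validation loss improves. The update rule shown is the plain SSO mechanism, and the probabilities cg/cp/cw, the step scale, and the helper names (hybrid_round, val_loss, pbest, gbest) are all illustrative assumptions; the thesis's iSSO variant refines this update, as detailed in Section 3.4.

```python
# Hedged sketch of an iSSO/Adam-style hybrid training round in PyTorch.
# NOT the thesis's exact algorithm: the update is the basic SSO rule, and
# cg/cp/cw/scale are assumed values. pbest/gbest are the personal/global
# best weight vectors maintained by the caller.
import torch
from torch.nn.utils import parameters_to_vector, vector_to_parameters

def sso_step(x, pbest, gbest, cg=0.4, cp=0.3, cw=0.2, scale=0.05):
    """Variable-wise SSO update on a flat weight vector."""
    r = torch.rand_like(x)
    out = x.clone()                                 # prob. cw: keep current value
    out[r < cg] = gbest[r < cg]                     # prob. cg: copy the global best
    m = (r >= cg) & (r < cg + cp)
    out[m] = pbest[m]                               # prob. cp: copy the personal best
    m = r >= cg + cp + cw
    out[m] = x[m] + scale * torch.randn_like(x[m])  # otherwise: random perturbation
    return out

def hybrid_round(model, optimizer, train_batches, val_loss, pbest, gbest):
    # 1) Gradient phase: one Adam pass refines the weights locally.
    for inputs, targets in train_batches:
        optimizer.zero_grad()
        torch.nn.functional.cross_entropy(model(inputs), targets).backward()
        optimizer.step()
    # 2) Swarm phase: try an SSO move on the flattened weights and keep it
    #    only if it improves validation loss (otherwise revert).
    x = parameters_to_vector(model.parameters()).detach().clone()
    base = val_loss(model)
    vector_to_parameters(sso_step(x, pbest, gbest), model.parameters())
    if val_loss(model) >= base:
        vector_to_parameters(x, model.parameters())
```

Reverting when the swarm move does not help keeps validation loss non-increasing across rounds, so the stochastic phase only adds exploration on top of Adam's local refinement.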
Table of Contents
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
1.1 Research Background, Motivation, and Significance
1.2 Thesis Structure
Chapter 2: Literature Review
2.1 Deep Learning
2.2 Convolutional Neural Networks
2.3 Optimizers
2.4 CNN Weights
2.5 LeNet-5
2.6 Improved Simplified Swarm Optimization (iSSO)
Chapter 3: Research Framework
3.1 Particle Encoding
3.2 Fitness Function
3.3 Solution Initialization
3.4 Hybrid Improved Simplified Swarm Optimization
3.4.1 The iSSO-Adam Algorithm
3.4.2 The iSSO Update Mechanism
3.4.3 Pseudocode and Flowchart
3.5 Algorithm Update Procedure
Chapter 4: Experimental Results and Analysis
4.1 Datasets
4.2 Parameter Settings
4.2.1 Small-Sample Tests
4.2.2 Normality Test
4.2.3 Homogeneity Test
4.2.4 Independence Test
4.3 Training Parameters and Settings
4.4 Experimental Results
Chapter 5: Conclusions
5.1 Conclusions
5.2 Future Research Directions
References
