Author (English): Lin, Yi-Ping
Thesis title (English): Convolution Neural Network Hyperparameter Optimization Using Simplified Swarm Optimization
Advisor (English): Yeh, Wei-Chang
Oral examination committee (English): Liang, Yun-Chia; Lai, Chyh-Ming
Keywords (English): Machine Learning; Image Recognition; Convolutional Neural Networks; Simplified Swarm Optimization; Hyperparameter Optimization
Image recognition technology is receiving increasing attention across industries. Among the machine learning approaches applied to image recognition, the Convolutional Neural Network (CNN) is widely used. Although existing CNN architectures already perform well, finding the best-performing network architecture for a specific application is not easy. To improve CNN performance, some studies modify the network architecture, while others optimize the hyperparameters. Many of these designs are produced manually, which requires relevant expertise and takes a great deal of time. This study therefore proposes applying Simplified Swarm Optimization (SSO) to the hyperparameter optimization of the LeNet model. In addition to three existing datasets, MNIST, Fashion-MNIST, and CIFAR-10, a real wafer dataset is used to verify the method in a practical application. The experimental results show that, on all four datasets, the proposed method achieves higher accuracy than both the original LeNet model and another meta-heuristic algorithm, and that once training is complete it takes only a very short time to find a better hyperparameter configuration. The study also analyzes the output shape of the feature map after each layer. Surprisingly, most of the output shapes are rectangular rather than square, which suggests that the method can easily extract the features of an image and can also reveal layers in the architecture that are not functioning. The contribution of this study, validated on a real dataset, is to give users a simpler and faster way to obtain more accurate results with an existing model. The approach can also be applied to other datasets, other CNN architectures, or other deep learning networks.
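As an illustration of the kind of search the abstract describes, the following is a minimal sketch of the standard SSO update rule applied to a small set of integer hyperparameters. It is not the thesis's LeNet-SSO implementation: the hyperparameter names and ranges in `BOUNDS`, the thresholds `cg`, `cp`, and `cw`, the population size, and the `toy_fitness` function (a stand-in for the accuracy of a trained LeNet) are all assumptions made for this example.

```python
import random

# Hypothetical integer search ranges (inclusive); not taken from the thesis.
BOUNDS = {
    "conv1_filters": (1, 32),
    "conv1_kernel": (1, 7),
    "conv2_filters": (1, 64),
    "conv2_kernel": (1, 7),
}
KEYS = list(BOUNDS)


def random_value(key, rng):
    """Draw a uniformly random value inside the range of one hyperparameter."""
    lo, hi = BOUNDS[key]
    return rng.randint(lo, hi)


def toy_fitness(sol):
    """Stand-in for validation accuracy; a real run would train and score a CNN.

    The target below is arbitrary and only gives the search something to find.
    """
    target = {"conv1_filters": 6, "conv1_kernel": 5,
              "conv2_filters": 16, "conv2_kernel": 5}
    return -sum((sol[k] - target[k]) ** 2 for k in KEYS)


def sso(fitness, n_sol=10, n_gen=50, cg=0.4, cp=0.7, cw=0.9, seed=0):
    """Maximize `fitness` with the SSO update rule (thresholds are assumed values)."""
    rng = random.Random(seed)
    sols = [{k: random_value(k, rng) for k in KEYS} for _ in range(n_sol)]
    pbest = [dict(s) for s in sols]
    pbest_fit = [fitness(s) for s in sols]
    g = max(range(n_sol), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = dict(pbest[g]), pbest_fit[g]

    for _ in range(n_gen):
        for i in range(n_sol):
            for k in KEYS:
                rho = rng.random()
                if rho < cg:
                    sols[i][k] = gbest[k]             # copy from global best
                elif rho < cp:
                    sols[i][k] = pbest[i][k]          # copy from personal best
                elif rho < cw:
                    pass                              # keep the current value
                else:
                    sols[i][k] = random_value(k, rng)  # random re-draw
            f = fitness(sols[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = dict(sols[i]), f
                if f > gbest_fit:
                    gbest, gbest_fit = dict(sols[i]), f
    return gbest, gbest_fit


if __name__ == "__main__":
    best, score = sso(toy_fitness)
    print(best, score)
```

In an actual hyperparameter search, `toy_fitness` would be replaced by a function that builds the network from the candidate configuration, trains it, and returns its accuracy, so each fitness evaluation would dominate the running time.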
Abstract (Chinese)
Abstract (English)
Acknowledgements
Contents
List of Figures
List of Tables
1.1 Background and Motivation
1.2 Research Framework
2.1 Image recognition
2.2 CNN
2.2.1 Convolution Layer
2.2.2 Pooling or SubSampling Layer
2.2.3 Fully Connected Layer
2.2.4 Activation function
2.3 LeNet
2.4 Hyperparameter Optimization Approaches
2.5 Simplified Swarm Optimization
3.1 Encoding Strategy
3.2 Fitness function
3.3 Initialization of Solution
3.4 Proposed LeNet-SSO
3.4.1 Notations of LeNet-SSO
3.4.2 Update Mechanism of LeNet-SSO
3.4.3 Stopping Criteria of LeNet-SSO
3.4.4 Pseudocode and Flowchart
4.1 Datasets
4.2 Parameter settings
4.2.1 Small Sampling Test
4.2.2 ANOVA Test
4.2.3 Training Parameters and Detailed Setting
4.3 Experimental Results
4.3.1 The result of Existing Dataset
4.3.2 The result of wafer defect detection
4.3.3 Comparison between LeNet-SSO and LeNet-PSO
5.1 Conclusion
5.2 Future Work

