Author (English): Chen, Jia-Hui
Thesis Title (English): Exploring Economical Sweet Spot: Utilizing Neural Tangent Kernel to Determine Stopping Criteria in Large-Scale Models
Advisor (English): Wu, Shan-Hung
Oral Defense Committee (English): Liu, Yi-Wen; Shen, Chih-Ya; Chiu, Wei-Chen
Keywords (English): Neural Tangent Kernel; Neural Networks Training; Early Stopping
In recent years, machine learning models have often been over-parameterized: they can reach zero training error while still generalizing well. To boost model performance, machine learning practitioners commonly use an ad hoc technique called early stopping, which holds out a portion of the training set as a validation set and halts training while the empirical risk on the validation set has not yet degraded. In the over-parameterized setting, however, the model falls into the kernel regime, where generalization performance has been proved to improve monotonically during training. The problem is therefore not to find an optimal point with the best generalization performance, but to find the most economical way to train an over-parameterized model. Based on neural tangent kernel theory, we show that 1) the training dynamics of an over-parameterized model exhibit a two-phase phenomenon, and 2) there exists a critical point in training an over-parameterized model, after which the marginal gain decreases sharply.
Abstract (Chinese)
Abstract

Contents

List of Figures
List of Tables

1 Introduction

2 Observation
2.1 Economical Sweet Spot
2.2 Obstacles

3 Kernel Economical Sweet Spot
3.1 Neural Tangent Kernel
3.2 Finding KESS in NTK context
3.3 Time Relationship between NTK and NN
3.4 Finding KESS in NN context
3.5 Algorithm

4 Evaluation
4.1 Experimental Setup
4.2 Experimental Results
4.3 Transferability Study

5 Related Work

6 Conclusion

7 Future Works

