
Detailed Record

Author (Chinese): 陳家惠
Author (English): Chen, Jia-Hui
Title (Chinese): 優化效率:利用神經正切核決定大規模模型訓練的停止準則
Title (English): Exploring Economical Sweet Spot: Utilizing Neural Tangent Kernel to Determine Stopping Criteria in Large-Scale Models
Advisor (Chinese): 吳尚鴻
Advisor (English): Wu, Shan-Hung
Committee Members (Chinese): 劉奕汶、沈之涯、邱維辰
Committee Members (English): Liu, Yi-Wen; Shen, Chih-Ya; Chiu, Wei-Chen
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 108062467
Year of Publication (ROC year): 112 (2023)
Graduation Academic Year: 112
Language: English
Number of Pages: 19
Keywords (Chinese): 神經正切核、神經網絡訓練、提前停止
Keywords (English): Neural Tangent Kernel, Neural Networks Training, Early Stopping
Abstract (Chinese):
在近年來,機器學習模型通常存在過度參數化的情況,這些模型可以在具有良好泛化性能的同時達到零訓練誤差。為提高模型效能,機器學習實踐者通常使用一種稱為「提前停止」的臨時技術,該技術利用訓練集的一部分作為驗證集,在驗證集上具有非退化經驗風險時停止訓練過程。然而,在過度參數化的情況下,模型會陷入核區域,並已被證明,在訓練過程中,泛化表現會單調提高。因此,這裡的問題不是找到具有最佳泛化性能的最佳點,而是如何以最經濟的方式訓練過度參數化的模型。基於神經切線核理論,我們展示了以下:1)過度參數化模型的訓練動態呈現兩階段現象。2)存在一個關鍵點,用於訓練過度參數化模型,在該點之後邊際增益急劇減少。
Abstract (English):
In recent years, machine learning models have often been over-parameterized: they can reach zero training error while still generalizing well. To boost model performance, practitioners commonly use an ad-hoc technique called early stopping, which holds out a portion of the training set as a validation set and halts training once the empirical risk on the validation set stops improving. In the over-parameterized setting, however, the model falls into the kernel regime, where generalization performance has been proved to improve monotonically during training. The problem is therefore not to find an optimal point with the best generalization performance, but to determine the most economical way to train an over-parameterized model. Based on neural tangent kernel theory, we show that 1) the training dynamics of an over-parameterized model exhibit a two-phase phenomenon, and 2) there exists a critical point in training an over-parameterized model after which the marginal gain decreases sharply.
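
A minimal numerical sketch of the two-phase picture the abstract describes, assuming NTK-linearized dynamics with squared loss: each residual component decays as exp(-eta * lambda_i * t) along the eigendirections of the NTK Gram matrix, so the loss falls quickly along large-eigenvalue directions and then crawls along the small ones. The synthetic spectrum, learning rate, and 1% threshold below are illustrative assumptions, not the thesis's actual KESS criterion.

```python
# Illustrative sketch only: simulates NTK-linearized training-loss decay and
# flags a "diminishing marginal gain" step. The kernel spectrum, learning
# rate, and threshold are assumptions, not the thesis's stopping criterion.
import numpy as np

rng = np.random.default_rng(0)

# Toy NTK Gram-matrix spectrum: a few large eigenvalues, a long small tail.
eigvals = np.sort(rng.pareto(a=1.5, size=200) + 1e-3)[::-1]
residual0 = rng.normal(size=eigvals.shape)  # initial residuals in the eigenbasis

lr = 1e-3                          # learning rate (assumed)
steps = np.arange(0, 20_000, 50)   # training steps sampled every 50 steps

# Under gradient flow on squared loss, each residual component decays as
# exp(-lr * lambda_i * t); take the training loss as the mean squared residual.
decay = np.exp(-lr * eigvals[None, :] * steps[:, None])
loss = np.mean((decay * residual0[None, :]) ** 2, axis=1)

# Marginal gain per interval, and the first step where it falls below 1%
# of the initial marginal gain -- a stand-in for the "critical point".
marginal = -np.diff(loss)
critical_idx = int(np.argmax(marginal < 0.01 * marginal[0]))
print(f"loss at start: {loss[0]:.4f}, "
      f"at critical step {steps[critical_idx + 1]}: {loss[critical_idx + 1]:.4f}, "
      f"at end: {loss[-1]:.4f}")
```

On such a spectrum the loss keeps decreasing after the flagged step, but each additional training interval buys far less improvement, which is the economical trade-off the abstract refers to.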
Abstract (Chinese) I
Abstract II

Contents III

List of Figures V
List of Tables VI

1 Introduction 1

2 Observation 3
2.1 Economical Sweet Spot . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 Obstacles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

3 Kernel Economical Sweet Spot 5
3.1 Neural Tangent Kernel . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.2 Finding KESS in NTK context . . . . . . . . . . . . . . . . . . . . 7
3.3 Time Relationship between NTK and NN . . . . . . . . . . . . . . 8
3.4 Finding KESS in NN context . . . . . . . . . . . . . . . . . . . . . 9
3.5 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

4 Evaluation 11
4.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.2 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . 12
4.3 Transferability Study . . . . . . . . . . . . . . . . . . . . . . . . . . 13
5 Related Work 14

6 Conclusion 15

7 Future Works 16

Bibliography 17