[1] Abien Fred Agarap. Deep learning using rectified linear units (ReLU). CoRR, abs/1803.08375, 2018.
[2] Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. Language models are few-shot learners. CoRR, abs/2005.14165, 2020.
[3] Lenaic Chizat, Edouard Oyallon, and Francis Bach. On lazy training in differentiable programming, 2020.
[4] Will Cukierski. Dogs vs. cats, 2013.
[5] Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning, pages 1050–1059. PMLR, 2016.
[6] Bobby He, Balaji Lakshminarayanan, and Yee Whye Teh. Bayesian deep ensembles via the neural tangent kernel. Advances in Neural Information Processing Systems, 33:1010–1022, 2020.
[7] Arthur Jacot, Franck Gabriel, and Clément Hongler. Neural tangent kernel: Convergence and generalization in neural networks. CoRR, abs/1806.07572, 2018.
[8] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.
[9] Jaehoon Lee, Samuel S. Schoenholz, Jeffrey Pennington, Ben Adlam, Lechao Xiao, Roman Novak, and Jascha Sohl-Dickstein. Finite versus infinite neural networks: an empirical study, 2020.
[10] Jaehoon Lee, Lechao Xiao, Samuel S. Schoenholz, Yasaman Bahri, Roman Novak, Jascha Sohl-Dickstein, and Jeffrey Pennington. Wide neural networks of any depth evolve as linear models under gradient descent. Journal of Statistical Mechanics: Theory and Experiment, 2020(12):124002, December 2020.
[11] Nelson Morgan and Hervé Bourlard. Generalization and parameter estimation in feedforward nets: Some experiments. Advances in Neural Information Processing Systems, 2, 1989.
[12] Roman Novak, Lechao Xiao, Jiri Hron, Jaehoon Lee, Alexander A. Alemi, Jascha Sohl-Dickstein, and Samuel S. Schoenholz. Neural Tangents: Fast and easy infinite neural networks in Python. In International Conference on Learning Representations, 2020.
[13] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015.
[14] Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, and Luke Zettlemoyer. OPT: Open pre-trained transformer language models, 2022.