Detailed Record

Author (Chinese): 魏子軒
Author (English): Wei, Tzu-Hsuan
Title (Chinese): 藉由粗糙度感知更新緩解分佈偏移
Title (English): Mitigate Distribution Shift with Roughness Aware Update
Advisor (Chinese): 張世杰
Advisor (English): Chang, Shih-Chieh
Committee members (Chinese): 陳縕儂
何宗義
張世杰
Committee members (English): Chen, Yun-Nung
Ho, Tsung-Yi
Chang, Shih-Chieh
Degree: Master's
University: National Tsing Hua University
Department: Institute of Information Systems and Applications
Student ID: 108065532
Year of publication (ROC calendar): 110 (2021)
Academic year of graduation: 109
Language: English
Number of pages: 26
Keywords (Chinese): 分布偏移、泛化性、模型優化
Keywords (English): Distribution shift, Generalization, Optimization
Abstract
Distribution shift is a common phenomenon in training and deploying neural networks, and it can significantly degrade model performance. Previous works have considered two cases of distribution shift, on model weights and on input data, and have proposed methods that address each issue separately. In this thesis we consider both cases together and propose two optimization techniques, Roughness-Aware Update and Gradient Masking, which mitigate the effect of distribution shift by improving network generalization: they guide the optimization to converge to solutions located in flatter regions of the loss surface. Our experiments on corrupted image datasets and on a simulated environment with noisy weights show that, when our techniques are combined with existing leading optimization methods, the generalization of the learned solution improves further and the model achieves even better accuracy.
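The abstract only names the two techniques, so as a rough illustration of the general idea, below is a minimal PyTorch sketch of a roughness-aware training step. It assumes local roughness is probed by perturbing the weights a small distance rho along the normalized gradient and then descending the gradient measured at that perturbed point; the function name, the choice of probe direction, and the value of rho are illustrative assumptions, not the exact formulation used in the thesis.

    # Minimal sketch of a roughness-aware update (illustrative, not the
    # thesis's exact algorithm): probe the loss surface with a small weight
    # perturbation along the gradient, then step using the gradient taken at
    # the perturbed point, which favors flatter minima.
    import torch

    def roughness_aware_step(model, loss_fn, batch, optimizer, rho=0.05):
        inputs, targets = batch
        params = [p for p in model.parameters() if p.requires_grad]

        # 1) Loss and gradient at the current weights.
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        grads = [p.grad.detach().clone() if p.grad is not None
                 else torch.zeros_like(p) for p in params]
        scale = rho / (torch.sqrt(sum(g.pow(2).sum() for g in grads)) + 1e-12)

        # 2) Probe: move the weights a small distance along the gradient to
        #    sample a nearby point on the loss surface (local "roughness").
        with torch.no_grad():
            for p, g in zip(params, grads):
                p.add_(g, alpha=scale.item())

        # 3) Gradient at the perturbed weights; descending it prefers
        #    solutions whose neighborhood also has low loss (flat regions).
        optimizer.zero_grad()
        loss_fn(model(inputs), targets).backward()

        # 4) Undo the probe and let the base optimizer apply the update.
        with torch.no_grad():
            for p, g in zip(params, grads):
                p.sub_(g, alpha=scale.item())
        optimizer.step()
        return loss.item()

In a training loop one would call roughness_aware_step(model, criterion, (x, y), optimizer) once per mini-batch; the probe doubles the number of forward/backward passes per step, which is the usual cost of flatness-seeking updates of this kind.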
Contents
1. Introduction - 1
2. Related Work - 6
2.1 Weight Variations and Input Variations - 6
2.2 Model Generalization using Data Augmentation - 7
3. Methods - 9
3.1 Roughness-Aware Update - 9
3.2 Gradient Masking - 12
4. Experiment - 15
4.1 Generalization against Weight Variations - 15
4.2 Corrupted Dataset - 15
5. Conclusions and Discussions - 20
5.1 Conclusions - 20
5.2 Discussions - 21
References - 24