Thesis Title: Mitigate Distribution Shift with Roughness Aware Update
Advisor: Chang, Shih-Chieh
Oral Defense Committee: Chen, Yun-Nung
Ho, Tsung-Yi
Chang, Shih-Chieh
Keywords: Distribution shift; Generalization; Optimization
Distribution shift is a common phenomenon when training and deploying neural networks, and it can substantially degrade model performance. Previous work has considered two cases of distribution shift, in the model weights or in the input data, and has proposed methods that address each case separately.
We propose two optimization techniques, Roughness-Aware Update and Gradient Masking, that mitigate the effect of distribution shift by improving network generalization: both guide the optimization to converge to solutions located in flatter regions of the loss surface.
Our experiments on corrupted image datasets and in a simulated environment with noisy weights show that combining our techniques with existing leading optimization methods further improves the generalization of the resulting solution and yields better performance.
1. Introduction
2. Related Work
2.1 Weight Variations and Input Variations
2.2 Model Generalization using Data Augmentation
3. Methods
3.1 Roughness-Aware Update
3.2 Gradient Masking
4. Experiment
4.1 Generalization against weight variations
4.2 Corrupted Dataset
5. Conclusions and Discussions
5.1 Conclusions
5.2 Discussions
References
