Author: Li, Chi-Chang
Thesis Title: RPH-PGD: Randomly Projected Hessian for Perturbed Gradient Descent
Advisors: Hon, Wing-Kai; Lee, Che-Rung
Oral Defense Committee: Wang, Hung-Lung; Tsai, Meng-Tsung
Keywords: Algorithm; Gradient Descent; Optimization; Hessian; Saddle Point
The perturbed gradient descent (PGD) method, which adds random noise to the search directions, has been widely used to solve large-scale optimization problems because of its ability to escape from saddle points. However, it is sometimes inefficient for two reasons. First, the random noise may not point in a descent direction, so PGD may still stagnate around saddle points. Second, the magnitude of the noise, which is controlled by the radius of the perturbation ball, may not be configured properly, which slows convergence. In this thesis, we propose a method called RPH-PGD (Randomly Projected Hessian for Perturbed Gradient Descent) to improve the performance of PGD. The randomly projected Hessian (RPH) is created by projecting the Hessian matrix onto a relatively small subspace that contains rich information about the eigenvectors of the original Hessian. RPH-PGD uses the eigenvalues and eigenvectors of the randomly projected Hessian to identify directions of negative curvature, and uses the matrix itself to estimate how the Hessian changes, which is the information needed to adjust the radius dynamically during the computation. In addition, RPH-PGD approximates Hessian-vector products with the finite difference method instead of constructing the Hessian explicitly. An amortized analysis shows that the time complexity of RPH-PGD is only slightly higher than that of PGD. Experimental results show that RPH-PGD not only converges faster than PGD but also converges in cases where PGD does not.
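The abstract combines three ingredients: Hessian-vector products computed by finite differences, a projection of the Hessian onto a small random subspace whose eigenpairs reveal negative curvature, and a perturbation step that escapes a saddle point. The following NumPy code is a minimal sketch of these ideas under stated assumptions, not the thesis's algorithm. It assumes central differences of a user-supplied gradient, a Gaussian test matrix orthonormalized by QR (as in standard randomized eigen-decomposition), and a hypothetical escape rule that follows the most negative Ritz direction and otherwise falls back to PGD's uniform perturbation in a ball. The function names (`hvp_fd`, `randomly_projected_hessian`, `negative_curvature_direction`, `sample_ball`, `escape_step`) and the default `eps` are illustrative, and the thesis's adaptive radius rule is not reproduced.

```python
import numpy as np


def hvp_fd(grad, x, v, eps=1e-5):
    """Approximate H(x) @ v by a central finite difference of the gradient.

    Two gradient evaluations are needed; the Hessian is never formed.
    """
    return (grad(x + eps * v) - grad(x - eps * v)) / (2.0 * eps)


def randomly_projected_hessian(grad, x, k, rng, eps=1e-5):
    """Project the Hessian at x onto a random k-dimensional subspace (k <= n).

    Assumption: one pass of a Gaussian range finder, Y = H @ Omega, followed by
    QR. Returns the orthonormal basis Q (n x k) and the k x k matrix Q^T H Q.
    """
    n = x.shape[0]
    omega = rng.standard_normal((n, k))
    # Range finding: each column of Y is one finite-difference Hessian-vector product.
    y = np.column_stack([hvp_fd(grad, x, omega[:, j], eps) for j in range(k)])
    q, _ = np.linalg.qr(y)  # reduced QR: q has shape (n, k)
    hq = np.column_stack([hvp_fd(grad, x, q[:, j], eps) for j in range(k)])
    t = q.T @ hq
    # Q^T H Q is symmetric in exact arithmetic; symmetrize to remove FD noise.
    return q, 0.5 * (t + t.T)


def negative_curvature_direction(grad, x, k, rng):
    """Return (smallest Ritz value, unit direction), or (value, None) if it is >= 0."""
    q, t = randomly_projected_hessian(grad, x, k, rng)
    evals, evecs = np.linalg.eigh(t)  # eigenvalues in ascending order
    if evals[0] < 0:
        # Lift the k-dimensional eigenvector back to R^n; Q is orthonormal,
        # so the lifted vector has unit norm.
        return evals[0], q @ evecs[:, 0]
    return evals[0], None


def sample_ball(n, radius, rng):
    """PGD-style perturbation: a point drawn uniformly from the ball of given radius."""
    g = rng.standard_normal(n)
    u = rng.random()
    return radius * u ** (1.0 / n) * g / np.linalg.norm(g)


def escape_step(grad, x, radius, k, rng):
    """Hypothetical escape step: follow negative curvature if one is detected,
    otherwise fall back to a uniform perturbation in the ball."""
    _, d = negative_curvature_direction(grad, x, k, rng)
    if d is not None:
        # The sign of an eigenvector is arbitrary; avoid ascending along the gradient.
        if grad(x) @ d > 0:
            d = -d
        return x + radius * d
    return x + sample_ball(x.shape[0], radius, rng)
```

For example, with the quadratic f(x) = ½ xᵀHx, `H = np.diag([1.0, -2.0, 3.0, 0.5, 4.0, 2.0])` and `grad = lambda x: H @ x`, calling `negative_curvature_direction(grad, np.zeros(6), 6, np.random.default_rng(0))` recovers the eigenvalue -2 and the direction ±e₂; with k equal to n the projection is exact. When k is much smaller than n, a single-pass range finder favors eigenvalues of large magnitude, so weak negative curvature can be missed; oversampling or a few power iterations are the usual remedies. Each projection costs 2k Hessian-vector products, that is, 4k gradient evaluations.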
Chinese Abstract
Abstract
List of Figures
1 Introduction
2 Preliminary
2.1 Notation
2.2 Methods to Escape from Saddle Points
2.3 Perturbed Gradient Descent
2.4 Randomized Eigen-Decomposition
3 Algorithms
3.1 Randomly Projected Hessian
3.2 RPH-PGD
4 Experiments
4.1 Experimental Settings
4.2 Comparison with PGD
4.3 Ablation Study
4.4 Experiments for ResNet18
5 Conclusion and Future Work
6 Appendix
6.1 RPH Algorithm
6.2 Adaptive Radius Algorithm
6.3 RPH-PGD Algorithm
6.4 Escape Saddle Point Test
References
