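Author (Chinese): 徐敬棠
Author (English): Hsu, Ching-Tang
Thesis title (Chinese): 運用機器學習建立模型以有效的預測蛋白質的振盪尺度
Thesis title (English): Use machine learning to establish a predictive model to efficiently estimate the sizes of residue fluctuations in proteins.
Advisor (Chinese): 楊立威
Advisor (English): Yang, Lee-Wei
Committee members (Chinese): 林澤; 楊進木
Committee members (English): Lin, Che; Yang, Jinn-Moon
Degree: Master's
Institution: National Tsing Hua University
Department: Institute of Bioinformatics and Structural Biology
Student ID: 104080593
Year of publication (ROC calendar): 107 (2018)
Academic year of graduation: 106
Language: Chinese
Number of pages: 66
Keywords (Chinese, translated): protein fluctuation; elastic network model; machine learning; deep learning; molecular dynamics simulation; conformational ensemble; protein shape; nuclear magnetic resonance; X-ray crystallography; eigenvalue; random forest; linear regression
Keywords (English): protein fluctuation; elastic network model (ENM); machine learning; deep neural networks (DNN); molecular dynamics simulation (MD); Shannon entropy; native ensemble; protein shape; nuclear magnetic resonance; x-ray crystallography; eigenvalue; random forest; linear regression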
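The size of thermally driven fluctuations and the intrinsic conformational changes of proteins are critical to protein-protein interactions and to the inhibition of protein activity. Molecular dynamics (MD) simulation is widely used to explore structural changes in proteins and to observe interactions between proteins or between a protein and a small molecule (ligand). MD can also be used to observe protein fluctuations, but it requires substantial computing resources and time, especially for protein complexes. Protein fluctuation can be decomposed into direction and amplitude. Physical models such as the Elastic Network Model (ENM) already predict the direction of fluctuation efficiently, but no efficient model yet predicts the absolute amplitude of fluctuation from a single three-dimensional protein structure. This thesis therefore uses a deep neural network (DNN) to build a fast and effective model for predicting the size of protein fluctuations. A DNN is a supervised learning method and needs a sufficient number of features and labels. We selected 2792 protein families with known three-dimensional structures as training data, and extracted protein-level and residue-level properties from sequence information, structure, and ENM-defined protein dynamics as input features.

The training labels are the absolute fluctuation amplitudes of each residue (root-mean-square fluctuation, RMSF). Because of limited time and computing resources, MD could not be used to obtain the per-residue RMSFMD for every protein family, so we approximated the MD results with absolute amplitudes computed in three ways:
1. RMSFNE: the native ensemble formed by different three-dimensional structures of the same protein is treated as the trajectory of that protein, and the RMSF is computed from it.
2. RMSFB: converted from the B-factors of X-ray structures.
3. RMSFG: estimated with the Gaussian Network Model (GNM).

We found that the absolute level of RMSFB is the closest to RMSFMD (percentage difference 0.354). The relative per-residue fluctuation sizes of RMSFNE and RMSFG correlate highly with RMSFMD (correlation coefficients 0.632 and 0.673), but their absolute amplitudes are far from RMSFMD (percentage differences 0.610 and 0.467). We therefore shifted RMSFNE and RMSFG to the same absolute level as RMSFB, forming two new prediction targets, shifted RMSFNE (RMSFSNE) and shifted RMSFG (RMSFSG). Both new targets approximate RMSFMD successfully, with percentage differences of 0.477 and 0.354, respectively.

We trained deep network models with RMSFB, RMSFSNE, and RMSFSG as the labels. The models trained on RMSFSG and RMSFB performed best: the correlation coefficients between predicted and experimental RMSF are 0.751 and 0.638, and comparing the predicted RMSFSG and RMSFB with RMSFMD gives percentage differences of 0.319 and 0.371, which shows that the RMSFSG model predicts protein fluctuations in aqueous solution well. We tallied the weights of the 39 features in the trained models. The GNM-related features all carry high weights, and Shannon entropy carries the highest share, which indicates that the GNM frequency distribution helps greatly in predicting protein fluctuations. Protein shape features also carry high weights, which indicates that protein shape is related to the size of protein fluctuations. We will combine the ENM (direction of motion) and the RMSFSG model (size of motion) to predict possible protein conformations, and apply them to protein-ligand docking to test the applicability of this approach.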
Thermal fluctuations and intrinsic conformational changes of proteins are extremely important for protein-protein interaction, inhibition, and function. Molecular dynamics (MD) simulation has been widely used to explore structural changes in proteins and to observe protein-protein and protein-ligand interactions, but simulating protein fluctuations with MD is usually costly in time and computing resources, especially for large protein complexes. A fluctuation has a size and a direction. Existing physical models such as the Elastic Network Model (ENM) can efficiently estimate the direction of protein fluctuations. However, there is not yet a well-accepted method, more efficient than MD, for predicting the absolute sizes of fluctuations. In this study, we use a deep neural network (DNN) to establish a predictive model that efficiently estimates the size of residue fluctuations in proteins. A DNN is a supervised learning method that requires adequate features and labels (answers). We extracted features from 2792 protein structural clusters to train our model. The features comprise 39 relevant characteristics extracted from protein sequences, structures, and ENM-defined vibrational dynamics. The training target is the size of residue fluctuations in water, but given limited time and computing resources we could not obtain it directly from MD for all 2792 clusters. We therefore used the following three methods to calculate the absolute size of fluctuations as approximations of the MD simulation results:
1. RMSFNE, the root-mean-square fluctuation (RMSF) of the native ensemble formed by different structures of the same protein, taken as the conformational spread of the protein.
2. RMSFB, the RMSF derived from atomic B-factors of structures resolved by X-ray crystallography.
3. RMSFG, the RMSF derived from the Gaussian Network Model (GNM) with a force constant of 0.6 kcal/mol/Å².
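The three approximations listed above can be illustrated with a minimal Python sketch. This is not the thesis code: the function names, the 7 Å contact cutoff, the 300 K temperature, and the requirement that the ensemble is already superimposed are illustrative assumptions that the abstract does not state. Only the 0.6 kcal/mol/Å² force constant comes from the text. The B-factor conversion uses the standard isotropic relation B = (8π²/3)⟨Δr²⟩, and the GNM estimate uses ⟨ΔR_i²⟩ = (3 k_B T / γ) [Γ⁻¹]_ii, where Γ is the Kirchhoff (contact) matrix.

```python
import numpy as np

KB = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def rmsf_from_ensemble(coords):
    """RMSF per residue from an ensemble of structures.

    coords: array of shape (n_structures, n_residues, 3), assumed to be
    superimposed onto a common frame already (assumption of this sketch).
    """
    coords = np.asarray(coords, dtype=float)
    mean = coords.mean(axis=0)                      # mean position per residue
    sq_dev = ((coords - mean) ** 2).sum(axis=2)     # squared deviation per structure
    return np.sqrt(sq_dev.mean(axis=0))


def rmsf_from_bfactor(b):
    """RMSF (in Å) from isotropic B-factors (in Å^2): B = (8*pi^2/3) * <dr^2>."""
    b = np.asarray(b, dtype=float)
    return np.sqrt(3.0 * b / (8.0 * np.pi ** 2))


def rmsf_from_gnm(ca_coords, cutoff=7.0, gamma=0.6, temperature=300.0):
    """RMSF (in Å) from a Gaussian Network Model built on C-alpha coordinates.

    cutoff (Å) and temperature (K) are assumed values; gamma is the force
    constant in kcal/mol/Å^2 quoted in the abstract.
    """
    xyz = np.asarray(ca_coords, dtype=float)
    dist = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=2)
    kirchhoff = -(dist <= cutoff).astype(float)     # -1 for each contact
    np.fill_diagonal(kirchhoff, 0.0)                # clear self-contacts
    np.fill_diagonal(kirchhoff, -kirchhoff.sum(axis=1))  # diagonal = contact count
    inv = np.linalg.pinv(kirchhoff)                 # pseudo-inverse drops the zero mode
    msf = 3.0 * KB * temperature / gamma * np.diag(inv)
    return np.sqrt(msf)
```

In this sketch the pseudo-inverse is used because the Kirchhoff matrix of a connected network always has one zero eigenvalue, so an ordinary inverse does not exist.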
We found that the absolute size of RMSFB is closer to RMSFMD (percentage difference 0.354) than that of the other two. RMSFNE and RMSFG correlate highly with RMSFMD (0.632 and 0.673), but their absolute sizes of fluctuation do not match it well (percentage differences 0.610 and 0.467). We therefore shifted RMSFNE and RMSFG to the absolute level of RMSFB to form two new training targets, "shifted" RMSFNE (RMSFSNE) and "shifted" RMSFG (RMSFSG). These new targets approach RMSFMD successfully (percentage differences 0.354 for RMSFSG and 0.477 for RMSFSNE). We then used RMSFB, RMSFSNE, and RMSFSG as training targets for deep learning, and used the trained models to predict these three RMSFs as well as RMSFMD. The trained RMSFSG and RMSFB models predict the experimental results with correlations of 0.751 and 0.638, and predict RMSFMD with percentage differences of 0.319 and 0.371. These results show that, compared with the other models, the RMSFSG model is a good predictor of the sizes of fluctuations for solvated proteins. We also examined the weights of the 39 features used in the trained models. The data show that GNM-related features make an important contribution. Among the features, the Shannon entropy of low-frequency spectra, GNM covariance, and features related to protein shape carry high weights, whereas residue charge, the number of hydrogen bonds, and secondary-structure content contribute little. We will further combine the ENM (direction of motion) and the RMSFSG model (size of motion) to predict possible protein conformations and thereby facilitate better protein-ligand docking.
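The shifting step and the two comparison metrics can be sketched as follows. The abstract does not give the exact formulas, so this sketch makes two loudly labeled assumptions: "shifting" is taken to be adding a constant offset so that the mean of the profile equals the mean of the reference profile, and "percentage difference" is taken to be the mean absolute difference relative to the reference. The thesis may define either quantity differently. All function names are hypothetical.

```python
import numpy as np


def shift_to_level(profile, reference):
    """Shift a per-residue RMSF profile so its mean matches the reference mean.

    Assumption of this sketch: the shift is a constant additive offset.
    The relative shape (and hence the correlation) of the profile is unchanged.
    """
    profile = np.asarray(profile, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return profile + (reference.mean() - profile.mean())


def percentage_difference(pred, ref):
    """Mean of |pred - ref| / ref over residues (assumed definition)."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return float(np.mean(np.abs(pred - ref) / ref))


def pearson(a, b):
    """Pearson correlation coefficient between two per-residue profiles."""
    return float(np.corrcoef(a, b)[0, 1])
```

Under these assumptions, a profile with the right shape but the wrong level keeps its correlation after shifting while its percentage difference drops, which is the behavior the abstract reports for RMSFNE and RMSFG.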
Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
1. Introduction
1.1 Purpose
1.2 Ensembles of structures
1.3 Predicting protein fluctuations with machine learning
2. Methods
2.1 Protein data selection and classification
2.2 Molecular Dynamics Simulations
2.3 Gaussian network model (GNM)
2.3.1 GNM theory
2.3.2 Shannon entropy
2.4 Root-mean-square fluctuation (RMSF)
2.4.1 Native Ensemble RMSF (RMSFNE)
2.4.2 MD-sampled RMSF of proteins (RMSFMD)
2.4.3 Average B-factor RMSF (RMSFB)
2.4.4 Shifted Native Ensemble RMSF (RMSFSNE)
2.4.5 Shifted GNM profile RMSF (RMSFSG)
2.5 Vibrational Entropy
2.6 Overall protein RMSF
2.7 Radius of gyration (Rg)
2.8 Determining protein shape by principal component analysis
2.9 DSSP (Define Secondary Structure of Proteins)
2.10 Polarity of residues in proteins
2.11 Intrinsically disordered
2.12 Solvent-accessible surface area (SASA)
2.13 Counting hydrogen bonds per residue
2.14 Atomic contact
2.15 Machine learning (deep learning)
2.16 Pearson correlation coefficient
2.17 Percentage difference
2.18 Mean error
2.18.1 Mean Squared Error (MSE)
2.18.2 Mean Absolute Error (MAE)
2.19 Importance of each variable in the neural network
3. Results
3.1 Correlation of RMSFMD with RMSFNE, RMSFB, and RMSFG
3.2 Analysis of shifting RMSFNE and RMSFG to RMSFB
3.3 Comparing the fluctuation sizes of RMSFSNE, RMSFB, and RMSFSG with RMSFMD
3.5 Building the training data for machine learning
3.6 Analysis of machine learning results
3.6.1 Training data
3.6.2 Analysis of training and test models
3.7 Comparison of predicted RMSF with MD RMSF and experimental RMSF
4. Discussion
4.1 Feature importance
4.2 Among the predictors trained by machine learning, the RMSFSG predictor stands out
4.3 Comparison with models trained by other machine learning methods
4.4 Applications and discussion of the RMSF predicted by machine learning
5. Conclusion
References
Appendix
Table S1. Percentage difference between RMSFB multiplied by a scaling factor and RMSFMD.