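Author (Chinese): 宋培源
Author (English): Sung, Pei-Yuan
Thesis title: A fast gene-gene interaction test considering the correlation in family data (The G-Core Test)
Advisors (Chinese): 鍾仁華
謝文萍
Advisors (English): Chung, Ren-Hua
Hsieh, Wen-Ping
Oral defense committee (Chinese): 邱燕楓
鄒小蕙
Oral defense committee (English): Chiu, Yen-Feng
Tsou, Hsiao-Hui
Degree: Master's
Institution: National Tsing Hua University
Department: Institute of Statistics
Student ID: 101024507
Year of publication (ROC calendar): 103 (2014)
Academic year of graduation: 102
Language: English
Number of pages: 60
Keywords (Chinese): gene-gene interaction
Keywords (English): gene-gene interaction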
Genome-wide association studies (GWAS) have been a popular strategy for identifying single nucleotide polymorphisms (SNPs) associated with complex diseases. Because gene-gene interactions may play an important role in complex diseases, many statistical methods have been developed for gene-gene interaction analysis of case-control GWAS. However, only a few methods are available for family-based interaction analysis, and these methods are computationally intensive. Current family-based interaction methods are therefore not applicable to genome-wide interaction studies that test all possible pairs of SNPs across the genome. We propose an efficient family-based gene-gene interaction test that compares the log odds ratios for a pair of SNPs between cases and controls. For triads (two parents and one affected child), the test extends the Transmission Disequilibrium Test (TDT): cases and controls are defined as the transmitted and non-transmitted haplotypes, respectively. For discordant sib pairs (DSPs), the test extends the sib TDT (S-TDT): cases and controls are defined as the genotypes of the affected and unaffected sibs, respectively. Multinomial and multivariate hypergeometric distributions are assumed for the triad and DSP statistics, respectively, to calculate the variances and covariances. If the sample contains both triads and DSPs, the two statistics are combined in the test. We used simulations to demonstrate that the proposed test has correct type I error rates under different scenarios, including different sample sizes, family structures, and minor allele frequencies. We also performed power studies to compare the power of the proposed test with that of other family-based interaction methods. The results suggest that the proposed test is valid and has power comparable to that of existing tests. Further simulations showed that the proposed test runs 20 times faster than the regression-based interaction test. Finally, we applied the test to a family-based GWAS of hypertension and identified several promising SNP-SNP interactions.
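To illustrate the idea of comparing log odds ratios for a pair of SNPs between cases and controls, the following is a minimal Python sketch of the simpler setting with unrelated cases and controls. It is not the G-Core statistic itself: it assumes two independent 2x2 tables of counts (one for cases, one for controls) and uses the standard large-sample variance of a log odds ratio, whereas the thesis derives variances and covariances from multinomial and multivariate hypergeometric distributions to account for family structure. The function names, the table layout, and the 0.5 continuity correction are assumptions made for this example.

```python
import math


def log_odds_ratio(table, correction=0.5):
    """Return (log odds ratio, approximate variance) for a 2x2 table [[a, b], [c, d]].

    Assumption for this sketch: if any cell is zero, `correction` is added
    to every cell so that the logarithm and the variance stay finite.
    """
    (a, b), (c, d) = table
    if min(a, b, c, d) == 0:
        a, b, c, d = (x + correction for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d  # large-sample (Woolf) variance
    return log_or, variance


def log_or_difference_z(case_table, control_table):
    """Z statistic for the difference in log odds ratios between cases and controls.

    Assumes the two tables are independent, so the variances simply add.
    Family data violate this assumption, which is the gap the thesis addresses.
    """
    log_or_case, var_case = log_odds_ratio(case_table)
    log_or_ctrl, var_ctrl = log_odds_ratio(control_table)
    return (log_or_case - log_or_ctrl) / math.sqrt(var_case + var_ctrl)


def two_sided_p_value(z):
    """Two-sided p-value for a standard normal Z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts of allele combinations at two SNPs (rows: SNP 1, columns: SNP 2).
case_table = [[40, 10], [10, 40]]
control_table = [[25, 25], [25, 25]]
z = log_or_difference_z(case_table, control_table)
p = two_sided_p_value(z)
```

A large absolute Z value indicates that the association between the two SNPs differs between cases and controls, which is the signal interpreted as interaction. Because each pair needs only a handful of counts and logarithms rather than a fitted regression model, a statistic of this form is cheap enough to evaluate over very many SNP pairs.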
Introduction 1
Methods 4
The PLINK interaction statistic 4
The G-core statistic for Triad 5
The G-core statistic for DSP 7
The G-core statistic for Triad and DSP 9
Simulations 11
Simulation tools 11
Type I error and power simulations 12
Performance comparison 14
Results 15
Type I error simulations 15
Power simulations 16
Performance comparison 19
Real data analysis 19
Discussion 21
Appendix 24
Appendix A 24
Appendix B 31
Reference 39
Figures 42
Tables 47
Supplementary Information 60
(The full text of this thesis has not been authorized for public access)