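Author (Chinese): 吳家寶
Author (English): Ng, Ka Pou
Thesis title (Chinese): 利用兩階段變異數分析模型分析定序資料之拷貝數變異
Thesis title (English): Two-step ANOVA model for copy number variation detection on targeted sequencing
Advisor (Chinese): 謝文萍
Advisor (English): Hsieh, Wen Ping
Oral examination committee (Chinese): 江永進
鍾仁華
Degree: Master's
Institution: National Tsing Hua University (國立清華大學)
Department: Institute of Statistics (統計學研究所)
Student ID: 100024402
Year of publication (ROC calendar): 103 (2014 CE)
Academic year of graduation (ROC calendar): 102
Language: English
Number of pages: 38
Keywords (Chinese): 兩階段變異數分析模型 (two-step ANOVA model); 拷貝數變異 (copy number variation)
Keywords (English): Two-step ANOVA model; copy number variation; CNV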
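Abstract (translated from the Chinese): Copy number variation (CNV) is an abnormal alteration of long segments of the DNA sequence. It is thought to be involved in the cause of many human diseases and plays an important role in research on both genetic diseases and cancer. In recent years, advances in sequencing technology have lowered the barrier to CNV analysis, and many analysis tools have been proposed and applied. However, these tools are usually applied to data with low read depth, long sequenced segments, and paired normal and disease samples; data with high read depth and short sequenced segments are rarely discussed. This study proposes a two-step ANOVA model as a tool for detecting CNVs. The model is applicable to different types of data. Its idea is to use two different ANOVA models: first estimate the base effects in the normal samples, then apply an effective correction to the disease samples, and thereby detect CNVs. The effectiveness of the model is presented in three parts. In the simulation analysis, different CNV proportions are set in order to observe the power of the model to detect CNVs and how that power changes, and the results are compared with another analysis tool, ExomeCNV; for data with higher read depth, our method shows that conventional ANOVA can already distinguish the differences clearly. In the real data analysis, data from oral cancer patients are analyzed as a demonstration. In the association analysis, diseases related to oral cancer are used as an illustrative example. Taking the three parts together, the proposed model has a lower false positive rate (FPR) and is a more conservative CNV detection tool, and the results of the association study show that the model can find CNVs associated with oral cancer.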
Abstract (English): Copy number variation (CNV) is a form of structural variation in which the copy number of a genomic region is abnormally altered. It is considered to be related to diverse diseases, so CNV plays an important role in the study of genetic diseases and cancer. With advances in next-generation sequencing (NGS) technology, analyzing CNV has become less difficult, and a number of analysis tools have been developed to detect CNVs. However, most current research focuses on data with low coverage, long regions of interest, and paired case-control samples. Data with high coverage, short targeted regions, and unpaired case-control samples have received little discussion. Therefore, in this study we propose a two-step ANOVA model to detect CNVs. The model can be applied to different types of data. Its main idea is to apply two different ANOVA models: we first estimate the base effects from the control samples, and then adjust the case samples with these base effects in order to detect CNVs. The results are summarized in three parts. In the simulation study, different CNV incidence rates are designed to observe how the results of the proposed model change, and the model is compared with another tool, ExomeCNV. In the real data analysis, we show what our model found in the oral cancer data. In the association study, we present an analysis process for some clinical traits of oral cancer. In conclusion, the two-step ANOVA model has a lower false positive rate. Our results also indicate that a conventional ANOVA model performs well on high-coverage data compared with more sophisticated schemes. The association study detected several CNVs that are very likely to play an important role in the etiology of oral cancer.
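The two-step idea in the abstract (estimate base effects from controls, then adjust case samples) can be illustrated with a minimal sketch. This record does not give the thesis's actual model, so everything below is an assumption made for illustration: the additive two-way layout on log read depth, the function names fit_base_effects and call_cnv, the median-based sample offset, and the z-score cutoff of 3 are all hypothetical and are not the author's method.

```python
import numpy as np

def fit_base_effects(control_log_depth):
    """Step 1 (assumed form): additive two-way fit on control samples.

    control_log_depth is a (samples x regions) array of log read depths.
    Returns the per-region baseline and the residual standard deviation.
    """
    grand = control_log_depth.mean()
    sample_eff = control_log_depth.mean(axis=1) - grand   # per-sample effect
    region_eff = control_log_depth.mean(axis=0) - grand   # per-region base effect
    fitted = grand + sample_eff[:, None] + region_eff[None, :]
    resid = control_log_depth - fitted
    n_samples, n_regions = control_log_depth.shape
    dof = (n_samples - 1) * (n_regions - 1)               # residual degrees of freedom
    sigma = np.sqrt((resid ** 2).sum() / dof)
    return grand + region_eff, sigma

def call_cnv(case_log_depth, region_baseline, sigma, z_cut=3.0):
    """Step 2 (assumed form): adjust one case sample and flag outlying regions.

    Returns an integer array: 1 = gain, -1 = loss, 0 = no call.
    """
    adjusted = case_log_depth - region_baseline
    # The median is used as a sample-level offset because it is less
    # affected by the regions that actually carry a CNV.
    adjusted = adjusted - np.median(adjusted)
    z = adjusted / sigma
    return np.where(z > z_cut, 1, np.where(z < -z_cut, -1, 0))
```

A fixed cutoff on standardized residuals is only one possible decision rule; it is used here because it keeps the sketch short, not because the record says the thesis uses it.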
1. Introduction
1.1. Copy number variations
1.2. Detection of copy number variation from NGS data using read depth-based approaches
1.3. Existing tools for CNV detection with exome sequencing data
1.4. Main issue of the current research
2. Materials and Methods
2.1. Data description
2.2. ExomeCNV
2.3. Preprocessing
2.4. Two-step ANOVA model
2.5. Procedure for CNV detection
3. Results
3.1. Simulation data analysis
3.2. CNV calls on the cancer data
3.3. Association study using oral cancer data
4. Discussion and Conclusion
1. Zhao, M., et al., Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives. BMC Bioinformatics, 2013. 14(Suppl 11): p. S1.
2. Freeman, J.L., et al., Copy number variation: new insights in genome diversity. Genome Research, 2006. 16(8): p. 949-961.
3. Redon, R., et al., Global variation in copy number in the human genome. Nature, 2006. 444(7118): p. 444-454.
4. de Ligt, J., et al., Detection of Clinically Relevant Copy Number Variants with Whole-Exome Sequencing. Human Mutation, 2013. 34(10): p. 1439-1448.
5. Snijders, A.M., et al., Assembly of microarrays for genome-wide measurement of DNA copy number. Nature Genetics, 2001. 29(3): p. 263-264.
6. Shendure, J. and H. Ji, Next-generation DNA sequencing. Nature Biotechnology, 2008. 26(10): p. 1135-1145.
7. Meyerson, M., S. Gabriel, and G. Getz, Advances in understanding cancer genomes through second-generation sequencing. Nature Reviews Genetics, 2010. 11(10): p. 685-696.
8. Alkan, C., B.P. Coe, and E.E. Eichler, Genome structural variation discovery and genotyping. Nature Reviews Genetics, 2011. 12(5): p. 363-376.
9. Teo, S.M., et al., Statistical challenges associated with detecting copy number variations with next-generation sequencing. Bioinformatics, 2012. 28(21): p. 2711-2718.
10. Yoon, S., et al., Sensitive and accurate detection of copy number variants using read depth of coverage. Genome Research, 2009. 19(9): p. 1586-1592.
11. Magi, A., et al., Read count approach for DNA copy number variants detection. Bioinformatics, 2012. 28(4): p. 470-478.
12. Ng, S.B., et al., Exome sequencing identifies the cause of a mendelian disorder. Nature Genetics, 2009. 42(1): p. 30-35.
13. Sathirapongsasuti, J.F., et al., Exome sequencing-based copy-number variation and loss of heterozygosity detection: ExomeCNV. Bioinformatics, 2011. 27(19): p. 2648-2654.
14. Plagnol, V., et al., A robust model for read count data in exome sequencing experiments and implications for copy number variant calling. Bioinformatics, 2012. 28(21): p. 2747-2754.
15. Rigaill, G.J., et al., A regression model for estimating DNA copy number applied to capture sequencing data. Bioinformatics, 2012. 28(18): p. 2357-2365.
16. Koboldt, D.C., et al., VarScan 2: Somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Research, 2012. 22(3): p. 568-576.
17. Venkatraman, E. and A.B. Olshen, A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics, 2007. 23(6): p. 657-663.
18. Boeva, V., et al., Control-free calling of copy number alterations in deep-sequencing data using GC-content normalization. Bioinformatics, 2011. 27(2): p. 268-269.
19. Krumm, N., et al., Copy number variation detection and genotyping from exome sequence data. Genome Research, 2012. 22(8): p. 1525-1532.
20. Fromer, M., et al., Discovery and Statistical Genotyping of Copy-Number Variation from Whole-Exome Sequencing Depth. American Journal of Human Genetics, 2012. 91(4): p. 597-607.
21. Coin, L.J.M., et al., An exome sequencing pipeline for identifying and genotyping common CNVs associated with disease with application to psoriasis. Bioinformatics, 2012. 28(18): p. i370-i374.
22. Geary, R., The frequency distribution of the quotient of two normal variates. Journal of the Royal Statistical Society, 1930. 93(3): p. 442-446.
23. Geary, R., Extension of a theorem by Harald Cramér on the frequency distribution of the quotient of two variables. Journal of the Royal Statistical Society, 1944. 107(1): p. 56-57.
24. Hinkley, D.V., On the ratio of two correlated normal random variables. Biometrika, 1969. 56(3): p. 635-639.
25. Kidd, J.M., et al., Mapping and sequencing of structural variation from eight human genomes. Nature, 2008. 453(7191): p. 56-64.
(Full text is available for internal viewing only)