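Author (Chinese): 吳宗祐
Author (English): Wu, Tsung-Yu
Title (Chinese): 利用DPMM偵測亞克隆的分化 (Using DPMM to detect subclone differentiation)
Title (English): Tumor subclones detection with Dirichlet Process Mixture Model
Advisor (Chinese): 謝文萍
Advisor (English): Hsieh, Wen-Ping
Oral defense committee (Chinese): 黃禮珊, 黃冠華
Degree: Master's
University: National Tsing Hua University
Department: Institute of Statistics
Student ID: 102024501
Publication year (ROC calendar): 104 (2015)
Graduation academic year (ROC calendar): 103
Language: English
Keywords (Chinese): 癌症亞克隆分化 (cancer subclone differentiation)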
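Abstract (Chinese, translated): … use NGS databases to detect disease status. The goal of this study is to detect the differentiation of tumor subclones. Because the emergence of a subclone is usually accompanied by a set of somatic mutations that arose at roughly the same time, and these mutations occupy similar proportions of the tumor, we cluster the proportions of single nucleotide variants (SNVs), a type of somatic mutation, to infer the subclonal structure indirectly. The ultimate aim is to use this clustering of subclones to help explore the evolutionary history of cancer cells. However, copy number aberrations (CNAs) alter the data and introduce errors into the computed SNV proportions, so we must account for CNAs as well as SNVs. Our model has two steps: first, Allele-Specific Copy number Analysis of Tumors (ASCAT) is used to detect CNAs; second, a Dirichlet Process Mixture Model (DPMM) is used to cluster the SNVs. The DPMM is a very popular clustering method, but how to choose an appropriate value for its parameter α remains an open question. We offer some ideas for choosing α and propose a learning model for it.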
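Why CNAs distort SNV proportions can be seen with a small calculation. The sketch below assumes a simple, commonly used purity/copy-number model, which is not necessarily the formula used in the thesis: tumor purity ρ, total tumor copy number N at the locus, normal copy number 2, multiplicity m (mutated copies per cell carrying the SNV), and cancer cell fraction φ. Under these assumptions the expected variant allele frequency is v = ρφm / (ρN + 2(1 − ρ)). The function names are hypothetical, and purity and copy number are treated as given inputs, for example estimates from a copy number caller such as ASCAT.

```python
def expected_vaf(purity, total_cn, multiplicity, ccf, normal_cn=2):
    """Expected VAF of an SNV under an assumed purity/copy-number model.

    purity:       fraction of cells in the sample that are cancer cells
    total_cn:     total copy number of the locus in cancer cells
    multiplicity: mutated copies per cancer cell that carries the SNV
    ccf:          cancer cell fraction carrying the SNV
    normal_cn:    copy number in normal cells (2 for autosomes)
    """
    total_alleles = purity * total_cn + (1.0 - purity) * normal_cn
    mutant_alleles = purity * ccf * multiplicity
    return mutant_alleles / total_alleles


def ccf_from_vaf(vaf, purity, total_cn, multiplicity, normal_cn=2):
    """Invert expected_vaf: the cancer cell fraction implied by an observed VAF.

    With sampling noise the result can exceed 1; callers may want to clip it.
    """
    total_alleles = purity * total_cn + (1.0 - purity) * normal_cn
    return vaf * total_alleles / (purity * multiplicity)


if __name__ == "__main__":
    # A clonal heterozygous SNV in a pure sample at a diploid locus ...
    print(expected_vaf(1.0, 2, 1, 1.0))   # 0.5
    # ... versus the same SNV at a locus amplified to 4 copies (1 mutated copy)
    print(expected_vaf(1.0, 4, 1, 1.0))   # 0.25
    # Adjusting for copy number maps both back to CCF = 1
    print(ccf_from_vaf(0.25, 1.0, 4, 1))  # 1.0
```

Clustering the raw proportions 0.5 and 0.25 would put these two clonal mutations in different groups; converting to cancer cell fractions first removes that artifact, which is the motivation for running copy number detection before the DPMM.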
Abstract (English): Cancer is a malignant tumor and a major cause of death worldwide. In recent years, biomarker detection technology has improved greatly, and biomarkers can be used to detect disease status. Moreover, next-generation sequencing (NGS) can rapidly and accurately sequence billions of bases. The goal of this study is to cluster the somatic single nucleotide variants (SNVs) detected by NGS into subclones according to their proportions, in the hope that grouping subclones will help us probe cancer evolution. However, copy number aberrations (CNAs) distort the data and introduce errors into the computed SNV proportions, so we model CNAs as well as SNVs. Our model has two steps: first, Allele-Specific Copy number Analysis of Tumors (ASCAT) detects CNAs; second, a Dirichlet Process Mixture Model (DPMM) with a beta-binomial model clusters the SNVs. The DPMM is a well-known clustering method, but how to choose a suitable value for its concentration parameter α is an open issue. We propose learning models for α that yield more robust results.
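To make the second step concrete, here is a minimal sketch of one standard way to fit a DPMM to SNV read counts: a collapsed Gibbs sampler in the Chinese restaurant process representation, where each cluster's SNV proportion has a Beta(a0, b0) prior and the variant read count is binomial, so integrating the proportion out gives beta-binomial predictive probabilities. It is not the thesis's own implementation. The function names, the hyperparameters a0 and b0, starting with all SNVs in one cluster, and returning only the last sweep's labels are assumptions, and the read counts are assumed to come from copy-neutral loci or to have been adjusted for copy number already. The helper `expected_num_clusters` shows why α matters: under a DP(α) prior the expected number of clusters among n items is E[K_n] = Σ_{i=1}^{n} α/(α + i − 1).

```python
import math
import random


def log_beta(a, b):
    """log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_betabinom(k, n, a, b):
    """log P(K = k) for K ~ BetaBinomial(n, a, b)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


def expected_num_clusters(alpha, n):
    """Prior mean number of clusters among n items under a DP(alpha) prior."""
    return sum(alpha / (alpha + i) for i in range(n))


def dpmm_gibbs(alt, depth, alpha=1.0, a0=1.0, b0=1.0, n_iter=100, seed=0):
    """Collapsed Gibbs sampler for a DP mixture of binomials (hypothetical helper).

    Cluster k has an unknown SNV proportion phi_k ~ Beta(a0, b0); the variant
    read count alt[i] of an SNV in cluster k is Binomial(depth[i], phi_k).
    Integrating phi_k out gives beta-binomial predictive probabilities.
    Returns the cluster labels from the final sweep.
    """
    rng = random.Random(seed)
    n = len(alt)
    labels = [0] * n  # start with every SNV in a single cluster
    # cluster label -> [member count, total variant reads, total reference reads]
    stats = {0: [n, sum(alt), sum(depth) - sum(alt)]}
    next_label = 1

    def update(k, i, sign):
        s = stats[k]
        s[0] += sign
        s[1] += sign * alt[i]
        s[2] += sign * (depth[i] - alt[i])

    for _ in range(n_iter):
        for i in range(n):
            k_old = labels[i]
            update(k_old, i, -1)
            if stats[k_old][0] == 0:
                del stats[k_old]  # drop clusters that became empty

            # CRP weights: an existing cluster gets size * predictive,
            # a new cluster gets alpha * prior predictive.
            keys = list(stats)
            log_w = [
                math.log(stats[k][0])
                + log_betabinom(alt[i], depth[i], a0 + stats[k][1], b0 + stats[k][2])
                for k in keys
            ]
            keys.append(None)  # None stands for "open a new cluster"
            log_w.append(math.log(alpha) + log_betabinom(alt[i], depth[i], a0, b0))

            top = max(log_w)  # subtract the max before exp for numerical stability
            weights = [math.exp(w - top) for w in log_w]
            k_new = rng.choices(keys, weights=weights)[0]
            if k_new is None:
                k_new = next_label
                next_label += 1
                stats[k_new] = [0, 0, 0]
            update(k_new, i, +1)
            labels[i] = k_new
    return labels
```

For example, with α = 1 and 40 SNVs, `expected_num_clusters(1.0, 40)` is the harmonic number H_40, about 4.28, so the prior alone already favors several clusters. On well-separated synthetic data, such as 20 SNVs with 10 of 100 reads supporting the variant and 20 with 50 of 100, the two groups should receive disjoint labels, though the sampler may still split a group into more than one cluster; that sensitivity is exactly why the choice of α matters. A fuller analysis would keep many sweeps and summarize them, for example by maximizing the posterior expected adjusted Rand index (Section 2.3.2).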
Table of Contents
Chapter 1 Introduction
Chapter 2 Methods
2.1 Classical Dirichlet Process
2.2 Copy number detection
2.3 Implementation of DPMM for clustering the SNV loci
2.3.1 Implementation of Dirichlet Process
2.3.2 Maximizing the posterior expected adjusted Rand index
2.3.3 Learning model
Chapter 3 Results
3.1 Evaluation of DPMM and the learning strategy
3.2 Real data of Oral Cancer
Chapter 4 Conclusion
(Full text restricted to internal access)