Author: Lee, Hsi-Yen
Thesis title: Iterative clustering of gene expression data in search of subgroups of general population
Advisor: Hsieh, Wen-Ping
Committee members: Chung, Ren-Hua; Chang, Sheng-Mao
Keywords: iterative clustering; batch effect; blood gene expression; sparse k-means; BUS model
Gene expression matrices have been applied to a variety of problems, including discovering new features of disease and dissecting cellular components. The same disease often provokes different reactions in different people. The human constitution plays multiple roles in disease: it affects not only diagnostic results but also leads to large differences in treatment outcomes.
To understand the natural grouping structure of the human constitution from gene expression signals, we propose an iterative clustering method built on two clustering methods, sparse k-means and Batch effects correction with Unknown Subtypes (BUS), to detect unknown subtypes. Applied iteratively, the method repeatedly uncovers different clustering structures in the gene expression data and selects feature genes for each structure at the same time, which lets us identify constitution-related groupings and the genes that matter for them. We collected a large amount of blood gene expression data, conducted a large-scale exploratory analysis, and finally used gene ontology analysis to explain the roles these genes play in the organism.
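The abstract does not spell out how the iteration proceeds, so the following is only a minimal sketch under stated assumptions, not the thesis's implementation. It assumes the sparse k-means formulation of Witten and Tibshirani (2010), in which gene weights are obtained by soft-thresholding the per-gene between-cluster sum of squares under an L1 bound `s` (with `s` at least 1). It further assumes that each round removes the genes selected in the previous round before clustering again. The BUS step is omitted. All function names, the deterministic farthest-first seeding, and the `min_weight` selection cutoff are hypothetical choices made for illustration.

```python
import math

def wdist(a, b, w):
    """Squared Euclidean distance with one non-negative weight per gene."""
    return sum(wj * (x - y) ** 2 for wj, x, y in zip(w, a, b))

def weighted_kmeans(X, k, w, n_iter=20):
    """Lloyd's algorithm under gene weights w, seeded by farthest-first traversal."""
    centers = [list(X[0])]
    while len(centers) < k:
        far = max(X, key=lambda r: min(wdist(r, c, w) for c in centers))
        centers.append(list(far))
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: wdist(r, centers[c], w)) for r in X]
        for c in range(k):
            members = [r for r, lab in zip(X, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def bcss_per_gene(X, labels, k):
    """Between-cluster sum of squares of every gene (column of X)."""
    n, out = len(X), []
    for col in zip(*X):
        grand, b = sum(col) / n, 0.0
        for c in range(k):
            m = [v for v, lab in zip(col, labels) if lab == c]
            if m:
                b += len(m) * (sum(m) / len(m) - grand) ** 2
        out.append(b)
    return out

def update_weights(bcss, s):
    """Soft-threshold BCSS and rescale to unit L2 norm so that the L1 norm is <= s."""
    def normalise(delta):
        v = [max(b - delta, 0.0) for b in bcss]
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v] if norm > 0 else v
    w = normalise(0.0)
    if sum(w) <= s:
        return w
    lo, hi = 0.0, max(bcss)
    for _ in range(50):  # binary search for the smallest feasible threshold
        mid = (lo + hi) / 2
        if sum(normalise(mid)) > s:
            lo = mid
        else:
            hi = mid
    return normalise(hi)

def sparse_kmeans(X, k, s, n_outer=5):
    """Alternate between weighted k-means and the gene-weight update."""
    p = len(X[0])
    w = [1 / math.sqrt(p)] * p
    for _ in range(n_outer):
        labels = weighted_kmeans(X, k, w)
        w = update_weights(bcss_per_gene(X, labels, k), s)
    return labels, w

def iterative_clustering(X, k, s, n_rounds, min_weight=0.1):
    """Assumed iteration: drop the genes selected in one round, then cluster again."""
    remaining, found = list(range(len(X[0]))), []
    for _ in range(n_rounds):
        if not remaining:
            break
        sub = [[row[j] for j in remaining] for row in X]
        labels, w = sparse_kmeans(sub, k, s)
        chosen = [g for g, wj in zip(remaining, w) if wj >= min_weight]
        if not chosen:
            break
        found.append((labels, chosen))
        remaining = [g for g in remaining if g not in chosen]
    return found

# Toy data: 8 samples x 6 genes. Genes 0-1 split samples 0-3 from 4-7,
# genes 2-3 split even from odd samples, genes 4-5 are low-amplitude noise.
X = [[10.0 * (i >= 4), 10.0 * (i >= 4), 4.0 * (i % 2), 4.0 * (i % 2),
      0.1 * (i % 3), 0.1 * (i * i % 3)] for i in range(8)]
structures = iterative_clustering(X, k=2, s=1.5, n_rounds=2)
for labels, genes in structures:
    print(labels, genes)
```

On the toy matrix, the first round is driven by the stronger split and the second round, run on the remaining genes, can then expose the weaker one. Removing selected genes is only one way to move from one structure to the next. In practice `k` and `s` would need to be tuned, for example with the gap statistic.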
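1. Introduction
2. Methods
2.1 k-means and sparse k-means
2.2 Batch effects correction with Unknown Subtypes
3. Results
3.1 Simulated data
3.1.1 Performance of sparse k-means on simulated data
3.1.2 Performance of BUS on simulated data
3.2 Real data analysis
3.2.1 Description of the real data
3.2.2 Microarray data preprocessing
3.2.3 Separate sparse k-means analyses of the breast cancer and lung cancer datasets
3.2.4 BUS model analysis of the breast cancer and lung cancer datasets
3.2.5 BUS model analysis of the breast cancer, lung cancer, fatigue disorder, peripheral arterial disease, and diabetes datasets
3.2.6 Gene ontology analysis
4. Conclusion
References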