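Author (Chinese): 盧翰均
Author (English): Lu, Han-Jyun
Thesis title (Chinese): 貝氏模型運用於RNA測序資料的群集與特徵選取
Thesis title (English): Bayesian sparse negative binomial model for clustering RNA-seq data
Advisors (Chinese): 曾建城; 謝文萍
Advisors (English): Tseng, George C; Hsieh, Wen-Ping
Committee members (Chinese): 盧鴻興; 張中
Committee members (English): Lu, Horng-Shing; Chang, Chung
Degree: Master's
University: National Tsing Hua University
Department: Institute of Statistics
Student ID: 106024502
Year of publication (ROC calendar): 108 (2019)
Academic year of graduation: 107
Language: Chinese
Number of pages: 39
Keywords (Chinese): 貝氏分析; RNA測序資料; 群集分析; 特徵選取
Keywords (English): Bayesian analysis; RNA sequence; Clustering analysis; Feature selection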
RNA sequencing (RNA-seq) has become a popular method in biological experiments. RNA-seq measures gene expression as discrete count data. Clustering with simultaneous feature selection is a challenging but important task for RNA-seq data. Existing methods, such as models based on a Gaussian assumption and sparse K-means, perform both feature selection and clustering, but only for continuous data. Transforming discrete counts into continuous values loses information.
In this thesis, we develop a negative binomial model for RNA-seq data to cluster samples (small n) with high-dimensional gene features (large p). We use Bayesian inference and estimate the parameters by Markov chain Monte Carlo (MCMC).
The method is compared with the sparse Gaussian clustering model, sparse K-means, and sparse negative binomial model-based clustering on simulated data generated from rat brain study data.
The results show that the proposed count data model outperforms the sparse Gaussian clustering model and sparse K-means in both clustering accuracy and feature selection, and performs similarly to sparse negative binomial model-based clustering.
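As a rough illustration of clustering count data with a negative binomial likelihood, the sketch below computes the posterior probability that one sample belongs to each cluster when the model parameters are held fixed. In a Gibbs-style MCMC sampler, a cluster label would typically be drawn from these probabilities at each iteration. The mean/dispersion parameterization (variance = mu + phi * mu^2), the function names, and the toy numbers are all assumptions made for illustration; this record does not state the thesis's exact parameterization, priors, or feature-selection mechanism, so none of them are reproduced here.

```python
import math


def nb_logpmf(y, mu, phi):
    """Log-pmf of a negative binomial with mean mu and dispersion phi.

    Assumed parameterization: variance = mu + phi * mu**2, size r = 1 / phi.
    """
    r = 1.0 / phi
    return (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )


def cluster_posterior(counts, cluster_means, dispersions, weights):
    """Posterior cluster probabilities for one sample, given fixed parameters.

    counts:        gene counts for the sample (length p)
    cluster_means: cluster_means[k][g] is the mean of gene g in cluster k
    dispersions:   per-gene dispersion phi_g (shared across clusters here)
    weights:       prior mixing proportions, one per cluster
    """
    log_post = []
    for k, w in enumerate(weights):
        ll = math.log(w)
        for g, y in enumerate(counts):
            ll += nb_logpmf(y, cluster_means[k][g], dispersions[g])
        log_post.append(ll)
    # Normalize with the log-sum-exp trick to avoid numerical underflow.
    m = max(log_post)
    unnorm = [math.exp(v - m) for v in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Toy example (hypothetical numbers): 3 genes, 2 clusters.
sample = [3, 120, 8]
means = [[5.0, 100.0, 10.0], [50.0, 10.0, 10.0]]
phis = [0.2, 0.2, 0.2]
probs = cluster_posterior(sample, means, phis, [0.5, 0.5])
```

The third gene has the same mean in both clusters, so it contributes nothing to the assignment; identifying such uninformative genes is the role of feature selection in sparse clustering.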
1. Introduction
1.1 Ribonucleic acid (RNA) sequencing technology
1.2 Existing methods
1.3 Proposed method
2. Bayesian sparse negative binomial model for clustering
2.1 Notation and assumptions
2.2 Generative model
2.3 Formulation
2.4 Simulating posterior distribution via Markov chain Monte Carlo
3. Simulation
4. Discussion and conclusion
5. References
Fraley, C. and Raftery, A. E. (2002). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97, 611-631.
Pan, W. and Shen, X. (2007). Penalized model-based clustering with application to variable selection. Journal of Machine Learning Research 8, 1145-1164.
Witten, D. M. and Tibshirani, R. (2010). A framework for feature selection in clustering. Journal of the American Statistical Association 105, 713-726.
Ma, T., Liang, F. and Tseng, G. C. (2017). Biomarker detection and categorization in ribonucleic acid sequencing meta-analysis using Bayesian hierarchical models. Journal of the Royal Statistical Society: Series C (Applied Statistics) 66, 847-867.
Rahman, T., Ma, T. and Tseng, G. C. (Unpublished paper). Sparse negative binomial model-based clustering for RNA-seq count data.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A. and Rubin, D. B. Bayesian Data Analysis, 3rd edition. Chapman & Hall/CRC Texts in Statistical Science.
 
 
 
 