Detailed record

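Author: Shen, Hsin Ting (沈信廷)
Title: Multi-CSAR: A Contig Scaffolding Tool Using Multiple Reference Genomes Based on Algebraic Rearrangement Distance
Title (original Chinese): Multi-CSAR: 一個基於代數重組距離並利用多個參考基因體的Contig Scaffolding工具
Advisor: Lu, Chin Lung (盧錦隆)
Committee members: Tang, Chuan Yi (唐傳義); Chiu, Hsien Tai (邱顯泰)
Degree: Master's
University: National Tsing Hua University (國立清華大學)
Department: Department of Computer Science (資訊工程學系)
Student ID: 103062613
Year of publication: 2016 (ROC year 105)
Academic year of graduation: 104 (ROC calendar)
Language: Chinese
Number of pages: 29
Keywords (translated from Chinese): next-generation sequencing; multiple reference genomes; algebraic rearrangement distance
Keywords (English): scaffolding; contig; NGS; multiple reference; algebraic rearrangement distance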
Next-generation sequencing (NGS) technologies have allowed us to efficiently produce draft genomes for many organisms of interest. However, most draft genomes are just collections of independent contigs whose relative positions and orientations along the genome being sequenced are unknown. Scaffolding is the process of determining the order and orientation of these contigs, and an accurate scaffolding step is critical for the subsequent finishing process that yields a completely sequenced genome. Several tools have been developed to order and orient the contigs of a draft genome using a single reference genome. However, most of these tools require the reference genome to be complete. This limits their usability in practice, since draft genomes greatly outnumber completely sequenced ones in current public databases. Several tools have been developed to address this problem, including CSAR, proposed by our laboratory, which can order and orient the contigs of a target genome using a draft reference genome. However, all of these single-reference tools may produce erroneous scaffolds if the target genome and the reference genome differ by genome rearrangements, that is, if the rearrangement distance between them is nonzero. Ragout and MeDuSa are two recently developed scaffolding tools that can use one or more complete or draft reference genomes. Note that Ragout requires the user to supply a phylogenetic tree of the target and reference genomes, which usually cannot be obtained easily in advance. In this study, we present a scaffolding tool called Multi-CSAR that can use multiple complete or draft reference genomes to produce high-quality scaffolds of draft genomes. Like MeDuSa, Multi-CSAR does not require prior knowledge of the evolutionary relationships among the target and reference genomes. Moreover, in contrast to Ragout and MeDuSa, both of which attempt to solve NP-hard problems, the algorithm behind Multi-CSAR involves only problems solvable in polynomial time.
Finally, we tested Multi-CSAR on real and simulated datasets and compared its results with those obtained by Ragout and MeDuSa. Our experimental results show that Multi-CSAR outperforms Ragout and MeDuSa on many metrics, including sensitivity, precision, F-score, genome coverage, scaffold number, and scaffold N50 size.
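This record does not give the thesis's exact metric definitions, so the following is only a minimal Python sketch of how three of the metrics named in the abstract are commonly computed. It assumes that sensitivity and precision are measured over contig joins (adjacent contig pairs), and that N50 is the length of the scaffold at which the running total of lengths, sorted from longest to shortest, first reaches half of the total. The function names `join_metrics` and `n50` and their parameters are hypothetical and are not taken from Multi-CSAR.

```python
from typing import Iterable, Tuple


def join_metrics(correct_joins: int, reference_joins: int,
                 predicted_joins: int) -> Tuple[float, float, float]:
    """Return (sensitivity, precision, F-score) for contig joins.

    Assumed definitions (not confirmed by the source):
      sensitivity = correct joins / joins in the true contig order
      precision   = correct joins / joins reported by the scaffolder
      F-score     = harmonic mean of sensitivity and precision
    """
    sensitivity = correct_joins / reference_joins if reference_joins else 0.0
    precision = correct_joins / predicted_joins if predicted_joins else 0.0
    total = sensitivity + precision
    f_score = 2 * sensitivity * precision / total if total else 0.0
    return sensitivity, precision, f_score


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of scaffold lengths.

    Lengths are sorted from longest to shortest; N50 is the length at
    which the running sum first reaches half of the total length.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


if __name__ == "__main__":
    print(join_metrics(correct_joins=8, reference_joins=10, predicted_joins=16))
    print(n50([10, 20, 30, 40]))
```

Under these assumptions, a scaffolder that reports fewer but longer scaffolds raises N50, while the join-based metrics check whether those longer scaffolds were assembled in the right order.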
Chinese Abstract
Abstract
Acknowledgement
Contents
List of figures
List of tables
Chapter 1 Introduction
Chapter 2 Implementation
2.1 Overview of CSAR
2.2 Method of Multi-CSAR
Chapter 3 Results and Discussion
3.1 Datasets and setting of parameters
3.2 Evaluation
3.3 Experimental results
3.3.1 Testing on real datasets
3.3.2 Testing on simulated dataset
3.3.3 Comparison on running time
Chapter 4 Conclusion
References