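Author (Chinese): 黃上豪
Author (English): Huang, Shang-Hao
Thesis title (Chinese): 利用一個參考基因體的 Paired-end Reads 來進行 Contig Scaffolding
Thesis title (English): Contig Scaffolding Using Paired-end Reads from a Reference Genome
Advisor (Chinese): 盧錦隆
Advisor (English): Lu, Chin-Lung
Committee members (Chinese): 邱顯泰; 林苕吟
Committee members (English): Chiu, Hsien-Tai; Lin, Tiao-Yin
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 104062614
Year of publication: 2017 (ROC year 106)
Academic year of graduation: 105 (ROC calendar; 2016-2017)
Language: Chinese
Number of pages: 36
Keywords (Chinese, translated): Bioinformatics; Next-generation sequencing; Single reference genome
Keywords (English): Bioinformatics; Next-generation sequencing; Contigs; Scaffolding; Single reference; Paired-end reads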
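Abstract (translated from Chinese): Next-generation sequencing (NGS) technologies have made it possible to efficiently produce draft genomes for many species of interest. However, these draft genomes are still only sets of contigs (contiguous DNA fragments) whose order and orientation on the sequenced genome are unknown. Scaffolding is the process of determining the order and orientation of these contigs, and it is important and helpful for the subsequent process of obtaining the complete sequence of the sequenced genome. Many scaffolding software tools have been developed that use a set of paired-end reads to determine the order and orientation of the contigs in a target draft genome. However, producing paired-end reads with NGS technologies is still quite costly in time and money. We therefore ask whether paired-end reads generated from a complete reference genome, rather than from an NGS experiment, can be used to order and orient the contigs of a draft genome. To this end, we propose a scaffolding pipeline that first uses GemSIM, a paired-end read simulator, to generate paired-end reads from a complete reference genome sequence, and then uses ScaffMatch, a paired-end read-based scaffolding tool, to perform scaffolding with these simulated reads. In addition, we carried out two sets of experiments to validate our scaffolding approach: one compares scaffolding performance using simulated versus real paired-end reads, and the other compares our approach with other reference-based scaffolding tools. Finally, our experimental results show that the scaffolding performance with simulated paired-end reads is quite close to that obtained with real paired-end reads and to the results of most reference-based tools.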
Abstract (English): Next-generation sequencing (NGS) technologies have allowed us to efficiently produce draft genomes for many organisms of interest. However, most draft genomes are just collections of contigs whose relative positions and orientations along the genome being sequenced are unknown. Scaffolding is the process of determining the order and orientations of these contigs, which is critical and helpful for the subsequent finishing process that yields the complete sequence of the genome being sequenced. Several tools have been developed to determine the order and orientations of the contigs in a target draft genome by using a set of paired-end reads. However, generating paired-end reads with NGS technologies is still costly in terms of time and money. We therefore ask whether simulated paired-end reads generated from a reference genome, rather than from an NGS experiment, can be used to scaffold the contigs of a draft genome. For this purpose, we propose a scaffolding pipeline in which we first use GemSIM, a paired-end read simulator, to generate paired-end reads from a complete reference genome, and then use a scaffolder called ScaffMatch to scaffold the contigs of a draft genome based on the simulated paired-end reads. In addition, we perform two experiments to validate our scaffolding approach: one compares the scaffolding performance obtained with simulated paired-end reads from a reference genome against that obtained with real paired-end reads from the target genome, and the other compares our scaffolding approach with other reference-based scaffolding tools. Our experimental results show that the performance of our approach using simulated paired-end reads is comparable to that using real paired-end reads, and is also comparable to that of most of the reference-based tools evaluated.
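The first stage of the pipeline described in the abstract, sampling paired-end reads from a complete reference sequence, can be illustrated with a toy sketch. This is not GemSIM: GemSIM uses empirical error models, whereas the code below produces error-free reads and is only meant to show what a read pair is. The function names, the parameters (`read_len`, `insert_mean`, `insert_sd`, `seed`), and the forward/reverse read orientation are all assumptions made for this illustration.

```python
import random

# Translation table for complementing DNA bases (assumes only A/C/G/T).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A/C/G/T."""
    return seq.translate(_COMPLEMENT)[::-1]


def simulate_paired_end_reads(reference, n_pairs, read_len=5,
                              insert_mean=20, insert_sd=2, seed=0):
    """Sample error-free paired-end reads from `reference` (toy model).

    Each pair comes from one fragment whose length is drawn from a normal
    distribution. Read 1 is the first `read_len` bases of the fragment;
    read 2 is the reverse complement of its last `read_len` bases.
    Assumes `insert_mean` is well below len(reference); otherwise the
    resampling loop may run for a long time.
    """
    rng = random.Random(seed)
    pairs = []
    while len(pairs) < n_pairs:
        frag_len = round(rng.gauss(insert_mean, insert_sd))
        if frag_len < read_len or frag_len > len(reference):
            continue  # fragment does not fit; draw another length
        start = rng.randint(0, len(reference) - frag_len)
        fragment = reference[start:start + frag_len]
        read1 = fragment[:read_len]
        read2 = reverse_complement(fragment[-read_len:])
        pairs.append((start, frag_len, read1, read2))
    return pairs
```

In the actual pipeline the simulated pairs would be written out and passed to a paired-end scaffolder together with the draft contigs; here each tuple also carries the fragment start and length so that the sampling can be checked against the reference.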
Chinese abstract
Abstract
Acknowledgement
Contents
List of figures
List of tables
Chapter 1 Background
Chapter 2 Methods
2.1 Method of GemSIM
2.2 Method of ScaffMatch
2.2.1 Read preprocessing
2.2.2 Scaffolding graph construction
2.2.3 Matching scaffolds
2.2.4 Insertion of skipped contigs
Chapter 3 Results and Discussion
3.1 Datasets and setting of parameters
3.2 Evaluation
3.3 Experimental results
3.3.1 Comparison of results obtained using simulated and real paired-end reads
3.3.2 Comparison with reference-based tools
Chapter 4 Conclusion
References
