Author: 彭道遠 (Peng, Dao-Yuan)
Title (Chinese): 根據最少配對模式解決Scaffolding問題的改良演算法
Title (English): An Improved Algorithm for Solving Scaffolding Problem Based on Exemplar Model
Advisor: 盧錦隆 (Lu, Chin-Lung)
Oral defense committee: 邱顯泰 (Chiu, Hsien-Tai); 林苕吟 (Lin, Tiao-Yin); 林沿妊 (Lin, Yen-Jen)
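Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 107062528
Year of publication: 109 in the ROC calendar (2020)
Academic year of graduation: 108 in the ROC calendar (2019-2020)
Language: Chinese
Number of pages: 93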
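Keywords (Chinese, translated): algorithm; genome assembly; exemplar model; integer linear programming; bioinformatics; next-generation sequencing
Keywords (English): algorithm; scaffolding problem; exemplar model; integer linear programming; bioinformatics; next generation sequencing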
Abstract:
Obtaining the complete genome sequence of a species usually takes several steps. One important step is scaffolding. It takes a set of contigs from a draft genome as input, orders and orients them, and outputs the scaffolds of the draft genome. Our laboratory has studied scaffolding for many years. In 2019 we designed a scaffolding algorithm based on integer linear programming (ILP). It is a rearrangement-based approach that can scaffold two draft genomes containing duplicate markers. Because of its high running time, however, that algorithm is practical only for bacterial genomes with a modest number of contigs. In this thesis, we speed up the ILP scaffolding algorithm by reducing two types of variables and adding three new constraints that lower the running time. Experiments on artificial and real datasets compare the improved ILP scaffolding algorithm with and without duplicate markers. When duplicate markers are taken into account, it achieves a better F-score, genome coverage and scaffold number than when they are ignored. We also compared it with CSAR, another rearrangement-based scaffolding tool. Our improved algorithm has higher precision, whereas CSAR produces more complete scaffolds (a better scaffold number) and runs faster.
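The Python sketch below makes the scaffolding step concrete. It is deliberately naive and is not the thesis's ILP algorithm. It relies on several simplifying assumptions:

- Markers use a toy encoding: each is a signed integer, where the absolute value names the family and the sign gives the strand.
- One draft genome is scaffolded against an already complete genome. The thesis scaffolds two draft genomes.
- The output is a single linear scaffold.
- Breakpoints are counted only between consecutive markers, and genome ends are ignored. The thesis's own definitions in Chapter 2 may differ.

Duplicate markers are handled in the exemplar style: the sketch tries every way of keeping exactly one occurrence per family. The helper names `reverse_complement`, `exemplars`, `breakpoints` and `scaffold` are hypothetical.

```python
from collections import defaultdict
from itertools import permutations, product

# Toy encoding (an assumption, not the thesis's data format): a marker is a
# signed integer; abs(marker) is its family and the sign is its strand.
# A family occurring more than once in a genome is a duplicate marker.


def reverse_complement(contig):
    """Flip a contig's orientation: reverse the marker order and negate signs."""
    return [-m for m in reversed(contig)]


def exemplars(genome):
    """Yield every exemplar of a genome: exactly one occurrence per family."""
    positions = defaultdict(list)
    for i, marker in enumerate(genome):
        positions[abs(marker)].append(i)
    for choice in product(*positions.values()):
        yield [genome[i] for i in sorted(choice)]


def breakpoints(a, b):
    """Count adjacencies of linear genome a that are not conserved in b.

    (x, y) is conserved if b has x followed by y, or equivalently (-y, -x)
    on the opposite strand. Genome ends are ignored (a simplification).
    """
    conserved = set()
    for x, y in zip(b, b[1:]):
        conserved.add((x, y))
        conserved.add((-y, -x))
    return sum((x, y) not in conserved for x, y in zip(a, a[1:]))


def scaffold(contigs, reference):
    """Brute-force one linear scaffold minimising exemplar breakpoints.

    Tries every contig order, every contig orientation and every pair of
    exemplars; assumes both inputs contain the same marker families.
    Returns (breakpoint count, [(contig index, '+' or '-'), ...]).
    """
    reference_exemplars = list(exemplars(reference))
    best = None
    for order in permutations(range(len(contigs))):
        for flips in product((False, True), repeat=len(contigs)):
            layout = []
            for idx, flip in zip(order, flips):
                layout.extend(reverse_complement(contigs[idx]) if flip else contigs[idx])
            for ex_s in exemplars(layout):
                for ex_r in reference_exemplars:
                    d = breakpoints(ex_s, ex_r)
                    if best is None or d < best[0]:
                        plan = [(idx, '-' if flip else '+') for idx, flip in zip(order, flips)]
                        best = (d, plan)
    return best


if __name__ == "__main__":
    # Contig 1 must be reverse-complemented to match the reference.
    print(scaffold([[1, 2], [-4, -3], [5]], [1, 2, 3, 4, 5]))
    # (0, [(0, '+'), (1, '-'), (2, '+')])
    # Family 2 is duplicated; choosing its first copy as the exemplar removes all breakpoints.
    print(scaffold([[1, 2], [3, 2, 4]], [1, 2, 3, 4]))
    # (0, [(0, '+'), (1, '+')])
```

This search multiplies n! contig orders by 2^n orientations and by every exemplar choice, so it is only usable on a handful of contigs. That growth hints at why the thesis formulates the problem as an ILP, and why reducing variables and adding constraints matters for running time.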
Table of contents:
Chinese Abstract
Abstract
Acknowledgement
Content
List of figures
List of tables
Chapter 1 Introduction
Chapter 2 Methods
2.1 Preliminaries
2.1.1 Genome, Contig and Marker
2.1.2 Exemplar
2.1.3 Adjacency
2.1.4 Pair of Shared Simple Adjacency
2.1.5 Potential Adjacency
2.1.6 Pair of Shared Simple Potential Adjacency
2.1.7 Extended Potential Adjacency
2.1.8 Extended Pair of Shared Simple Potential Adjacency
2.1.9 Exemplar Breakpoint Distance
2.2 Problem Statement
2.3 ILP Formulation
2.3.1 ILP Variables
2.3.2 ILP Constraints
2.3.3 ILP Objective Function
2.4 Doubly Extended Pair of Shared Simple Potential Adjacency
2.5 Circular Scaffold
Chapter 3 Experiment Result and Discussion
3.1 Quality Metrics
3.2 Scaffolders
3.3 Experiments of Artificial Datasets
3.3.1 Generating Input Data to Scaffolders
3.3.2 Settings of Used Tools
3.3.3 Artificial Datasets
3.3.4 Results of Artificial Datasets
3.3.5 Discussion
3.4 Experiments of Real Datasets
3.4.1 Settings of Used Tools
3.4.2 Real Datasets
3.4.3 Results of Real Datasets
3.4.4 Discussion
Chapter 4 Conclusions
Reference
(The full text is open for external viewing after 2025-08-25.)