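Author (Chinese): 游堯中
Author (English): Yu, Yao-Chung
Thesis title (Chinese): 根據保留區間距離解決Scaffolding問題之研究
Thesis title (English): The Study of Solving Scaffolding Problem Based on Conserved Interval Distance
Advisor (Chinese): 盧錦隆
Advisor (English): Lu, Chin-Lung
Committee members (Chinese): 邱顯泰; 林苕吟
Committee members (English): Chiu, Hsien-Tai; Lin, Tiao-Yin
Degree: Master's
University: National Tsing Hua University
Department: Institute of Information Systems and Applications
Student ID: 107065533
Year of publication (ROC calendar): 110 (2021)
Academic year of graduation: 109
Language: Chinese
Number of pages: 49
Keywords (Chinese): 保留區間 (conserved interval)
Keywords (English): scaffolding; conserved interval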
Scaffolding is an important step in DNA sequencing; its purpose is to order and orient the contigs of a draft genome. Our laboratory previously developed a scaffolding algorithm based on the maximum matching breakpoint distance (MBD), which scaffolds a target draft genome using a reference genome. However, a breakpoint considers only two adjacent markers, so the more distantly related the reference and target genomes are, the less complete the scaffolds produced by the MBD-based algorithm become. In this thesis, we therefore use the concept of conserved intervals to define a scaffolding problem based on the maximum matching conserved interval distance (MCID): determine scaffolds of the target and reference genomes such that the conserved interval distance between the resulting scaffolds is minimized. We use integer linear programming (ILP) to design an exact algorithm for the MCID-based scaffolding problem. In experiments on simulated datasets, the MCID-based algorithm achieves better sensitivity than the MBD-based algorithm when the reference genome is complete. Although its precision is lower than that of the MBD-based algorithm, the MCID-based algorithm still achieves a higher F-score in more than half of all parameter combinations.
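As an illustration of the conserved interval distance named in the abstract, the following minimal Python sketch counts conserved intervals between two signed permutations and derives a distance from them. It assumes a common formulation from the literature (an interval with endpoints a and b is conserved if the other permutation contains a before b, or -b before -a, with the same set of unsigned markers in between), and it assumes there are no duplicate markers, so no maximum matching step is needed. The function names are hypothetical, and the thesis's own definitions and ILP formulation may differ.

```python
from itertools import combinations


def conserved_intervals(p, q):
    """Return the intervals of p that are conserved in q.

    p and q are signed permutations of the same markers without
    duplicates (an assumption of this sketch), e.g. [1, -3, 2].
    Each interval is reported as the pair (a, b) of its signed
    endpoints as they appear in p.
    """
    # Position of every marker in q, keyed by its unsigned value.
    pos_q = {abs(x): i for i, x in enumerate(q)}
    found = set()
    for i, j in combinations(range(len(p)), 2):
        a, b = p[i], p[j]
        ia, ib = pos_q[abs(a)], pos_q[abs(b)]
        # Same orientation: a appears before b in q.
        same = q[ia] == a and q[ib] == b and ia < ib
        # Reversed orientation: -b appears before -a in q.
        flipped = q[ia] == -a and q[ib] == -b and ib < ia
        if not (same or flipped):
            continue
        lo, hi = min(ia, ib), max(ia, ib)
        inner_p = {abs(x) for x in p[i + 1:j]}
        inner_q = {abs(x) for x in q[lo + 1:hi]}
        # The unsigned markers strictly between the endpoints must match.
        if inner_p == inner_q:
            found.add((a, b))
    return found


def conserved_interval_distance(p, q):
    """Distance N_p + N_q - 2 * N_pq (an assumed formulation).

    N_p and N_q count the intervals of each permutation on its own,
    and N_pq counts the intervals conserved in both.
    """
    n_p = len(conserved_intervals(p, p))
    n_q = len(conserved_intervals(q, q))
    n_pq = len(conserved_intervals(p, q))
    return n_p + n_q - 2 * n_pq
```

For p = [1, 2, 3, 4] and q = [1, -3, -2, 4], the shared conserved intervals are (1, 4) and (2, 3), so the distance under this formulation is 6 + 6 - 2 * 2 = 8. This brute-force version is meant only to make the definition concrete; it does not search over scaffolds or matchings.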
Chinese Abstract
Abstract
Acknowledgement
Contents
List of figures
List of tables
Chapter 1 Introduction
Chapter 2 Methods
2.1 Preliminaries
2.1.1 Genome, Contig and Marker
2.1.2 Interval, Conserved Interval and Conserved Interval Distance
2.1.3 Matching, Maximum-Matching and Maximum-Matching Model
2.1.4 Potential Conserved Intervals
2.1.5 Extended Intervals and Extended Potential Conserved Intervals
2.2 ILP Formulation
2.2.1 ILP Variables and Objective Function
2.2.2 ILP Constraints
Chapter 3 Experiment Results and Discussion
3.1 Quality Metrics
3.2 Experiments of Simulation
3.2.1 Flowchart of Simulation
3.2.2 Parameters of Simulation
3.2.3 Family Ratios of Simulation
3.2.4 Results of Simulation
Chapter 4 Conclusion
References
(The full text will be open to external access after 2026-09-12.)