
Detailed Record

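Author (Chinese): 莊勝翔
Author (English): JHUANG, SHENG-SIANG
Thesis title (Chinese): 根據居中配對模式解決scaffolding之研究
Thesis title (English): The Study of Solving Scaffolding Problem Based on Intermediate-matching Model
Advisor (Chinese): 盧錦隆
Advisor (English): Lu, Chin-Lung
Committee members (Chinese): 林苕吟; 邱顯泰
Committee members (English): Lin, Tiao-Yin; Chiu, Hsien-Tai
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 106062616
Publication year (ROC calendar): 108 (2019)
Academic year of graduation: 107
Language: Chinese
Number of pages: 76
Keywords (Chinese): 演算法 (algorithm); 基因體組裝 (genome assembly); 居中配對模式 (intermediate-matching model); 整數線性規劃 (integer linear programming); 生物資訊 (bioinformatics); 次世代定序 (next generation sequencing)
Keywords (English): algorithm; scaffolding problem; intermediate-matching model; integer linear programming; bioinformatics; next generation sequencing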
Reference-based scaffolding determines the order and orientation of the contigs of a target genome by using a reference genome. It is an important and helpful step toward obtaining a more complete genome sequence of a species. In this thesis, we use the intermediate-matching breakpoint distance (IBD) to define the IBD-based scaffolding problem: find scaffolds of a target genome and a reference genome such that the IBD between the two resulting scaffolds is minimized. In this problem, the target and reference genomes are represented as sequences of markers that may be duplicated. We design an integer linear programming (ILP) approach to solve the IBD-based scaffolding problem. Our experimental results on simulated and real datasets show that our IBD-based scaffolding algorithm is more accurate when it considers duplicate markers than when it does not. In addition, among the scaffolding algorithms tested in this study, our IBD-based scaffolding algorithm achieves better accuracy, but it requires more running time than the others.
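To make the scaffolding objective more concrete, here is a minimal Python sketch of a much simpler setting than the one studied in the thesis. It assumes genomes are single linear sequences of signed markers with no duplicates, uses the ordinary breakpoint count rather than the intermediate-matching breakpoint distance, and replaces the ILP with an exhaustive search over contig orders and orientations. The function names (`adjacencies`, `breakpoints`, `flip`, `best_scaffold`) are hypothetical and are not taken from the thesis.

```python
from itertools import permutations, product


def adjacencies(genome):
    """Return the set of adjacencies of one linear signed marker sequence.

    An adjacency (a, b) read on the forward strand equals (-b, -a) read on
    the reverse strand, so each adjacency is stored in one canonical form.
    """
    result = set()
    for a, b in zip(genome, genome[1:]):
        result.add(min((a, b), (-b, -a)))
    return result


def breakpoints(target, reference):
    """Count adjacencies of the target that the reference does not share."""
    return len(adjacencies(target) - adjacencies(reference))


def flip(contig):
    """Reverse-complement a contig: reverse the order and negate each sign."""
    return [-marker for marker in reversed(contig)]


def best_scaffold(contigs, reference):
    """Brute-force search (assumption: only a handful of contigs).

    Tries every order and orientation of the contigs and keeps the
    concatenation with the fewest breakpoints against the reference.
    """
    best = None
    for order in permutations(range(len(contigs))):
        for flips in product((False, True), repeat=len(contigs)):
            scaffold = []
            for index, flipped in zip(order, flips):
                contig = contigs[index]
                scaffold.extend(flip(contig) if flipped else contig)
            score = breakpoints(scaffold, reference)
            if best is None or score < best[0]:
                best = (score, scaffold)
    return best


if __name__ == "__main__":
    reference = [1, 2, 3, 4, 5]
    contigs = [[3, 4], [-2, -1], [5]]
    print(best_scaffold(contigs, reference))
```

The search grows as n! · 2^n in the number of contigs n, so it is only usable on toy inputs. It also does not handle duplicate markers, which the thesis addresses through intermediate matching and an ILP formulation.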
Chinese Abstract
Abstract
Contents
List of figures
List of tables
Chapter 1 Introduction
Chapter 2 Methods
2.1 Preliminaries
2.1.1 Genome, Contig and Marker
2.1.2 Adjacency, Shared Adjacency and Breakpoint
2.1.3 Shared Potential Adjacency
2.1.4 Extended Shared Potential Adjacency
2.1.5 Matching and Intermediate-matching
2.2 ILP Formulation
2.2.1 ILP Variables and Objective Function
2.2.2 ILP Constraints
Chapter 3 Experiment Results and Discussion
3.1 Quality Metrics
3.2 Experiments of Simulation
3.2.1 Flowchart of Simulation
3.2.2 Setting of Simulation
3.2.3 Results of Simulation
3.3 Experiments of Real Datasets
3.3.1 Testing Datasets
3.3.2 Results of Real Datasets
3.3.3 Simulation and Real Data
Chapter 4 Conclusion
References