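Author (Chinese): 陳聖豫
Author (English): Chen, Sheng-Yu
Thesis title (Chinese): 根據雙切斷並重接距離解決 Scaffolding 問題之研究
Thesis title (English): The Study of Solving Scaffolding Problem Based on Double-cut-and-join Distance
Advisor (Chinese): 盧錦隆
Advisor (English): Lu, Chin-Lung
Oral defense committee (Chinese): 邱顯泰; 林苕吟
Oral defense committee (English): Chiu, Hsien-Tai; Lin, Tiao-Yin
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 108062623
Year of publication (ROC calendar): 110 (2021)
Academic year of graduation (ROC calendar): 109
Language: Chinese
Number of pages: 46
Keywords (Chinese): 雙切斷並重接; 演算法; 基因體組裝; 最大配對模式; 整數線性規劃; 次世代定序
Keywords (English): double-cut-and-join; algorithm; scaffolding; maximum matching; integer linear programming; next generation sequencing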
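Abstract (translated from the Chinese): Scaffolding is an important step in genome sequencing. Its goal is to determine the order and orientation of the contigs that an assembler produces from short reads. Our laboratory has previously proposed several scaffolding algorithms based on integer linear programming (ILP). These are all rearrangement-based algorithms that consider the breakpoint distance (BD), and they can handle genomes containing duplicate markers. In biological evolution, duplication and inversion are two important and common mutations. However, scaffolding algorithms based on the breakpoint distance appear to have difficulty handling inversions. In this thesis, we therefore propose an ILP algorithm based on the double-cut-and-join distance to solve the scaffolding problem. Double-cut-and-join (DCJ) is a model that can simulate several kinds of rearrangement, including inversion, transposition, fusion, fission, and translocation. Experimental results on simulated datasets show that the sensitivity of our DCJ-based scaffolder is higher than that of the BD-based scaffolder in most cases. Although its accuracy is slightly lower than that of the BD-based scaffolder, the F-score of our DCJ-based scaffolder is also better in most cases. However, the running time of our DCJ-based scaffolder is much longer than that of the BD-based scaffolder, so our current DCJ-based scaffolder is still difficult to apply to real datasets.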
Scaffolding is a crucial step in genome sequencing: it determines the order and orientation of the contigs generated by an assembler. Our laboratory has previously proposed several scaffolding algorithms based on integer linear programming (ILP). These are rearrangement-based algorithms that take the breakpoint distance (BD) into consideration, and they can handle genomes with duplicate markers. In biological evolution, both duplications and inversions are important and common mutations. However, scaffolding algorithms based on the breakpoint distance seem to have difficulty dealing with inversions. Therefore, in this thesis, we propose an ILP algorithm based on the double-cut-and-join distance to solve the scaffolding problem. Double-cut-and-join (DCJ) is a model that can simulate multiple rearrangements, including inversion, transposition, fusion, fission, and translocation. The experimental results on simulated datasets show that the sensitivity of our DCJ-based scaffolder is higher than that of the BD-based scaffolder in most cases. Although its accuracy is slightly lower than that of the BD-based scaffolder, the F-score of our DCJ-based scaffolder is also better in most cases. However, the execution time of our DCJ-based scaffolder is much longer than that of the BD-based scaffolder, so our current DCJ-based scaffolder is still difficult to apply to real datasets.
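To make the DCJ distance mentioned in the abstract concrete, the following is a minimal Python sketch, not the thesis's ILP method. It assumes the simplest setting: two genomes with the same genes, no duplicate markers, and only circular chromosomes, where the distance from Bergeron et al. reduces to the number of genes minus the number of cycles in the adjacency graph. The function names and the signed-integer genome encoding are illustrative assumptions.

```python
def adjacencies(genome):
    """Map every gene extremity to its neighbouring extremity.

    A genome is a list of circular chromosomes; each chromosome is a list
    of signed integers (the sign is the gene orientation). Each gene g has
    a tail (g, 't') and a head (g, 'h').
    """
    partner = {}
    for chrom in genome:
        for i, a in enumerate(chrom):
            b = chrom[(i + 1) % len(chrom)]  # next gene, wrapping around
            right = (abs(a), 'h' if a > 0 else 't')  # extremity leaving a
            left = (abs(b), 't' if b > 0 else 'h')   # extremity entering b
            partner[right] = left
            partner[left] = right
    return partner


def dcj_distance_circular(genome_a, genome_b):
    """DCJ distance n - c, valid only under the assumptions stated above."""
    pa, pb = adjacencies(genome_a), adjacencies(genome_b)
    if pa.keys() != pb.keys():
        raise ValueError("genomes must contain the same genes, each once")
    n = len(pa) // 2  # two extremities per gene
    seen = set()
    cycles = 0
    for start in pa:
        if start in seen:
            continue
        cycles += 1
        x = start
        # Walk the cycle, alternating adjacencies of genome A and genome B.
        while x not in seen:
            seen.add(x)
            y = pa[x]
            seen.add(y)
            x = pb[y]
    return n - cycles


if __name__ == "__main__":
    # A single inversion of gene 2 is one DCJ operation.
    print(dcj_distance_circular([[1, 2, 3]], [[1, -2, 3]]))
    # Splitting one circular chromosome into two is one fission.
    print(dcj_distance_circular([[1, 2, 3, 4]], [[1, 2], [3, 4]]))
```

With duplicate markers, as in the thesis, each copy must first be matched to a copy in the other genome, and choosing that matching is the part an ILP formulation would handle; this sketch does not attempt it.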
Chinese Abstract
Abstract
Acknowledgement
Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Problem Statement
2.1 Basic Definitions
2.2 Matching and Maximum Matching
2.3 Double-Cut-and-Join and Its Distance
2.4 Maximum Matching Double-Cut-and-Join Distance
2.5 Scaffolding Problem Based on MDCJD Model
Chapter 3 Method
3.1 Adjacency Graph
3.2 Capping Method
3.3 ILP Formulation
3.4 Speed-up of DCJ-Based ILP Algorithm
Chapter 4 Experiment Results and Discussion
4.1 Production Method of Simulation Dataset
4.2 Quality Metrics
4.3 Simulation Parameter
4.4 Simulation Results and Discussion
Chapter 5 Conclusion
References

(The full text will be open to external access after 2026-08-24.)