Author (Chinese): 甘人方
Author (English): Gan, Ren-Fang
Title (Chinese): 適用於FM-index變體即時比對之DNA序列的壓縮演算法研究
Title (English): Compression algorithms of the FM-index variant for just-in-time alignment of DNA sequences
Advisor (Chinese): 石維寬
Advisor (English): Shih, Wei-Kuan
Oral defense committee (Chinese): 徐讚昇, 張原豪, 衛信文
Oral defense committee (English): Hsu, Tsan-sheng; Chang, Yuan-Hao; Wei, Hsin-Wen
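Degree: Master's
Institution: National Tsing Hua University (國立清華大學)
Department: Department of Computer Science (資訊工程學系)
Student ID: 107062606
Year of publication (ROC calendar): 109 (2020)
Graduation academic year (ROC calendar): 108 (2019–2020)
Language: English
Number of pages: 26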
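Keywords (Chinese): 基因定序 (gene sequencing), FM索引 (FM-index), 遊程編碼 (run-length encoding), 霍夫曼編碼 (Huffman coding), 基因壓縮 (genomic compression)
Keywords (English): DNA sequencing, FM-index, Run Length Encoding, Huffman coding, genomic data compression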
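Abstract (translated from the Chinese):
The FM-index is widely used for sequence alignment in genomics. It is a data structure based on the Burrows–Wheeler transform: when a reference sequence is available for a genome, the reference is first transformed with the Burrows–Wheeler transform, after which gene fragments can be aligned to it quickly. However, the FM-index needs extra space to store auxiliary information for alignment, and this space is proportional to the size of the reference sequence. Because most genomes consist of tens of millions or even hundreds of millions of bases, the storage it requires becomes a major bottleneck.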
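The transform-then-search idea described above can be sketched in a few lines of Python. This is a minimal toy, not the thesis's implementation: it assumes a `$` end marker that sorts before A, C, G, and T, builds the BWT naively by sorting rotations (only suitable for short strings), and keeps a full, unsampled occurrence table. The names `bwt`, `build_fm_index`, and `count_matches` are hypothetical. The `occ` table is the kind of auxiliary information whose size grows linearly with the reference.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform by sorting all rotations (toy-sized inputs only)."""
    s = text + "$"  # assumed end marker; '$' sorts before 'A', 'C', 'G', 'T' in ASCII
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def build_fm_index(bwt_str: str):
    """Return (C, occ) for a BWT string.

    C[c]      -- number of symbols in the text that are smaller than c
    occ[c][i] -- number of occurrences of c in bwt_str[:i]
    The occ table holds len(bwt_str) + 1 counts per symbol, so it grows
    linearly with the reference length.
    """
    alphabet = sorted(set(bwt_str))
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += bwt_str.count(c)
    occ = {c: [0] * (len(bwt_str) + 1) for c in alphabet}
    for i, ch in enumerate(bwt_str):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + int(ch == c)
    return C, occ


def count_matches(pattern: str, C: dict, occ: dict, n: int) -> int:
    """Backward search: number of occurrences of pattern in the original text."""
    lo, hi = 0, n  # current interval [lo, hi) in sorted-rotation order
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    reference = "ACGTACGTTACG"  # hypothetical toy reference
    bwt_str = bwt(reference)
    C, occ = build_fm_index(bwt_str)
    print(count_matches("ACG", C, occ, len(bwt_str)))  # 3
```

Each pattern symbol costs two table lookups, which is why alignment is fast once the full `occ` table is in memory.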
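Ferragina and Manzini described a variant of the FM-index that reduces the required storage by keeping only part of the auxiliary information together with a compressed form of the transformed reference sequence. However, aligning a single read then requires several rounds of decompression and computation, which greatly increases alignment run time.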
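The trade-off of storing only partial auxiliary information can be illustrated with a sketch that keeps one uncompressed checkpoint (symbol counts) per bucket and stores the BWT itself as independently compressed buckets. Assumptions: `zlib` stands in for whatever compressor the variant actually uses, the bucket size is arbitrary, and `BucketedOcc` is a hypothetical name. Every `occ` query has to decompress one bucket, which is the per-query overhead described above.

```python
import zlib


def bwt(text: str) -> str:
    """Toy Burrows-Wheeler transform ('$' is an assumed end marker)."""
    s = text + "$"
    return "".join(r[-1] for r in sorted(s[i:] + s[:i] for i in range(len(s))))


class BucketedOcc:
    """Occ(c, i) over a BWT kept as independently compressed buckets.

    Only one checkpoint (symbol counts before the bucket) per bucket is stored
    uncompressed; each query decompresses one bucket and scans part of it.
    """

    def __init__(self, bwt_str: str, bucket_size: int = 64):
        self.bucket_size = bucket_size
        self.n = len(bwt_str)
        self.alphabet = sorted(set(bwt_str))
        self.buckets = []      # zlib-compressed slices of the BWT
        self.checkpoints = []  # symbol counts before each bucket
        running = {c: 0 for c in self.alphabet}
        for start in range(0, self.n, bucket_size):
            chunk = bwt_str[start:start + bucket_size]
            self.checkpoints.append(dict(running))
            self.buckets.append(zlib.compress(chunk.encode("ascii")))
            for ch in chunk:
                running[ch] += 1
        self.totals = running
        self.decompressions = 0  # how many buckets have been decompressed so far

    def occ(self, c: str, i: int) -> int:
        """Occurrences of c in bwt[:i]."""
        if c not in self.totals:
            return 0
        if i >= self.n:
            return self.totals[c]
        b, offset = divmod(i, self.bucket_size)
        chunk = zlib.decompress(self.buckets[b]).decode("ascii")
        self.decompressions += 1
        return self.checkpoints[b][c] + chunk[:offset].count(c)


def count_matches(pattern: str, index: BucketedOcc) -> int:
    """Backward search using only the bucketed occurrence structure."""
    C, total = {}, 0
    for c in index.alphabet:
        C[c] = total
        total += index.totals[c]
    lo, hi = 0, index.n
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + index.occ(ch, lo)
        hi = C[ch] + index.occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

Larger buckets store fewer checkpoints but make each decompression and scan longer, so the bucket size directly trades space for query time.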
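We therefore propose two compression algorithms for genomic sequences and apply them to this FM-index variant. Besides achieving good compression ratios, they allow alignment to be performed directly on the compressed sequence, skipping the decompression step and thereby reducing run time. In addition, we applied the algorithms to books and performed string matching on them; the results show that they likewise achieve high compression ratios and fast string matching.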
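One generic way to count symbols without decompressing is to run-length encode the BWT and store a prefix count at each run boundary. The sketch below shows that textbook idea under stated assumptions; it is not the thesis's "modified run-length encoding", whose details this record does not give, and the names `run_length_encode` and `RunLengthOcc` are hypothetical. A query binary-searches the run starts and never expands a run.

```python
from bisect import bisect_right


def bwt(text: str) -> str:
    """Toy Burrows-Wheeler transform ('$' is an assumed end marker)."""
    s = text + "$"
    return "".join(r[-1] for r in sorted(s[i:] + s[:i] for i in range(len(s))))


def run_length_encode(s: str) -> list:
    """Collapse s into (symbol, run length) pairs, e.g. 'AAC' -> [('A', 2), ('C', 1)]."""
    runs = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


class RunLengthOcc:
    """Occ(c, i) answered directly from the runs; no run is ever expanded."""

    def __init__(self, runs: list):
        self.runs = runs
        self.alphabet = sorted({ch for ch, _ in runs})
        self.starts = []                              # position where each run begins
        self.before = {c: [] for c in self.alphabet}  # count of c before each run
        counts = {c: 0 for c in self.alphabet}
        pos = 0
        for ch, length in runs:
            self.starts.append(pos)
            for c in self.alphabet:
                self.before[c].append(counts[c])
            counts[ch] += length
            pos += length
        self.n = pos
        self.totals = counts

    def occ(self, c: str, i: int) -> int:
        """Occurrences of c in bwt[:i], via binary search over run starts."""
        if c not in self.totals:
            return 0
        k = bisect_right(self.starts, i) - 1  # run containing position i (last run if i == n)
        ch, length = self.runs[k]
        inside = min(i - self.starts[k], length) if ch == c else 0
        return self.before[c][k] + inside


def count_matches(pattern: str, index: RunLengthOcc) -> int:
    """Backward search driven entirely by the run-length representation."""
    C, total = {}, 0
    for c in index.alphabet:
        C[c] = total
        total += index.totals[c]
    lo, hi = 0, index.n
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + index.occ(ch, lo)
        hi = C[ch] + index.occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

Repetitive references produce long BWT runs, so the number of stored runs (and prefix counts) can be far smaller than the sequence length.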
Abstract (English):
The FM-index, a data structure based on the Burrows–Wheeler transform, is widely used to align sequencing reads against DNA sequences. When a reference sequence of a genome is available, the FM-index can efficiently align reads to it. However, it requires extra space to store the auxiliary information used during alignment. This space grows in proportion to the size of the reference sequence, and since most genomes consist of tens of millions of nucleobases or more, the space required by the FM-index becomes a bottleneck.
Ferragina and Manzini described a variant implementation of the FM-index that addresses this problem by storing only part of the auxiliary information together with a compressed form of the transformed reference sequence. However, aligning a single read then requires multiple decompression and calculation steps, which significantly increases the computational cost.
We therefore propose two compression algorithms for DNA sequences and implement them on this variant of the FM-index. Both algorithms effectively reduce the required space; furthermore, they allow sequence alignment to be performed directly on the compressed sequence, which eliminates the decompression steps and thereby reduces the computation time. Beyond DNA sequences, we also performed pattern matching on the text of books, and the results show that our algorithms are effective on them as well.
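The second proposed algorithm combines Huffman coding with RLE. This record does not describe the authors' modification, so the sketch below shows only the textbook combination under that assumption: run-length encode a toy BWT into (symbol, run length) tokens, build a Huffman code over the token frequencies with `heapq`, and round-trip the bit string. All function names are hypothetical, and the sketch does not implement searching in the compressed domain.

```python
import heapq
from collections import Counter
from itertools import count


def bwt(text: str) -> str:
    """Toy Burrows-Wheeler transform ('$' is an assumed end marker)."""
    s = text + "$"
    return "".join(r[-1] for r in sorted(s[i:] + s[:i] for i in range(len(s))))


def run_length_encode(s: str) -> list:
    """Collapse s into (symbol, run length) tokens."""
    runs = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def huffman_code(freqs: dict) -> dict:
    """Build a prefix-free code {token: bit string} from token frequencies."""
    if len(freqs) == 1:  # a lone distinct token still needs a 1-bit code
        return {tok: "0" for tok in freqs}
    tiebreak = count()  # unique ints keep heapq from ever comparing the dicts
    heap = [(f, next(tiebreak), {tok: ""}) for tok, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {tok: "0" + bits for tok, bits in left.items()}
        merged.update({tok: "1" + bits for tok, bits in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def encode(runs: list, code: dict) -> str:
    return "".join(code[tok] for tok in runs)


def decode(bits: str, code: dict) -> list:
    """Greedy prefix decoding; correct because the code is prefix-free."""
    inverse = {bits_: tok for tok, bits_ in code.items()}
    tokens, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            tokens.append(inverse[current])
            current = ""
    return tokens


if __name__ == "__main__":
    reference = "ACGTTACGTA" * 10 + "GGGCC" * 4  # hypothetical toy reference
    runs = run_length_encode(bwt(reference))
    code = huffman_code(Counter(runs))
    bits = encode(runs, code)
    print(len(runs), "runs ->", len(bits), "bits")
```

Treating each (symbol, run length) pair as one token lets frequent run shapes receive the shortest codes.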
Table of Contents:
Chapter 1. Introduction
Chapter 2. Background and Motivation
2.1 Burrows–Wheeler transform
2.2 FM-index
2.3 Variant implementation of the FM-index
2.4 Motivation
Chapter 3. Proposed Compression Algorithms
3.1 Overview
3.2 Modified Run-Length Encoding
3.3 Modified Huffman coding with RLE
Chapter 4. Experimental Studies
4.1 Experimental Setup
4.2 Detail Result
4.3 Other Applications
Chapter 5. Conclusion
References
References:
[1] J. Besser and H. A. Carleton. Next-generation sequencing technologies and their application to the study and control of bacterial infections. Clin Microbiol Infect, 24(4): 335–341, April 2018.
[2] M. Burrows and D. J. Wheeler. A block-sorting lossless data compression algorithm. DEC SRC Research Report 124, 1994.
[3] P. Ferragina and G. Manzini. Opportunistic data structures with applications. In Proc. FOCS '00, pp. 390–398, 2000.
[4] P. Ferragina and G. Manzini. An experimental study of an opportunistic index. In Proc. 12th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '01), pp. 269–278, January 2001.
[5] B. Langmead, C. Trapnell, M. Pop, and S. L. Salzberg. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology, R25, March 2009.
[6] H. Li and R. Durbin. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics, 25(14): 1754–1760, July 2009.
[7] H. Li and N. Homer. A survey of sequence alignment algorithms for next-generation sequencing. Briefings in Bioinformatics, 11(5): 473–483, September 2010.
[8] National Center for Biotechnology Information, http://www.ncbi.nlm.nih.gov.
[9] J. Shendure, S. Balasubramanian, and G. M. Church. DNA sequencing at 40: past, present and future. Nature, 550: 345–353, October 2017.