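Author (Chinese): 陳庭偉
Author (English): Chen, Ting Wei
Thesis title (Chinese): 以BWT建立方式解決最大重複子字串問題
Thesis title (English): On the Construction of the Burrows-Wheeler Transform and the Maximal Repeating Group Finding
Advisor (Chinese): 盧錦隆
Advisor (English): Lu, Chin Lung
Committee members (Chinese): 李家同, 唐傳義
Committee members (English): Lee, Chia Tung; Tang, Chuan Yi
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 103062539
Year of publication (ROC calendar): 105 (2016)
Academic year of graduation: 104
Language: English, Chinese
Number of pages: 98
Keywords (Chinese, translated): maximal repeating substring; string matching
Keywords (English): BWT; Maximal Repeating Groups; Exact String Matching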
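Abstract (translated from Chinese): In this thesis, we are very interested in using the Burrows-Wheeler Transform (BWT for short) to solve string matching problems. The difficulty with the BWT is that generating it for a given string is very time-consuming. Our method builds the BWT by modifying the KSS method. It is simpler to understand and implement than KSS, and our experimental results show that it constructs the BWT quite efficiently. We are also very interested in the maximal repeating substring problem, and we solve it with a slightly modified version of our BWT construction method. For example, in our experiments on a DNA sequence of 155606181 characters, finding the repeating substrings longer than 2000 took only 226 seconds, and we successfully found 55 pairs of maximal repeating substrings.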
In this thesis, we study the Burrows-Wheeler Transform (BWT for short) for exact string matching. The main drawback of the BWT is that constructing it is very time-consuming. We have developed a construction method based on the KSS method. Our method is easy to comprehend and implement, and experimental results show that it is efficient. We are also interested in the maximal repeating group problem, and we have developed an efficient method for finding maximal repeating groups. For example, for a DNA sequence of length 155606181, it took only 226 seconds to find 55 maximal repeating groups of length greater than 2000.
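As a rough illustration of the BWT-based exact string matching mentioned in the abstract, the following Python sketch builds the BWT by naively sorting all suffixes and counts pattern occurrences with the standard backward search. It is not the thesis's construction method, which is based on the KSS method. The function names `bwt` and `count_occurrences` and the `$` sentinel are assumptions made for this sketch, and the naive sort is only practical for short strings.

```python
def bwt(text: str, sentinel: str = "$") -> tuple[str, list[int]]:
    """Return (BWT string, suffix array) of text + sentinel.

    Assumption for this sketch: the sentinel does not occur in the text
    and sorts before every other character (true for "$" vs. letters).
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in text")
    s = text + sentinel
    # Naive suffix array: sort suffix start positions by the suffixes themselves.
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # BWT[i] is the character that cyclically precedes the i-th smallest suffix.
    return "".join(s[i - 1] for i in sa), sa


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """Count occurrences of pattern in the original text via backward search."""
    counts: dict[str, int] = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1

    # c_table[ch] = number of characters in the text strictly smaller than ch.
    c_table: dict[str, int] = {}
    total = 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]

    # occ[ch][i] = number of occurrences of ch in bwt_str[:i].
    occ = {ch: [0] * (len(bwt_str) + 1) for ch in counts}
    for i, ch in enumerate(bwt_str):
        for key in occ:
            occ[key][i + 1] = occ[key][i] + (1 if key == ch else 0)

    lo, hi = 0, len(bwt_str)  # half-open range of matching suffix-array rows
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


transformed, _ = bwt("banana")
print(transformed)                            # annb$aa
print(count_occurrences(transformed, "ana"))  # 2
```

The full occurrence table used here costs memory proportional to the alphabet size times the text length; practical implementations usually sample it, which matters for inputs as long as the DNA sequence described in the abstract.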
Chapter 1 Introduction
1.1 Motivations
1.2 Background
1.3 Thesis Organization
Chapter 2 The Suffix Tree Approach
2.1 Prefix and Suffix
2.2 The Introduction of Suffix Tree
2.3 The Searching of Suffix Tree
Chapter 3 The Suffix Array Approach
3.1 An Introduction of the Suffix Array Approach
3.2 The Searching of the Suffix Array Approach
Chapter 4 The Burrows Wheeler Transform
4.1 An Introduction of the Burrows Wheeler Transform
4.2 The BWT Search
4.3 Correctness
4.4 The Implementation of the BWT Method
Chapter 5 Our Method to Obtain the BWT Efficiently
5.1 The Introduction of Our Method
5.2 The Experiment of Our Method
Chapter 6 Our Method to Solve Repeating Group Finding Problem
6.1 Solving the Repeating Group Finding Problem with Dynamic Programming
6.2 Our Method to Find the Repeating Groups
6.3 Experiment Results
References
Appendix A
Appendix B
[ARM16] Cournac, A., Koszul, R. and Mozziconacci, J., The 3D Folding of Metazoan Genomes Correlates with the Association of Similar Repetitive Elements, Nucleic Acids Research, 2016.

[BM77] Boyer, R. S. and Moore, J. S., A Fast String Searching Algorithm, Communications of the ACM, Vol. 20, No. 10, 1977, pp. 762-772.

[BW94] Burrows, M. and Wheeler, D. J., A Block-sorting Lossless Data Compression Algorithm, Technical Report 124, Digital Equipment Corporation, 1994.

[FM2000] Ferragina, P. and Manzini, G., Opportunistic Data Structures with Applications, Proceedings of the 41st Symposium on Foundations of Computer Science, 2000.

[CCGJLPR94] Crochemore, M., Czumaj, A., Gasieniec, L., Jarominek, S., Lecroq, T., Plandowski, W. and Rytter, W., Speeding Up Two String-matching Algorithms, Algorithmica, Vol. 12, 1994, pp. 247-267.

[FP74] Fischer, M. J. and Paterson, M. S., String-Matching and Other Products, SIAM-AMS Proceedings, Vol. 7, 1974, pp. 113-125 (in "Complexity of Computation", R. M. Karp, ed.).

[H2012] Hou, K. W., The Discrete Convolution Method on Solving the Exact String Matching Problem, MS Thesis, 2012, Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan.

[H75] Hirschberg, D. S., A Linear Space Algorithm for Computing Maximal Common Subsequences, Communications of the ACM, Vol. 18, No. 6, 1975, pp. 341-343.

[H80] Horspool, R. N., Practical Fast Searching in Strings, Software Practice and Experience, Vol. 10, 1980, pp. 501-506.

[KMP77] Knuth, D. E., Morris, J. H. and Pratt, V. R., Fast Pattern Matching in Strings, SIAM Journal on Computing, Vol. 6, No. 2, 1977, pp. 323-350.

[K13] Kung, B. R., On the Repeating Group Finding Problem, MS Thesis, 2013, Department of Computer Science, Takming University of Science and Technology, Taipei, Taiwan.

[KSS06] Kärkkäinen, J., Sanders, P. and Burkhardt, S., Linear Work Suffix Array Construction, Journal of the ACM, Vol. 53, No. 6, 2006, pp. 918-936.

[KGAP01] de Koning, A. P. J., Gu, W., Castoe, T. A., Batzer, M. A. and Pollock, D. D., Repetitive Elements May Comprise Over Two-Thirds of the Human Genome, PLoS Genetics, Vol. 7, No. 12, 2011, e1002384.

[M56] McClintock, B., Controlling Elements and the Gene, Cold Spring Harbor Symposia on Quantitative Biology, Vol. 21, 1956, pp. 197-216.

[M76] McCreight, E. M., A Space-Economical Suffix Tree Construction Algorithm, Journal of the ACM, Vol. 23, 1976, pp. 262-272.

[MM93] Manber, U. and Myers, G., Suffix Arrays: A New Method for On-line String Searches, SIAM Journal on Computing, Vol. 22, 1993, pp. 935-948.

[MP70] Morris, J. H. and Pratt, V. R., A Linear Pattern-matching Algorithm, Technical Report 40, University of California, Berkeley, 1970.

[O72] Ohno, S., So much “junk” DNA in our genome, Brookhaven Symp Biol, Vol. 23, 1972.

[U95] Ukkonen, E., On-line Construction of Suffix Trees, Algorithmica, Vol. 14, 1995, pp. 249-260.

[UWD08] Ussery, D. W., Wassenaar, T. and Borini, S., Word Frequencies, Repeats, and Repeat-related Structures in Bacterial Genomes, in Computing for Comparative Microbial Genomics: Bioinformatics for Microbiologists, Computational Biology 8 (1 ed.), Springer, 2008, pp. 133-144.

[R97] Raffinot, M., On the Multi Backward Dawg Matching Algorithm, Proc. The 4th South American Workshop on String Processing, 1997, pp. 149-165.

[S05] Shapiro, J. A. and von Sternberg, R., Why Repetitive DNA is Essential to Genome Function, Biological Reviews, Vol. 80, 2005, pp. 227-250.

[W2004] Wu, B. H., Convolution and Its Application to Sequence Analysis, MS Thesis, 2004, National Chi Nan University, Puli, Nantou, Taiwan.

[W73] Weiner, P., Linear Pattern Matching Algorithms, 14th Annual IEEE Symposium on Switching and Automata Theory, 1973, pp. 1-11.
 
 
 
 