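Author (Chinese): 楊易霖
Author (English): Yang, Yilin
Title (Chinese): 字典匹配問題的變形
Title (English): Variants of Dictionary Matching
Advisor (Chinese): 韓永楷
Advisor (English): Wing-Kai Hon
Oral defense committee (Chinese): 盧錦隆, 廖崇碩
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 103062574
Year of publication (ROC calendar): 105 (2016)
Academic year of graduation (ROC calendar): 104
Language: English
Number of pages: 50
Keywords (Chinese): 字典匹配問題 (dictionary matching problem)
Keywords (English): Dictionary matching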
This thesis studies the dictionary matching problem, focusing on three different variants.
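All three variants extend ordinary dictionary matching: given a set of d patterns, report every occurrence of every pattern in a query text T. The thesis's preliminaries build on the AC automaton (Aho and Corasick [1]). As a point of reference, here is a minimal, uncompressed Python sketch of that automaton; `build_ac` and `find_all` are hypothetical names, patterns are assumed non-empty, and none of the compressed indexes developed in the thesis are shown.

```python
from collections import deque


def build_ac(patterns):
    """Build an Aho-Corasick automaton: trie transitions, failure links, outputs."""
    goto = [{}]   # goto[state][ch] -> child state in the trie
    fail = [0]    # fail[state] -> state spelling the longest proper suffix in the trie
    out = [[]]    # out[state] -> indices of patterns reported at this state
    for idx, p in enumerate(patterns):
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(idx)
    # Breadth-first order guarantees a state's failure target is finished first.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]  # also report patterns that are suffixes
            queue.append(t)
    return goto, fail, out


def find_all(patterns, text):
    """Return sorted (start, pattern index) pairs for every occurrence in text."""
    goto, fail, out = build_ac(patterns)
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for idx in out[s]:
            hits.append((i - len(patterns[idx]) + 1, idx))
    return sorted(hits)


print(find_all(["he", "she", "his", "hers"], "ushers"))  # [(1, 1), (2, 0), (2, 3)]
```

The text is scanned once; each failure-link step moves to a shorter suffix, which is what keeps the scan linear in |T| plus the output size.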
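The first variant allows every string in the dictionary to contain one gap. Amir et al. proposed a solution to this problem in 2014; taking a different direction, we changed the way their data is stored and improved their method, so that under the same constraints both our space and our time complexity are smaller. Moreover, our approach extends to a more general version of the problem.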
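To make the first variant concrete, here is a brute-force reference matcher. It assumes a gapped pattern is written as a hypothetical tuple `(p1, p2, lo, hi)`: `p1`, then a gap of any length g with lo ≤ g ≤ hi, then `p2`. Letting each pattern carry its own bounds also covers the generalized case where gaps have different bounds. This only pins down what counts as an occurrence; it does not reproduce the thesis's index or its O((β − α + 1)|T| log d + occ) query time, and the output format (start, g, pattern index) is a choice made for this sketch.

```python
def gapped_matches(patterns, text):
    """Brute-force reference for dictionary matching with gapped patterns.

    Each pattern is a hypothetical tuple (p1, p2, lo, hi): p1, then a gap of
    any length g with lo <= g <= hi, then p2. Returns sorted
    (start, g, pattern index) triples.
    """
    hits = []
    n = len(text)
    for idx, (p1, p2, lo, hi) in enumerate(patterns):
        for start in range(n - len(p1) + 1):
            if not text.startswith(p1, start):
                continue
            for g in range(lo, hi + 1):
                pos = start + len(p1) + g  # where p2 would have to begin
                if pos + len(p2) > n:
                    break  # larger gaps only push p2 further past the end
                if text.startswith(p2, pos):
                    hits.append((start, g, idx))
    return sorted(hits)


# Two patterns with different gap bounds.
print(gapped_matches([("ab", "cd", 1, 2), ("b", "c", 1, 3)], "abxcdabxxcd"))
# [(0, 1, 0), (1, 1, 1), (5, 2, 0), (6, 2, 1)]
```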
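The second variant is similar to the first: every string in the dictionary may lose one contiguous segment, but we do not know which segment is missing, and the goal is to read a text and find every position where a string may match. This is a new problem that has not been studied before. By applying heavy path decomposition on the failure tree, we obtain a complete, workable solution.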
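The failure tree is the tree whose parent pointers are the AC automaton's failure links, so the ancestors of a state are exactly the trie states that spell suffixes of that state's string. Heavy path decomposition by subtree size (which we take to be the centroid path decomposition listed in Chapter 4's outline) splits any root-to-node path into O(log n) contiguous segments. The sketch below assumes the failure links are already available as a parent array with root 0; `heavy_path_decomposition` and `root_path_segments` are hypothetical names, and how the thesis stores information along these paths is not reproduced here.

```python
from collections import deque


def heavy_path_decomposition(parent):
    """Heavy path (centroid path) decomposition of a tree rooted at node 0.

    parent[v] is v's parent (for a failure tree: the failure link of state v);
    parent[0] is ignored. Returns (head, pos): head[v] is the topmost node of
    v's heavy path, and pos numbers nodes so each heavy path is contiguous.
    """
    n = len(parent)
    children = [[] for _ in range(n)]
    for v in range(1, n):
        children[parent[v]].append(v)

    # BFS order lists every node after its parent.
    order, queue = [], deque([0])
    while queue:
        v = queue.popleft()
        order.append(v)
        queue.extend(children[v])

    # Subtree sizes, accumulated bottom-up (reverse BFS order).
    size = [1] * n
    for v in reversed(order):
        if v != 0:
            size[parent[v]] += size[v]

    head, pos = [0] * n, [0] * n
    stack, counter = [0], 0
    while stack:
        v = stack.pop()
        pos[v] = counter
        counter += 1
        if children[v]:
            heavy = max(children[v], key=lambda c: size[c])  # largest subtree
            for c in children[v]:
                if c != heavy:
                    head[c] = c  # a light child starts a new heavy path
                    stack.append(c)
            head[heavy] = head[v]  # the heavy child extends v's path
            stack.append(heavy)    # pushed last, so numbered right after v
    return head, pos


def root_path_segments(v, parent, head, pos):
    """Split the path from v up to the root into contiguous pos ranges."""
    segments = []
    while True:
        h = head[v]
        segments.append((pos[h], pos[v]))
        if h == 0:
            return segments
        v = parent[h]


# Failure links of an AC automaton for {he, she, his, hers},
# with states numbered in insertion order.
fail = [0, 0, 0, 0, 1, 2, 0, 3, 0, 3]
head, pos = heavy_path_decomposition(fail)
print(root_path_segments(4, fail, head, pos))  # [(8, 9), (0, 0)]
```

Every light edge on the way up at least doubles the subtree size, so the loop in `root_path_segments` runs O(log n) times.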
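The third variant differs from the first two: the dictionary strings are ordinary strings, with no gap and no missing segment, but two strings are treated as the same for matching if one becomes the other under a one-to-one mapping of the alphabet; for example, aab and xxy are regarded as the same string. We propose a new method that modifies previous work and reduces the space complexity of the earlier algorithm.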
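The matching rule of the third variant can be illustrated with Baker's prev encoding [4]: each position stores 0 if its symbol has not appeared earlier, and otherwise the distance back to that symbol's previous occurrence. Two equal-length strings match under a one-to-one renaming of symbols exactly when their prev encodings are equal, so aab and xxy both encode to [0, 1, 0]. The sketch below assumes every symbol may be renamed (Baker's model also allows fixed symbols that must match exactly; they are omitted here) and scans the text window by window. It is neither the thesis's new encoding nor its compact index, and the function names are hypothetical.

```python
def prev_encode(s):
    """Prev encoding: 0 at a symbol's first occurrence, else distance to its last one."""
    last, code = {}, []
    for i, ch in enumerate(s):
        code.append(i - last[ch] if ch in last else 0)
        last[ch] = i
    return code


def p_match(a, b):
    """True iff a one-to-one renaming of symbols turns a into b."""
    return len(a) == len(b) and prev_encode(a) == prev_encode(b)


def param_dictionary_matches(patterns, text):
    """Naive parameterized dictionary matching: sorted (start, pattern index) pairs."""
    hits = []
    for idx, p in enumerate(patterns):
        code = prev_encode(p)
        for start in range(len(text) - len(p) + 1):
            # Re-encode each window: a slice of prev_encode(text) can be nonzero
            # where the window sees a symbol for the first time.
            if prev_encode(text[start:start + len(p)]) == code:
                hits.append((start, idx))
    return sorted(hits)


print(p_match("aab", "xxy"))  # True
print(param_dictionary_matches(["aab", "abb"], "xxyzzq"))  # [(0, 0), (2, 1), (3, 0)]
```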
Dictionary matching is a well-studied problem in computer science, with numerous applications such as computer virus detection and bioinformatics analysis. In this thesis, we study three variants of the dictionary matching problem. The first variant, called dictionary matching with gapped patterns, was recently proposed by Amir et al.; in it, a pattern may be matched with a substring of a query text T with a gap of bounded length present in the pattern. We first give an alternative linear-space solution to Amir et al.'s problem, where the gap lengths of all patterns have the same lower bound α and upper bound β. The query time on any query text T is bounded by O((β − α + 1)|T| log d + occ), where d denotes the number of patterns and occ denotes the size of the output. We then show that the framework can be generalized to handle the case where gaps may have different bounds, thereby answering one of the open problems raised by Amir et al.
The second variant, called dictionary matching with one missing substring, is a new problem in which a gap of bounded length may be present in the text substring when it is being matched. We show that this problem can be solved using a similar framework. Furthermore, by applying a novel indexing technique on the failure tree, we obtain a space-time tradeoff, which is suitable when the dictionary contains only short patterns or when index space is a critical concern.
The third variant is called parameterized dictionary matching. Idury et al. proposed a linear-space solution for this problem using Baker's encoding method. Here, we propose a new encoding method that achieves a better space complexity for the index, with only a slight slowdown in query time.
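For the second variant, the following brute-force reference follows the thesis's informal description: a dictionary string may lose one contiguous piece, and which piece is lost is unknown. It assumes hypothetical bounds `lo` ≤ ℓ ≤ `hi` on the length ℓ of the missing piece and lets the piece sit anywhere in the pattern, including at either end; the precise definition in Chapter 4 may be stricter. It enumerates every shortened variant of each pattern and searches for it directly, so it is only a way to check small cases, not the indexed solution or the space-time tradeoff described above.

```python
def missing_substring_matches(patterns, text, lo, hi):
    """Brute force for matching with one missing substring (assumed definition).

    Pattern p matches text[start:end] if deleting one contiguous piece of
    length l (lo <= l <= hi) from p yields exactly text[start:end]; lo = 0
    also admits exact matches. Returns sorted distinct (start, end, index)
    triples with end exclusive.
    """
    hits = set()
    for idx, p in enumerate(patterns):
        m = len(p)
        for l in range(lo, min(hi, m) + 1):
            # All ways of cutting out p[i:i+l]; different cuts may coincide.
            variants = {p[:i] + p[i + l:] for i in range(m - l + 1)}
            for v in variants:
                if not v:
                    continue  # deleting the whole pattern is not a match
                start = text.find(v)
                while start != -1:  # start + 1 also finds overlapping occurrences
                    hits.add((start, start + len(v), idx))
                    start = text.find(v, start + 1)
    return sorted(hits)


print(missing_substring_matches(["abcde"], "xabexacde", 1, 2))
# [(1, 4, 0), (5, 9, 0), (6, 9, 0)]
```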
1 Introduction
  1.1 Introduction and Related Work
  1.2 Organization
2 Preliminaries
  2.1 Strings, Suffix Tree, and Tree Labelling
  2.2 AC Automaton
3 Dictionary Matching with Gapped Pattern
  3.1 Problem Definition
  3.2 Orthogonal Range Searching and Point Enclosure Problem
  3.3 When Gaps Have the Same Bounds
  3.4 When Gaps Have Different Bounds
4 Dictionary Matching with One Missing Substring in Text
  4.1 Problem Definition
  4.2 Centroid Path Decomposition
  4.3 The Naive Method
  4.4 Compressing the Index
5 Parameterized Dictionary Matching
  5.1 Problem Definition
  5.2 Encoding Method
  5.3 Compact Space Index
  5.4 How to Query
6 Conclusion
[1] A. Aho and M. Corasick: Efficient String Matching: An Aid to Bibliographic Search. Communications of the ACM (CACM), 18(6):333–340, 1975.
[2] A. Amir, D. Keselman, G. M. Landau, M. Lewenstein, N. Lewenstein, and M. Rodeh: Text Indexing and Dictionary Matching With One Error. Journal of Algorithms, 37(2):309–325, 2000.
[3] A. Amir, A. Levy, E. Porat, and B. R. Shalom: Dictionary Matching with One Gap. In Proc. of Symposium on Combinatorial Pattern Matching (CPM), pages 11–20, 2014.
[4] B. S. Baker: Parameterized Pattern Matching: Algorithms and Applications. Journal of Computer and System Sciences, 52(1):28–42, 1996.
[5] D. Belazzougui: Succinct Dictionary Matching with No Slowdown. In Proc. of Symposium on Combinatorial Pattern Matching (CPM), pages 88–100, 2010.
[6] T. M. Chan, K. G. Larsen, and M. Pătraşcu: Orthogonal Range Searching on the RAM, Revisited. In Proc. of Symposium on Computational Geometry (SoCG), pages 1–10, 2011.
[7] B. Chazelle: Filtering Search: A New Approach to Query Answering. SIAM Journal on Computing (SICOMP), 15(3):703–724, 1986.
[8] R. Cole, L.-A. Gottlieb, and M. Lewenstein: Dictionary Matching and Indexing with Errors and Don’t Cares. In Proc. of Symposium on Theory of Computing (STOC), pages 91–100, 2004.
[9] T. Haapasalo, P. Silvasti, S. Sippu, and E. Soisalon-Soininen: Online Dictionary Matching with Variable-Length Gaps. In Proc. of Symposium on Experimental Algorithms (SEA), pages 76–87, 2011.
[10] W. K. Hon, T. H. Ku, T. W. Lam, R. Shah, S. L. Tam, S. V. Thankachan, and J. S. Vitter: Compressing Dictionary Matching Index via Sparsification Technique. Algorithmica, 2014.
[11] W. K. Hon, T. H. Ku, R. Shah, S. V. Thankachan, and J. S. Vitter: Faster Compressed Dictionary Matching. Theoretical Computer Science (TCS), 475:113–119, 2013.
[12] R. M. Idury and A. A. Schaffer: Multiple Matching of Parameterized Patterns. Theoretical Computer Science (TCS), 154(2):203–224, 1996.
[13] G. Kucherov and M. Rusinowitch: Matching a Set of Strings with Variable Length Don’t Cares. Theoretical Computer Science (TCS), 178:129–154, 1997.
[14] E. M. McCreight: A Space-Economical Suffix Tree Construction Algorithm. Journal of the ACM (JACM), 23(2):262–272, 1976.
[15] S. Rahul: Improved Bounds for Orthogonal Point Enclosure Query and Point Location in Orthogonal Subdivisions in ℝ³. In Proc. of Symposium on Discrete Algorithms (SODA), pages 200–211, 2015.
[16] P. Weiner: Linear Pattern Matching Algorithms. In Proc. of Symposium on Switching and Automata Theory, pages 1–11, 1973.
[17] M. Zhang, Y. Zhang, and L. Hu: A Faster Algorithm for Matching a Set of Patterns with Variable Length Don’t Cares. Information Processing Letters (IPL), 110(6):216–220, 2010.