Author: Yang, Yilin
Thesis Title: Variants of Dictionary Matching
Advisor: Wing-Kai Hon
Keywords: Dictionary matching
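The second variant is closely related to the first problem. Every string in the dictionary may have one segment missing, but we do not know which segment is missing. The goal is to read a text and find every position where one of these strings may match. This is a new problem that has not been studied before. By applying centroid path decomposition to the failure tree, we give a complete and workable solution.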
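The Python sketch below is a brute-force reference checker for this problem statement. It is only an illustration under our own assumptions. It is not the failure-tree decomposition method of the thesis. The function name `match_one_missing_substring`, the gap bound `max_gap`, and the output format (pattern index, start, end) are all hypothetical. A deleted piece of length 0, which is an exact occurrence, is counted as a match.

```python
def match_one_missing_substring(dictionary, text, max_gap):
    """Brute-force reference (hypothetical API).

    Report every (pattern index, start, end) such that text[start:end]
    equals some pattern with one contiguous piece of length at most
    max_gap deleted. A deletion of length 0 (exact occurrence) counts.
    """
    results = set()
    n = len(text)
    for idx, p in enumerate(dictionary):
        m = len(p)
        # Gap length g ranges over 0..max_gap, keeping at least one character.
        for gap in range(0, min(max_gap, m - 1) + 1):
            keep = m - gap
            # The deleted piece is p[cut:cut + gap].
            for cut in range(0, m - gap + 1):
                reduced = p[:cut] + p[cut + gap:]
                # Naively compare the reduced pattern with every text window.
                for start in range(0, n - keep + 1):
                    if text[start:start + keep] == reduced:
                        results.add((idx, start, start + keep))
    return sorted(results)


print(match_one_missing_substring(["abcd"], "xabdx", 1))  # [(0, 1, 4)]
```

The checker runs in polynomial time but is far slower than an indexed solution. It is mainly useful for validating a faster index on small inputs.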
Dictionary matching is a well-studied problem in computer science, with numerous applications such as computer virus detection and bioinformatics analysis. In this thesis, we study three variants of the dictionary matching problem. The first variant, called dictionary matching with gapped patterns, was recently proposed by Amir et al. In this variant, a pattern may be matched with a substring of a query text T while allowing a gap of bounded length within the pattern. We first give an alternative linear-space solution to Amir et al.'s problem for the case where the gap lengths of all patterns share the same lower bound α and upper bound β. The query time on any query text T is bounded by O((β − α + 1)|T| log d + occ), where d denotes the number of patterns and occ denotes the size of the output. We then show that the framework generalizes to the case where gaps may have different bounds. This answers one of the open problems raised by Amir et al.

The second variant, called dictionary matching with one missing substring, is a new problem in which a gap of bounded length may be present in the text substring when it is being matched. We show that this problem can be solved with a similar framework. Furthermore, by applying a novel indexing technique on the failure tree, we obtain a space-time tradeoff. This tradeoff is suitable when the dictionary contains only short patterns, or when index space is a critical concern. The third variant is called parameterized dictionary matching. Idury et al. proposed a linear-space solution for this problem using Baker's encoding method. Here, we propose a new encoding method that gives the index a better space complexity, with only a slight slowdown in query time.
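The third variant builds on Baker's encoding. For background, the Python sketch below implements Baker's prev encoding. Under this encoding, two equal-length strings parameterized-match exactly when their encodings are equal. The sketch then uses the encoding in a naive dictionary matcher. It illustrates only the baseline encoding, not the new encoding proposed in the thesis. The function names are hypothetical, and so is passing the set of parameter symbols explicitly as `params`.

```python
def prev_encode(s, params):
    """Baker's prev encoding.

    `params` is the (assumed) set of parameter symbols; every other
    symbol is a constant and is kept unchanged. A parameter symbol
    becomes 0 at its first occurrence, and otherwise becomes the
    distance back to its previous occurrence.
    """
    last_seen = {}
    encoded = []
    for i, ch in enumerate(s):
        if ch in params:
            encoded.append(i - last_seen[ch] if ch in last_seen else 0)
            last_seen[ch] = i
        else:
            encoded.append(ch)
    return encoded


def p_match(a, b, params):
    """Two strings parameterized-match iff they have equal length and equal prev encodings."""
    return len(a) == len(b) and prev_encode(a, params) == prev_encode(b, params)


def naive_parameterized_dictionary_matching(dictionary, text, params):
    """Report (pattern index, start) for every text window that p-matches a pattern."""
    hits = []
    for idx, p in enumerate(dictionary):
        m = len(p)
        encoded_p = prev_encode(p, params)
        for start in range(len(text) - m + 1):
            # Re-encode each window: a distance pointing before the window must become 0.
            if prev_encode(text[start:start + m], params) == encoded_p:
                hits.append((idx, start))
    return hits


print(prev_encode("axbxy", set("xyz")))  # ['a', 0, 'b', 2, 0]
print(naive_parameterized_dictionary_matching(["axbx", "xy"], "azbzw", set("xyzw")))  # [(0, 0), (1, 3)]
```

Each text window is re-encoded separately. A parameter whose previous occurrence lies before the window start must be encoded as 0 within that window. This dependence on the window is what makes indexing prev-encoded text harder than indexing plain text.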
1 Introduction

1.1 Introduction and Related Work
1.2 Organization

2 Preliminaries

2.1 Strings, Suffix Tree, and Tree Labelling
2.2 AC Automaton

3 Dictionary Matching with Gapped Pattern

3.1 Problem Definition
3.2 Orthogonal Range Searching and Point Enclosure Problem
3.3 When Gaps Have the Same Bounds
3.4 When Gaps Have Different Bounds

4 Dictionary Matching with One Missing Substring in Text

4.1 Problem Definition
4.2 Centroid Path Decomposition
4.3 The Naive Method
4.4 Compressing the Index

5 Parameterized Dictionary Matching

5.1 Problem Definition
5.2 Encoding Method
5.3 Compact Space Index
5.4 How to Query

6 Conclusion
