patternMinor
Infix search in millions of strings
Viewed 0 times
infixstringssearchmillions
Problem
Let's say we have millions of strings (each of them
-
I'm looking for something working on strings even without meaning (the pattern
-
I'm looking for something working on strings even without meaning (the pattern
qys should allow to find the string uyiuqysidi among millions of other strings), on DNA (pattern ATTG found in GGATCATTGAAGG), on sequences (subsequence 1, 4, 8 found in sequence 7, 2, 1, 4, 8, 19, 32), etc.Solution
Let $\$$ be a symbol not in the alphabet, and let $\{ w_i \}_{i \in \mathbb{N}}$ be the strings you are searching from. Construct the string $S = w_1 \circ \$ \circ w_2 \circ \$ \circ \dots \circ w_n$ and use Ukkonen's algorithm to construct a suffix tree for $S$. You are now able to retrieve all $m$ occurrences of a pattern $P$ in time $\Theta(m + |P|)$, with $\sum_i|w_i|$ preprocessing time.
Context
StackExchange Computer Science Q#90928, answer score: 3
Revisions (0)
No revisions yet.