patternMinor
Is there a data structure for efficiently searching a string that contains a given substring?
Viewed 0 times
efficientlysearchingsubstringgiventhatforstructurecontainstheredata
Problem
This question arose from a practical problem: given a set of texts, find one, which contains a given string (not word).
Let $S$ be a set of $n$ strings, and $l$ the length of the longest string in $S$. What will be the best data structure to efficiently search a string that contains a substring of length $k$ in $S$?
Sequential scan with Knuth–Morris–Pratt algorithm yields $O(n(l+k))$ searching complexity.
I thought maybe something similar to tries exists.
Let $S$ be a set of $n$ strings, and $l$ the length of the longest string in $S$. What will be the best data structure to efficiently search a string that contains a substring of length $k$ in $S$?
Sequential scan with Knuth–Morris–Pratt algorithm yields $O(n(l+k))$ searching complexity.
I thought maybe something similar to tries exists.
Solution
You could append all of the $n$ strings together, and add an arbitrary character '\$' not in the pattern to separate them. Then you could apply the Z algorithm on your original pattern and this new string to yield time complexity $O(nl+k)$ which is more efficient than your suggested $O(nl+nk)$.
Actually, you could still iterate with KMP for $O(nl+k)$ time, beceause the $k$ component is preprocessing the pattern you are trying to find. This only has to be done once, regardless of the number of strings you are checking.
Actually, you could still iterate with KMP for $O(nl+k)$ time, beceause the $k$ component is preprocessing the pattern you are trying to find. This only has to be done once, regardless of the number of strings you are checking.
Context
StackExchange Computer Science Q#69666, answer score: 6
Revisions (0)
No revisions yet.