HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Is there a data structure for efficiently searching a string that contains a given substring?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
efficientlysearchingsubstringgiventhatforstructurecontainstheredata

Problem

This question arose from a practical problem: given a set of texts, find one, which contains a given string (not word).

Let $S$ be a set of $n$ strings, and $l$ the length of the longest string in $S$. What will be the best data structure to efficiently search a string that contains a substring of length $k$ in $S$?

Sequential scan with Knuth–Morris–Pratt algorithm yields $O(n(l+k))$ searching complexity.
I thought maybe something similar to tries exists.

Solution

You could append all of the $n$ strings together, and add an arbitrary character '\$' not in the pattern to separate them. Then you could apply the Z algorithm on your original pattern and this new string to yield time complexity $O(nl+k)$ which is more efficient than your suggested $O(nl+nk)$.

Actually, you could still iterate with KMP for $O(nl+k)$ time, beceause the $k$ component is preprocessing the pattern you are trying to find. This only has to be done once, regardless of the number of strings you are checking.

Context

StackExchange Computer Science Q#69666, answer score: 6

Revisions (0)

No revisions yet.