patternMinor
Find the smallest set of strings which "covers" a given set of strings (coverage = containing as substring)
Viewed 0 times
smallestcontainingthecoverssubstringfindwhichstringsgivenset
Problem
Let $S$ be a finite set of strings and $0
I would appreciate any help regarding this problem. I see that I could generate the set of all strings with length between $k$ and $l$ then consider the power set but I think there must be a better way to solve this.
I'm not even sure how hard this problem is (I think there may be a reduction for the set cover problem which means it is $NP$-complete but I really don't know) or if there are any good approximation (or maybe exact) algorithms available.
I would like to solve this problem primarily in practice. Typically, $3 \leq k \leq l \leq 10$ holds, $|S| \approx 2000$ and the lengths of the elements in $S$ can vary from 10 to 50 mainly.
- $\forall s \in S \ \exists t \in T(k,l): t \subset s$ (meaning: $s$ is containing $t$ as a contiguous substring).
I would appreciate any help regarding this problem. I see that I could generate the set of all strings with length between $k$ and $l$ then consider the power set but I think there must be a better way to solve this.
I'm not even sure how hard this problem is (I think there may be a reduction for the set cover problem which means it is $NP$-complete but I really don't know) or if there are any good approximation (or maybe exact) algorithms available.
I would like to solve this problem primarily in practice. Typically, $3 \leq k \leq l \leq 10$ holds, $|S| \approx 2000$ and the lengths of the elements in $S$ can vary from 10 to 50 mainly.
Solution
Without loss of generality, we can take $l=k$. In particular, any solution to $T(k,l)$ can be converted into a solution for $T(k,k)$ by shortening each string that's longer to $k$ to a string of length $k$ (by deleting as many characters from the beginning or end as needed to get down to length $k$, chosen arbitrarily). So, it will simplify your life to focus on solving $T(k,k)$, i.e., finding a set of strings of length $k$.
Here is one possible approach. You don't need to exhaustively generate all strings of length $k$. Instead, you can generate all length-$k$ substrings of words of $S$. If each word of $S$ has length $\le n$, then there will be $\le (n-k+1) \cdot |S|$ of them. Then, you could apply some standard set-cover algorithm to that. So, you don't need to generate exponentially many strings; just polynomially many.
It's possible there may be better algorithms that take into account the combinatorial structure of the strings; I don't know.
Here is one possible approach. You don't need to exhaustively generate all strings of length $k$. Instead, you can generate all length-$k$ substrings of words of $S$. If each word of $S$ has length $\le n$, then there will be $\le (n-k+1) \cdot |S|$ of them. Then, you could apply some standard set-cover algorithm to that. So, you don't need to generate exponentially many strings; just polynomially many.
It's possible there may be better algorithms that take into account the combinatorial structure of the strings; I don't know.
Context
StackExchange Computer Science Q#105093, answer score: 2
Revisions (0)
No revisions yet.