patternMinor
Finding the smallest string that contains a given set of substrings
Viewed 0 times
smallestsubstringsthegiventhatcontainsfindingstringset
Problem
The algorithm I am looking for has the following requirements: Input is a set of strings. You are looking for a string containing all input strings. The resulting string should be as short as possible. At least shorter than just concatenating all input strings if possible.
Examples:
Is there a name for this problem with which it is better to search for solutions?
Examples:
Input: a ab bc
Output: abc
Input: abcd bcde ef
Output: abcdefIs there a name for this problem with which it is better to search for solutions?
Solution
This problem is called shortest superstring problem. John Gallant, David Maier and James Astorer proved it is NP-hard in 19791.
Given two strings $A$ and $B$, let $|A|$ denote the length of $A$, and let $S(A,B)$ denote the shortest superstring of $A$ and $B$ where $A$ occurs before $B$. It is easy to reduce this problem to travelling salesman problem, where nodes represent strings, and the distance from string $A$ to $B$ is $|S(A,B)|-|A|-|B|$. So you can use known algorithms for travelling salesman problem to solve this problem.
There are several polynomial approximation algorithms. The best known approximation factor is $2\frac{11}{23}$, and the corresponding algorithm is proposed by Marcin Mucha in 20132.
1Gallant, J., Maier, D., & Astorer, J. (1980). On finding minimal length superstrings. Journal of Computer and System Sciences, 20(1), 50-58.
2Mucha, M. (2013, January). Lyndon words and short superstrings. In Proceedings of the twenty-fourth annual ACM-SIAM symposium on Discrete algorithms (pp. 958-972). Society for Industrial and Applied Mathematics.
Given two strings $A$ and $B$, let $|A|$ denote the length of $A$, and let $S(A,B)$ denote the shortest superstring of $A$ and $B$ where $A$ occurs before $B$. It is easy to reduce this problem to travelling salesman problem, where nodes represent strings, and the distance from string $A$ to $B$ is $|S(A,B)|-|A|-|B|$. So you can use known algorithms for travelling salesman problem to solve this problem.
There are several polynomial approximation algorithms. The best known approximation factor is $2\frac{11}{23}$, and the corresponding algorithm is proposed by Marcin Mucha in 20132.
1Gallant, J., Maier, D., & Astorer, J. (1980). On finding minimal length superstrings. Journal of Computer and System Sciences, 20(1), 50-58.
2Mucha, M. (2013, January). Lyndon words and short superstrings. In Proceedings of the twenty-fourth annual ACM-SIAM symposium on Discrete algorithms (pp. 958-972). Society for Industrial and Applied Mathematics.
Context
StackExchange Computer Science Q#88387, answer score: 7
Revisions (0)
No revisions yet.