patternMinor
Longest Repeated (Scattered) Subsequence in a String
Viewed 0 times
scatteredrepeatedsubsequencelongeststring
Problem
Informal Problem Statement:
Given a string, e.g. $ACCABBAB$, we want to colour some letters red and some letters blue (and some not at all), such that reading only the red letters from left to right yields the same result as reading only the blue letters.
In the example we could colour them like this: $A\color{blue}{C}\color{red}{CAB}B\color{blue}{AB}$
Therefore, we say $CAB$ is a repeated subsequence of $ACCABBAB$. It is also a longest repeated subsequence (which is easy to check).
Can we compute longest repeated subsequences efficiently?
Formal Question:
Is it NP-hard to decide for a string and some $k$, whether a repeated subsequence of length $k$ exists in the string?
Bonus Question:
Will their always be a repeated subsequence of length $n/2 - o(n)$ if the size of the alphabet is bounded by a constant?
(This is known to be true for binary alphabets.)
Edit 2: The negative answer to the Bonus Question is already known for alphabets of size at least $5$. In fact for alphabets of size $Σ$, there are strings with longest repeated subsequences of a length of merely $O(n · Σ^{-1/2})$. Random strings suffice to show this. The result already existed, but I overlooked it.
Edit:
Note:
Some people mean "substring" when they say "subsequence". I don't. This is not the problem of finding a substrings twice.
Given a string, e.g. $ACCABBAB$, we want to colour some letters red and some letters blue (and some not at all), such that reading only the red letters from left to right yields the same result as reading only the blue letters.
In the example we could colour them like this: $A\color{blue}{C}\color{red}{CAB}B\color{blue}{AB}$
Therefore, we say $CAB$ is a repeated subsequence of $ACCABBAB$. It is also a longest repeated subsequence (which is easy to check).
Can we compute longest repeated subsequences efficiently?
Formal Question:
Is it NP-hard to decide for a string and some $k$, whether a repeated subsequence of length $k$ exists in the string?
- If so: Which problem can be reduced to this problem?
- If not: What is an efficient algorithm? (obviously, this algorithm can then be used to compute a longest repeated subsequence)
Bonus Question:
Will their always be a repeated subsequence of length $n/2 - o(n)$ if the size of the alphabet is bounded by a constant?
(This is known to be true for binary alphabets.)
Edit 2: The negative answer to the Bonus Question is already known for alphabets of size at least $5$. In fact for alphabets of size $Σ$, there are strings with longest repeated subsequences of a length of merely $O(n · Σ^{-1/2})$. Random strings suffice to show this. The result already existed, but I overlooked it.
Edit:
Note:
Some people mean "substring" when they say "subsequence". I don't. This is not the problem of finding a substrings twice.
Solution
The special case of $k = n/2$ is the same problem as this CST.SE question How hard is unshuffling a string? asks.
Buss and Soltys proved NP-completeness of this problem [1] by reducing 3-Partition problem to this problem.
Buss and Soltys proved NP-completeness of this problem [1] by reducing 3-Partition problem to this problem.
- [1]: Buss, Sam, and Michael Soltys. "Unshuffling a square is NP-hard." Journal of Computer and System Sciences 80.4 (2014): 766-776.
Context
StackExchange Computer Science Q#27776, answer score: 5
Revisions (0)
No revisions yet.