patternMinor
Permutation of words that have matched parentheses
Viewed 0 times
parenthesesmatchedwordsthatpermutationhave
Problem
Let $L$ denote the (context-free) language of matched parentheses over the alphabet $\Sigma$. Consider the following problem:
Input: words $x_1,\dots,x_n \in \Sigma^*$
Question: does there exist a permutation $\sigma$ of $\{1,\dots,n\}$ such that $x_{\sigma(1)} x_{\sigma(2)} \cdots x_{\sigma(n)} \in L$?
What is the complexity of this problem? Can it be solved in polynomial time? Is it NP-hard?
It "smells" like something that might be NP-hard. But perhaps there is a way to reduce this to the route inspection problem (which has a polynomial-time algorithm), perhaps by treating each $x_i$ as an edge in some graph, or something?
Motivation: This is inspired by a question about permutations of lines of code, but my main interest is just curiousity.
Related:
Input: words $x_1,\dots,x_n \in \Sigma^*$
Question: does there exist a permutation $\sigma$ of $\{1,\dots,n\}$ such that $x_{\sigma(1)} x_{\sigma(2)} \cdots x_{\sigma(n)} \in L$?
What is the complexity of this problem? Can it be solved in polynomial time? Is it NP-hard?
It "smells" like something that might be NP-hard. But perhaps there is a way to reduce this to the route inspection problem (which has a polynomial-time algorithm), perhaps by treating each $x_i$ as an edge in some graph, or something?
Motivation: This is inspired by a question about permutations of lines of code, but my main interest is just curiousity.
Related:
- Time complexity of a problem inspired by palindromes claims to show that the problem is NP-hard when $L$ is the language of palindromes, a different context-free language (though I don't understand the proof).
- Word tiling, where you must use each tile exactly once considers the problem where $L=\{y\}$ where $y \in \Sigma^*$ is a single word specified as part of the input, and the answers show this is NP-hard too.
Solution
When there are more than one type of parentheses (say
When there are only one type of parentheses (say
For a character $c\in\{$
$$
t(c)=\begin{cases}
1 & \text{if }c\text{ is '('},\\
-1 & \text{if }c\text{ is ')'}.
\end{cases}
$$
For a string $s=s_1s_2\ldots$ where $s_i\in\{$
$$
\begin{align}
f(s_1s_2)&=f(s_1)+f(s_2),\\
g(s_1s_2)&=\max\{g(s_1)+f(s_2), g(s_2)\}.
\end{align}
$$
We say a string $s$ is potentially accepted if for any prefix $p$ of $s$, $f(p)\ge 0$. Note $s$ is potentially accepted if and only if $f(s)\ge g(s)$. Also note $s\in L$ if and only if $f(s)=0$ and $s$ is potentially accepted.
Note if $\sigma$ is a valid permutation, and $f(x_{\sigma(i)})<0$ and $f(x_{\sigma(i+1)})\ge0$ for some $i$, we can swap the two elements to get another valid permutation. This suggests that we can arrange all $x_i$'s satisfying $f(x_i)\ge 0$ (say $x_{r_1},\ldots,x_{r_k}$) on the first $k$ positions, then arrange other $x_i$'s on the rest positions.
For the first $k$ positions, we process greedily. That is, say $\sigma(1),\ldots,\sigma(j)$ is determined, if there is some $i$ such that $x_{\sigma(1)}x_{\sigma(2)}\ldots x_{\sigma(j)}x_{r_i}$ is potentially accepted, we arrange $x_{r_i}$ on position $j+1$, i.e. $\sigma(j+1)=r_i$. Otherwise, we cannot arrange these $x_{r_i}$'s on positions $1,\ldots,j$ too, so we assert there is no valid permutation.
For the rest positions, we arrange them according to $g$ from big to small. That is, $g(x_{\sigma(k+1)})\ge g(x_{\sigma(k+2)})\ge\cdots\ge g(x_{\sigma(n)})$. If this arrangement is invalid, i.e. the final string does not belong to $L$, we assert there is no valid permutation. The correctness of this process comes from the following lemma:
If for some valid $\sigma$ such that $f(x_{\sigma(k+1)}),\ldots,f(x_{\sigma(n)})<0$ and for some $i$, $g(x_{\sigma(k+i)})<g(x_{\sigma(k+i+1)})$, we can swap $x_{\sigma(k+i)}$ and $x_{\sigma(k+i+1)}$ to get another valid permutation.
Proof of the lemma. For convenience, denote $s_0=x_{\sigma(1)}\ldots x_{\sigma(k+i-1)}$ and $s=x_{\sigma(1)}\ldots x_{\sigma(n)}$. Note after swapping, $f(p)$ is not changed for prefix $p$ of $s$ such that $|p|\le |s_0|$ or $|p|\ge |s_0x_{\sigma(k+i)}x_{\sigma(k+i+1)}|$, we need only to show $s_0x_{\sigma(k+i+1)}x_{\sigma(k+i)}$ is potentially accepted. Note
$s_0x_{\sigma(k+i)}x_{\sigma(k+i+1)}$ is potentially accepted, we have
$$
\begin{align}
g(s_0x_{\sigma(k+i)}x_{\sigma(k+i+1)})\le f(s_0x_{\sigma(k+i)}x_{\sigma(k+i+1)})
\end{align}
$$
or
$$
\begin{align}
g(s_0)+f(x_{\sigma(k+i)})+f(x_{\sigma(k+i+1)})&\le f(s_0x_{\sigma(k+i)}x_{\sigma(k+i+1)}),\\
g(x_{\sigma(k+i)})+f(x_{\sigma(k+i+1)})&\le f(s_0x_{\sigma(k+i)}x_{\sigma(k+i+1)}),\\
g(x_{\sigma(k+i+1)})&\le f(s_0x_{\sigma(k+i)}x_{\sigma(k+i+1)}),
\end{align}
$$
or
$$
\begin{align}
g(s_0)+f(x_{\sigma(k+i+1)})+f(x_{\sigma(k+i)})&\le f(s_0x_{\sigma(k+i+1)}x_{\sigma(k+i)}),\\
g(x_{\sigma(k+i+1)})+f(x_{\sigma(k+i)})<g(x_{\sigma(k+i+1)}) &\le f(s_0x_{\sigma(k+i+1)}x_{\sigma(k+i)}),\\
g(x_{\sigma(k+i)})\le g(x_{\sigma(k+i+1)})&\le f(s_0x_{\sigma(k+i+1)}x_{\sigma(k+i)}),
\end{align}
$$
which means
$$
\begin{align}
g(s_0x_{\sigma(k+i+1)}x_{\sigma(k+i)})\le f(s_0x_{\sigma(k+i+1)}x_{\sigma(k+i)}).
\end{align}
$$
Hence $s_0x_{\sigma(k+i+1)}x_{\sigma(k+i)}$ is potentially accepted.
[] and ()), the problem is NP-complete. The proof is the same as the proof for the palindrome problem except that in the reduction, 0s and 1s in $w$ are changed to [s and (s respectively, 0s in $v_i$ are changed to ]s and 1s in $r_k$ are changed to )s.When there are only one type of parentheses (say
()), the problem can be solved in polynomial time. For a character $c\in\{$
($,$)$\}$, denote $$
t(c)=\begin{cases}
1 & \text{if }c\text{ is '('},\\
-1 & \text{if }c\text{ is ')'}.
\end{cases}
$$
For a string $s=s_1s_2\ldots$ where $s_i\in\{$
($,$)$\}$, denote $f(s)=\sum_{i=1}^{|s|}t(s_i)$ and $g(s)=\max_j\sum_{i=j}^{|s|}t(s_i)$. We have$$
\begin{align}
f(s_1s_2)&=f(s_1)+f(s_2),\\
g(s_1s_2)&=\max\{g(s_1)+f(s_2), g(s_2)\}.
\end{align}
$$
We say a string $s$ is potentially accepted if for any prefix $p$ of $s$, $f(p)\ge 0$. Note $s$ is potentially accepted if and only if $f(s)\ge g(s)$. Also note $s\in L$ if and only if $f(s)=0$ and $s$ is potentially accepted.
Note if $\sigma$ is a valid permutation, and $f(x_{\sigma(i)})<0$ and $f(x_{\sigma(i+1)})\ge0$ for some $i$, we can swap the two elements to get another valid permutation. This suggests that we can arrange all $x_i$'s satisfying $f(x_i)\ge 0$ (say $x_{r_1},\ldots,x_{r_k}$) on the first $k$ positions, then arrange other $x_i$'s on the rest positions.
For the first $k$ positions, we process greedily. That is, say $\sigma(1),\ldots,\sigma(j)$ is determined, if there is some $i$ such that $x_{\sigma(1)}x_{\sigma(2)}\ldots x_{\sigma(j)}x_{r_i}$ is potentially accepted, we arrange $x_{r_i}$ on position $j+1$, i.e. $\sigma(j+1)=r_i$. Otherwise, we cannot arrange these $x_{r_i}$'s on positions $1,\ldots,j$ too, so we assert there is no valid permutation.
For the rest positions, we arrange them according to $g$ from big to small. That is, $g(x_{\sigma(k+1)})\ge g(x_{\sigma(k+2)})\ge\cdots\ge g(x_{\sigma(n)})$. If this arrangement is invalid, i.e. the final string does not belong to $L$, we assert there is no valid permutation. The correctness of this process comes from the following lemma:
If for some valid $\sigma$ such that $f(x_{\sigma(k+1)}),\ldots,f(x_{\sigma(n)})<0$ and for some $i$, $g(x_{\sigma(k+i)})<g(x_{\sigma(k+i+1)})$, we can swap $x_{\sigma(k+i)}$ and $x_{\sigma(k+i+1)}$ to get another valid permutation.
Proof of the lemma. For convenience, denote $s_0=x_{\sigma(1)}\ldots x_{\sigma(k+i-1)}$ and $s=x_{\sigma(1)}\ldots x_{\sigma(n)}$. Note after swapping, $f(p)$ is not changed for prefix $p$ of $s$ such that $|p|\le |s_0|$ or $|p|\ge |s_0x_{\sigma(k+i)}x_{\sigma(k+i+1)}|$, we need only to show $s_0x_{\sigma(k+i+1)}x_{\sigma(k+i)}$ is potentially accepted. Note
$s_0x_{\sigma(k+i)}x_{\sigma(k+i+1)}$ is potentially accepted, we have
$$
\begin{align}
g(s_0x_{\sigma(k+i)}x_{\sigma(k+i+1)})\le f(s_0x_{\sigma(k+i)}x_{\sigma(k+i+1)})
\end{align}
$$
or
$$
\begin{align}
g(s_0)+f(x_{\sigma(k+i)})+f(x_{\sigma(k+i+1)})&\le f(s_0x_{\sigma(k+i)}x_{\sigma(k+i+1)}),\\
g(x_{\sigma(k+i)})+f(x_{\sigma(k+i+1)})&\le f(s_0x_{\sigma(k+i)}x_{\sigma(k+i+1)}),\\
g(x_{\sigma(k+i+1)})&\le f(s_0x_{\sigma(k+i)}x_{\sigma(k+i+1)}),
\end{align}
$$
or
$$
\begin{align}
g(s_0)+f(x_{\sigma(k+i+1)})+f(x_{\sigma(k+i)})&\le f(s_0x_{\sigma(k+i+1)}x_{\sigma(k+i)}),\\
g(x_{\sigma(k+i+1)})+f(x_{\sigma(k+i)})<g(x_{\sigma(k+i+1)}) &\le f(s_0x_{\sigma(k+i+1)}x_{\sigma(k+i)}),\\
g(x_{\sigma(k+i)})\le g(x_{\sigma(k+i+1)})&\le f(s_0x_{\sigma(k+i+1)}x_{\sigma(k+i)}),
\end{align}
$$
which means
$$
\begin{align}
g(s_0x_{\sigma(k+i+1)}x_{\sigma(k+i)})\le f(s_0x_{\sigma(k+i+1)}x_{\sigma(k+i)}).
\end{align}
$$
Hence $s_0x_{\sigma(k+i+1)}x_{\sigma(k+i)}$ is potentially accepted.
Context
StackExchange Computer Science Q#96221, answer score: 2
Revisions (0)
No revisions yet.