patternMinor
Smallest NFA accepting concatenations of two words of the length $k$ which are different at all positions
Viewed 0 times
smallestpositionsthenfaallarelengthwordsconcatenationsdifferent
Problem
Let $k\in \mathbb N$
I'm looking for a small NFA build for the language of concatenation of two words of the length $k$ which are index-wise different, i.e. $$L_k=\{u\cdot v \in \Sigma^* : |u|=|v|=k\wedge \forall i, u_i\neq v_i\}$$
Notice that since $k$ is fixed, $|L_k|=(|\Sigma|\cdot(|\Sigma|-1))^k$, and is regular as a finite language.
$%$
The trivial DFA for the language contains $k$$|\Sigma| \choose k$$+1$ states and just "remembers" which letters it have seen during the first $k$ letters, however if $k=o(|\Sigma|)$, we can create a significantly smaller NFA.
$%$
A "simple" NFA for it would be of size $O^*(2^{2k})$ (more precisely, $O(k^2 \log |\Sigma| 2^{2k+O(\log^2 k)})$ ):
$%$
-
Take a $(|\Sigma|,2k)$-universal set (i.e. a set of vectors $\mathcal V\subseteq \{0,1\}^{|\Sigma|}$ such that for every vector $u\in \{0,1\}^2k$ and a $2k$ indices, there exists a vector $v\in \mathcal V$ such that $v_{|I}=u$, i.e. if we look only at these $k$ indices in $v$ we find $u$). Such families of size $O^*(2^{2k})$ are known.
We think of each vector $v$ as a function $\Sigma\to \{0,1\}$, where $v(\sigma_i)=1⟺v_i=1$.
$%$
-
Create an NFA as follows:
-
From the starting state $q_0$ create an $\epsilon$-transition to a new state for any $v\in \mathcal V$. Denote this state by $q_v$.
-
From each $q_v$ build a path of states which accepts all words whose first $k$ letters are mapped by $v$ to 0, and the later $k$ letters are mapped by $v$ to 1.
$%$
Basically, the idea is that the universal set allowes us to partition the letters which are allowed to appear in the first $k$ symbols from the rest, and we get every word in the language since there has to be a corresponding vector $v\in \mathcal V$ which partitions it right.
So the questions are:
What is the size of the smallest NFA for $L$?
Is this construction optimal?
What lower bound we can prove for such automaton size?
I'm looking for a small NFA build for the language of concatenation of two words of the length $k$ which are index-wise different, i.e. $$L_k=\{u\cdot v \in \Sigma^* : |u|=|v|=k\wedge \forall i, u_i\neq v_i\}$$
Notice that since $k$ is fixed, $|L_k|=(|\Sigma|\cdot(|\Sigma|-1))^k$, and is regular as a finite language.
$%$
The trivial DFA for the language contains $k$$|\Sigma| \choose k$$+1$ states and just "remembers" which letters it have seen during the first $k$ letters, however if $k=o(|\Sigma|)$, we can create a significantly smaller NFA.
$%$
A "simple" NFA for it would be of size $O^*(2^{2k})$ (more precisely, $O(k^2 \log |\Sigma| 2^{2k+O(\log^2 k)})$ ):
$%$
-
Take a $(|\Sigma|,2k)$-universal set (i.e. a set of vectors $\mathcal V\subseteq \{0,1\}^{|\Sigma|}$ such that for every vector $u\in \{0,1\}^2k$ and a $2k$ indices, there exists a vector $v\in \mathcal V$ such that $v_{|I}=u$, i.e. if we look only at these $k$ indices in $v$ we find $u$). Such families of size $O^*(2^{2k})$ are known.
We think of each vector $v$ as a function $\Sigma\to \{0,1\}$, where $v(\sigma_i)=1⟺v_i=1$.
$%$
-
Create an NFA as follows:
-
From the starting state $q_0$ create an $\epsilon$-transition to a new state for any $v\in \mathcal V$. Denote this state by $q_v$.
-
From each $q_v$ build a path of states which accepts all words whose first $k$ letters are mapped by $v$ to 0, and the later $k$ letters are mapped by $v$ to 1.
$%$
Basically, the idea is that the universal set allowes us to partition the letters which are allowed to appear in the first $k$ symbols from the rest, and we get every word in the language since there has to be a corresponding vector $v\in \mathcal V$ which partitions it right.
So the questions are:
What is the size of the smallest NFA for $L$?
Is this construction optimal?
What lower bound we can prove for such automaton size?
Solution
Update: this doesn't answer the question that the original poster was looking for, and doesn't help with the general case where $|\Sigma|>2$, so the question remains open.
Here is a simpler construction, showing that in the case where $\Sigma=\{0,1\}$, $O(k \cdot 2^{2k})$ states suffice, and in fact you can recognize the language with a DFA with that many states.
Let's start by looking at the complement language:
$$\overline{L_k} = \{u \cdot v \in \Sigma^* : |u| = |v| = k, \exists i . u_i = v_i\}.$$
Notice that this can be recognized by a NFA with $2k^2 |\Sigma|$ states. The NFA first guesses $i$ and a symbol $x$, then accepts all strings $u \cdot v$ where $u_i = v_i = x$.
Convert this to a DFA using the standard subset construction. We get a DFA with $\le 2^{|\Sigma| k + 1} k$ states. Now we can compute its complement, which is another DFA with the same number of states that recognizes $L_k$. Since it is a DFA, it is in turn automatically a NFA, too.
This beats your construction for the case where $|\Sigma|=2$, but otherwise leads a NFA that's larger than your scheme. However, I thought it might be of interest, both because it is so simple, and because it in fact provides a DFA (not just a NFA) to recognize your language $L_k$.
Here is a simpler construction, showing that in the case where $\Sigma=\{0,1\}$, $O(k \cdot 2^{2k})$ states suffice, and in fact you can recognize the language with a DFA with that many states.
Let's start by looking at the complement language:
$$\overline{L_k} = \{u \cdot v \in \Sigma^* : |u| = |v| = k, \exists i . u_i = v_i\}.$$
Notice that this can be recognized by a NFA with $2k^2 |\Sigma|$ states. The NFA first guesses $i$ and a symbol $x$, then accepts all strings $u \cdot v$ where $u_i = v_i = x$.
Convert this to a DFA using the standard subset construction. We get a DFA with $\le 2^{|\Sigma| k + 1} k$ states. Now we can compute its complement, which is another DFA with the same number of states that recognizes $L_k$. Since it is a DFA, it is in turn automatically a NFA, too.
This beats your construction for the case where $|\Sigma|=2$, but otherwise leads a NFA that's larger than your scheme. However, I thought it might be of interest, both because it is so simple, and because it in fact provides a DFA (not just a NFA) to recognize your language $L_k$.
Context
StackExchange Computer Science Q#28457, answer score: 4
Revisions (0)
No revisions yet.