HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Smallest NFA accepting concatenations of two words of the length $k$ which are different at all positions

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
smallestpositionsthenfaallarelengthwordsconcatenationsdifferent

Problem

Let $k\in \mathbb N$

I'm looking for a small NFA build for the language of concatenation of two words of the length $k$ which are index-wise different, i.e. $$L_k=\{u\cdot v \in \Sigma^* : |u|=|v|=k\wedge \forall i, u_i\neq v_i\}$$

Notice that since $k$ is fixed, $|L_k|=(|\Sigma|\cdot(|\Sigma|-1))^k$, and is regular as a finite language.

$%$

The trivial DFA for the language contains $k$$|\Sigma| \choose k$$+1$ states and just "remembers" which letters it have seen during the first $k$ letters, however if $k=o(|\Sigma|)$, we can create a significantly smaller NFA.

$%$

A "simple" NFA for it would be of size $O^*(2^{2k})$ (more precisely, $O(k^2 \log |\Sigma| 2^{2k+O(\log^2 k)})$ ):

$%$

-
Take a $(|\Sigma|,2k)$-universal set (i.e. a set of vectors $\mathcal V\subseteq \{0,1\}^{|\Sigma|}$ such that for every vector $u\in \{0,1\}^2k$ and a $2k$ indices, there exists a vector $v\in \mathcal V$ such that $v_{|I}=u$, i.e. if we look only at these $k$ indices in $v$ we find $u$). Such families of size $O^*(2^{2k})$ are known.

We think of each vector $v$ as a function $\Sigma\to \{0,1\}$, where $v(\sigma_i)=1⟺v_i=1$.

$%$

-
Create an NFA as follows:

-
From the starting state $q_0$ create an $\epsilon$-transition to a new state for any $v\in \mathcal V$. Denote this state by $q_v$.

-
From each $q_v$ build a path of states which accepts all words whose first $k$ letters are mapped by $v$ to 0, and the later $k$ letters are mapped by $v$ to 1.

$%$

Basically, the idea is that the universal set allowes us to partition the letters which are allowed to appear in the first $k$ symbols from the rest, and we get every word in the language since there has to be a corresponding vector $v\in \mathcal V$ which partitions it right.

So the questions are:


What is the size of the smallest NFA for $L$?


Is this construction optimal?


What lower bound we can prove for such automaton size?

Solution

Update: this doesn't answer the question that the original poster was looking for, and doesn't help with the general case where $|\Sigma|>2$, so the question remains open.

Here is a simpler construction, showing that in the case where $\Sigma=\{0,1\}$, $O(k \cdot 2^{2k})$ states suffice, and in fact you can recognize the language with a DFA with that many states.

Let's start by looking at the complement language:

$$\overline{L_k} = \{u \cdot v \in \Sigma^* : |u| = |v| = k, \exists i . u_i = v_i\}.$$

Notice that this can be recognized by a NFA with $2k^2 |\Sigma|$ states. The NFA first guesses $i$ and a symbol $x$, then accepts all strings $u \cdot v$ where $u_i = v_i = x$.

Convert this to a DFA using the standard subset construction. We get a DFA with $\le 2^{|\Sigma| k + 1} k$ states. Now we can compute its complement, which is another DFA with the same number of states that recognizes $L_k$. Since it is a DFA, it is in turn automatically a NFA, too.

This beats your construction for the case where $|\Sigma|=2$, but otherwise leads a NFA that's larger than your scheme. However, I thought it might be of interest, both because it is so simple, and because it in fact provides a DFA (not just a NFA) to recognize your language $L_k$.

Context

StackExchange Computer Science Q#28457, answer score: 4

Revisions (0)

No revisions yet.