patternMinor
Is the language of pairs of words of equal length whose hamming distance is 2 or greater context-free?
Viewed 0 times
thewhosefreeequallengthwordsgreaterdistancelanguagecontext
Problem
Is the following language context free?
$$L = \{ uxvy \mid u,v,x,y \in \{ 0,1 \}^+, |u| = |v|, u \neq v, |x| = |y|, x \neq y\} $$
As pointed out by sdcvvc, a word in this language can also be described as the concatenation of two words of the same length the hamming distance of which is 2 or greater.
I think it's not context free but I'm having a hard time proving it. I tried intersecting this language with a regular language (like $ \ 0^1^0^1^ $ for example) then use the pumping lemma and \ or homomorphisms but I always get a language that is too complicated to characterize and write down.
$$L = \{ uxvy \mid u,v,x,y \in \{ 0,1 \}^+, |u| = |v|, u \neq v, |x| = |y|, x \neq y\} $$
As pointed out by sdcvvc, a word in this language can also be described as the concatenation of two words of the same length the hamming distance of which is 2 or greater.
I think it's not context free but I'm having a hard time proving it. I tried intersecting this language with a regular language (like $ \ 0^1^0^1^ $ for example) then use the pumping lemma and \ or homomorphisms but I always get a language that is too complicated to characterize and write down.
Solution
Note [2019-07-30] The proof is wrong ... the question is more complicated than it sounds.
After a failed attempt here it is another idea.
If we intersect $L$ with the regular language $L_{reg} = 0^10^10^10^$ we get a CF language.
Perhaps we can have more luck if we use $L_{reg}' = 0^10^10^10^10^*$ (a string with exactly 4 1s).
Let $L_1 = L \cap L_{reg}'$, informally $w \in L_1$ if it can be split in two halves, such that one half contains exactly $\{0,1,3,4\}$ $1s$ or both halves contain two $1$s but their positions don't match.
Suppose that $L_1$ is CF and let $G$ be its grammar in Chomsky normal form, and let
$$w = uv = 0^a 1 0^b 1 0^c 1 0^d 1 0^e \in L_1$$
We have $|u|=|v|$ (even length) and $d(u,v) \geq 2$
If we restrict our attention to the ways in which the four 1s of $w$ can be generated we have the three cases shown at the top of the figure 1. The central part of the figure 1 shows the first case (but the others are similar).
Figure 1 (the full picture can be downloaded here)
If we pick $a=e, c=2a$ and $b,d \gg a$ we see that the zeros between the two pairs of 1s must be independently pumpable (red nodes in the figure): in particular, for large enough $b \gg a$, we get a duplicate nonterminal node on a internal subtree (node X in figure 2) or a repeated subsequence in the path towards the first or the second 1 (node Y in figure 2). Note that Figure 2 is a little bit simplified: there can be more nonterminal nodes between the two $X$s, and also between the two $Ys$ ($Y\to ... \to Z_i \to ... Y$ but with $Z_i$ that produces only 0s on the right of the first 1).
Figure 2
So we can fix an arbitrary $a = e = k, c = 2a$, then pick large enough $b$ to get an independently pumpable node on the sequence of zeros between first and second $1$. For the sequence of zeros between the third and fourth 1, we can choose $d = b! +b$.
But $0^b$ is independtly pumpable so there is a $p \leq b$ pumpable substring $y$, i.e. such that $b = xyz, |y|=p, |x|\geq 0, |z|\geq 0$ and $xy^iz = b!+b$. The string we get is:
$$w' = 0^k 1 0^{b!+b} 1 0^{2k} 1 0^{b!+b} 1 0^k$$
but $w' \notin L_1$. Thus $L_1$ is not CF and finally $L$ is not CF.
If the proof is correct (???) it can be extended to every language $L_k = \{ uv : |u|=|v|, d(u,v)\geq k\}, k\geq 2$
After a failed attempt here it is another idea.
If we intersect $L$ with the regular language $L_{reg} = 0^10^10^10^$ we get a CF language.
Perhaps we can have more luck if we use $L_{reg}' = 0^10^10^10^10^*$ (a string with exactly 4 1s).
Let $L_1 = L \cap L_{reg}'$, informally $w \in L_1$ if it can be split in two halves, such that one half contains exactly $\{0,1,3,4\}$ $1s$ or both halves contain two $1$s but their positions don't match.
Suppose that $L_1$ is CF and let $G$ be its grammar in Chomsky normal form, and let
$$w = uv = 0^a 1 0^b 1 0^c 1 0^d 1 0^e \in L_1$$
We have $|u|=|v|$ (even length) and $d(u,v) \geq 2$
If we restrict our attention to the ways in which the four 1s of $w$ can be generated we have the three cases shown at the top of the figure 1. The central part of the figure 1 shows the first case (but the others are similar).
Figure 1 (the full picture can be downloaded here)
If we pick $a=e, c=2a$ and $b,d \gg a$ we see that the zeros between the two pairs of 1s must be independently pumpable (red nodes in the figure): in particular, for large enough $b \gg a$, we get a duplicate nonterminal node on a internal subtree (node X in figure 2) or a repeated subsequence in the path towards the first or the second 1 (node Y in figure 2). Note that Figure 2 is a little bit simplified: there can be more nonterminal nodes between the two $X$s, and also between the two $Ys$ ($Y\to ... \to Z_i \to ... Y$ but with $Z_i$ that produces only 0s on the right of the first 1).
Figure 2
So we can fix an arbitrary $a = e = k, c = 2a$, then pick large enough $b$ to get an independently pumpable node on the sequence of zeros between first and second $1$. For the sequence of zeros between the third and fourth 1, we can choose $d = b! +b$.
But $0^b$ is independtly pumpable so there is a $p \leq b$ pumpable substring $y$, i.e. such that $b = xyz, |y|=p, |x|\geq 0, |z|\geq 0$ and $xy^iz = b!+b$. The string we get is:
$$w' = 0^k 1 0^{b!+b} 1 0^{2k} 1 0^{b!+b} 1 0^k$$
but $w' \notin L_1$. Thus $L_1$ is not CF and finally $L$ is not CF.
If the proof is correct (???) it can be extended to every language $L_k = \{ uv : |u|=|v|, d(u,v)\geq k\}, k\geq 2$
Context
StackExchange Computer Science Q#11585, answer score: 7
Revisions (0)
No revisions yet.