HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Regularity of language of words containing a square

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
containingregularitywordslanguagesquare

Problem

$$L = \{w\mid w\text{ contains a substring of form }yy\text{, where }y\text{ is any non-empty string}\}.$$ Is this language regular? We do not know what $y$ looks like in advance. And why is this language regular or not regular?

Solution

Slightly more formally, for an alphabet $\Sigma$, the language of repetitive strings over $\Sigma$ is defined as follows.
$$ RR = \{w: w=uyyv \text{ for some }u, y, v \in \Sigma^*, y \text{ is not empty}\}$$ The question is, is RR a regular language?

There are three cases.

  • $\Sigma$ has one symbol.



WLOG, let $\Sigma=\{a\}$. Then $RR = aaa^*$ is a regular language.

  • $\Sigma$ has two symbols.



WLOG, let $\Sigma=\{a, b\}$. Suppose $w\in RR$ does not contain $aa$ or $bb$. That is, if an $a$ in $w$ is followed by another symbol, that symbol must be $b$; if a $b$ in $w$ is followed by another symbol, that symbol must be $a$. That means, if $w$ starts with $a$, it must contain $abab$; otherwise, it must contain $baba$. So any string in $RR$ must have one of $aa, bb, abab$ and $baba$ as its substring. Obviously, any string that contain one of those four strings is a repetitive string. So we can write $RR = (a+b)^\ast(aa+bb+abab+baba)(a+b)^\ast$ as a regular expression. Hence, $RR$ is regular.

  • $\Sigma$ has 3 or more symbols. This is the most interesting case.



WLOG, let $\Sigma\supseteq\{a,b,c\}$. We know there is an infinite sequence $u$ of $a,b,c$'s that has no consecutive repeated substring. Let $u_k$ be the first $k$ symbols in $u$. For example, if $u=abcabacabcb\cdots$, then $u_1=a, u_2=ab, \cdots, u_6=abcaba, \cdots$.
Let $i

In the analysis for the last case, we have used one of Alex Thue's results on non-repetitive string over three symbols. He showed that there are many such strings of infinite length. One of them is given by the following procedure. Let $\Sigma = \{a,b,c\}$. Let $A_0 = a$ and $\phi : a \mapsto abcab, b\mapsto acabcb, c\mapsto acbcacb$. So
$$\begin{aligned}
A_0 = a& \\
\phi(A_0)= a&bcab\\
\phi^2(A_0)= a&bcab\ acabcb\ acbcacb\ abcab\ acabcb\\
\phi^3(A_0)= a&bcab\ acabcb\ acbcacb\ abcab\ acabcb \\
a&bcab\ acbcacb\ abcab\ acabcb\ acbcacb\ acabcb\\
a&bcab\ acbcacb\ acabcb\ acbcacb\ abcab\ acbcacb\ acabcb\\
a&bcab\ acabcb\ acbcacb\ abcab\ acabcb\\
a&bcab\ acbcacb\ abcab\ acabcb\ acbcacb\ acabcb\\
\cdots\end{aligned}
$$
Note that $A_0$ is a prefix of $\phi(A_0)$. By a very easy mathematical induction, we can show that $\phi^n(A_0)$ is a prefix of $\phi(\phi^{n}(A_0))=\phi^{n+1}(A_0)$. (Thinking about this recurrence relation, I can feel the logic beauty of self reference. That is why I cannot help but writing down this example.) So we can specify a unique infinite string, $\omega$ whose prefix can be $\phi^n(A_0)$ for any $n$. $\omega$ is the wanted string. For a proof that $\omega$ is a string without consecutive repeated substrings, reader can check Axel Thue, Uber die gegenseitige Lage gleicher Teile gewisser Zeichenreihen; Norske Vid. Skrifter
I Mat.-Nat. Kl.; Christiania; 1912 page 1–67.

In fact, as HendrikJan pointed out, the language of repetitive strings over three or more symbols is not context-free. This fact is proved in Repetitive strings are not context-free by R.Ross and K.Winklmann, RAIRO - Theoretical Informatics and Applications, 16 (1982) 191-199.

Since the complement of a regular language is also a regular language, the complement of the language of repetitive strings is also regular, too. In fact, if the alphabet has one or two symbols, they are finite languages. All non-repetitive strings over $\{a\}$ are $\{\epsilon, a\}$. All non-repetitive strings over $\{a,b\}$ are $\{\epsilon, a, b, ab, ba, aba, bab\}$.

Exercise. Show that an infinite language that does not contain a square is not context-free.

Question. Is the language of almost repetitive strings defined below regular? If no regular, is it context-free?
$$ RR = \{w: w=uyxyv \text{ for some }u, y, v \in \Sigma^*, y \text{ is not empty}, x\in\Sigma\}$$

Context

StackExchange Computer Science Q#98630, answer score: 6

Revisions (0)

No revisions yet.