HiveBrain v1.2.0
Get Started
← Back to all entries
snippetMajor

"Dense" regular expressions generate $\Sigma^*$?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
regularexpressionsgeneratesigmadense

Problem

Here's a conjecture for regular expressions:


For regular expression $R$, let the length $|R|$ be the number of symbols in it,
ignoring parentheses and operators. E.g. $|0 \cup 1| = |(0 \cup 1)^*| = 2$


Conjecture: If $|R| > 1$ and $L(R)$ contains every string of length $|R|$ or less, then $L(R) = \Sigma^*$.

That is, if $L(R)$ is 'dense' up to $R$'s length, then $R$ actually generates everything.

Some things that may be relevant:

  • Only a small part of $R$ is needed to generate all strings. For example in binary, $R = (0 \cup 1)^* \cup S$ will work for any $S$.



  • There needs to be a Kleene star in $R$ at some point. If there isn't, it will miss some string of size less than $|R|$.



It would be nice to see a proof or counterexample. Is there some case where it's obviously wrong that I missed? Has anyone seen this (or something similar) before?

Solution

Your conjecture is disproved by Keith Ellul, Bryan Krawetz, Jeffrey Shallit and Ming-wei Wang in their paper "Regular Expressions: New Results and Open Problems". While the paper is not available on-line, a talk is.

In the paper, they define the measure $|\mathrm{alph}(R)|$, which is the number of symbols in $R$, not counting $\epsilon$ or $\emptyset$. However, $\emptyset$ can be eliminated from every expression not generating the empty language, and the expression can be "cleaned up" so that the number of $\epsilon$ it contains is at most $|\mathrm{alph}(R)|$ (Lemma on page 10 of the talk).

In page 51, for every $n \geq 3$ they construct a regular expression of size $O(n)$ over $\{0,1\}$ which generates all strings of size at most $\Omega(2^n n)$, but doesn't generate all strings. Note that "size" here is both in your sense and theirs, since we're using big-O notation. They also pose an open question to find the best dependence between the two parameters.

Context

StackExchange Computer Science Q#1511, answer score: 35

Revisions (0)

No revisions yet.