HiveBrain v1.2.0
Get Started
← Back to all entries
patternModerate

Can POSIX BRE express all regular languages?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
canexpressalllanguagesregularposixbre

Problem

It appears that "Basic Regular Expressions" as defined by POSIX.1-2008 do not support alternation, a|b (although some grep implementations recognize the escaped version, \|).

Since the regular languages are closed under union by definition, does this mean that POSIX BRE has less expressive power than a finite automaton? Or is there some way to simulate alternation using other constructs?

Solution

Indeed the POSIX BRE language cannot express all regular expressions because it lacks alternation. It can't even recognize all finite languages, let alone all regular languages.

For example, $\{ab, ba\}$ is not recognizable as a BRE. To prove this, consider what the toplevel syntactic form could be:

  • It can't be one of the single-character forms since the language has words of length $\gt 1$.



  • It can't be $R^*$ because that would match the empty string.



  • It can't be $R^{\{m,n\}}$ except with $m=n=1$ (in which case we're back to the original problem) because that would match strings of different lengths or the empty string.



  • So it has to be concatenation: $R_1 R_2$. Now consider how $ab$ is recognized:



  • If $R_1$ recognizes $ab$ then $R_2$ must not recognize anything anything other than the empty string. So $R_1$ must recognize $\{ab,ba\}$ and we're back to the original problem.



  • If $R_1$ recognizes $a$ but not $ab$ then $R_2$ must recognize $b$. But then $R_1R_2$ recognizes all words of the form $u b$ where $R_1$ recognizes $u$, so $R_1$ must not recognize anything other than $a$. There's no way to recognize $ba$.



  • If $R_1$ recognizes neither $ab$ nor $a$ then the only way for $R$ to recognize $ab$ is if $R_1$ recognizes the empty string, in which case we're back to the original problem as above, but for $R_2$ this time.



When “we're back to the original problem”, that means that the only solution to find a BRE recognizes the language is to find a smaller BRE that has the same property. This is an infinite descent, so there's no BRE that has the desired property.

I don't think there's a “nice” characterization of BRE-recognizable languages, for example as languages recognizable by a “nice” class of automata.

Note that BRE-recognizable languages are actually not a subclass of regular languages, since backreferences add expressive power. For example $\{w w \mid w \in \{a,b\}^\}$ is recognized by the BRE \(.\)\1 but is famously not regular. BRE without backreferences are just syntactic sugar over regular expressions so the languages they can recognize are a subclass of the regular languages.

Context

StackExchange Computer Science Q#62944, answer score: 18

Revisions (0)

No revisions yet.