HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Regular expressions and 'capturing parentheses' with 'backreferences'

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
capturingparenthesesbackreferenceswithregularexpressionsand

Problem

We know that regular expressions (RE) are implemented with finite automata (FA). In some language (like JavaScript) in RE there are features like 'capturing parenthesis' with 'backreferences':

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#special-capturing-parentheses

(x)
Matches 'x' and remembers the match, as the following example shows. The parentheses are called capturing parentheses.
The '(foo)' and '(bar)' in the pattern /(foo) (bar) \1 \2/ match and remember the first two words in the string "foo bar foo bar". The \1 and \2 in the pattern match the string's last two words.

I want to know if this pattern /(foo) (bar) \1 \2/ is in fact a RE according the definition of RE that we have in theoretical formal language or it is something more powerful. And if it is so I would like to know if this kind of feature is implemented also with FA or in another way (in particualr how it is implemented).

Solution

These extended notions of regular expressions capture more than just the regular languages. For example, ([ab])\1 matches the language $\{ww\mid w\in\{a,b\}^\}$, which isn't regular and isn't even context-free (Example 2.38 of Sipser, Introduction to the Theory of Computation, 3rd edition).

"Regular" expressions that don't match regular langauges can't be translated to finite automata, since finite automata match only the regular languages. A side-effect of this is that many libraries don't even try to compile to automata, which can lead to extremely slow matching, even when a "regular" expression is a true regular expression. Russ Cox wrote an excellent article about this, which goes into a lot of the history, too.

Context

StackExchange Computer Science Q#69208, answer score: 9

Revisions (0)

No revisions yet.