patternMinor
Regular expressions and 'capturing parentheses' with 'backreferences'
Viewed 0 times
capturingparenthesesbackreferenceswithregularexpressionsand
Problem
We know that regular expressions (RE) are implemented with finite automata (FA). In some language (like JavaScript) in RE there are features like 'capturing parenthesis' with 'backreferences':
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#special-capturing-parentheses
(x)
Matches 'x' and remembers the match, as the following example shows. The parentheses are called capturing parentheses.
The '(foo)' and '(bar)' in the pattern /(foo) (bar) \1 \2/ match and remember the first two words in the string "foo bar foo bar". The \1 and \2 in the pattern match the string's last two words.
I want to know if this pattern
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#special-capturing-parentheses
(x)
Matches 'x' and remembers the match, as the following example shows. The parentheses are called capturing parentheses.
The '(foo)' and '(bar)' in the pattern /(foo) (bar) \1 \2/ match and remember the first two words in the string "foo bar foo bar". The \1 and \2 in the pattern match the string's last two words.
I want to know if this pattern
/(foo) (bar) \1 \2/ is in fact a RE according the definition of RE that we have in theoretical formal language or it is something more powerful. And if it is so I would like to know if this kind of feature is implemented also with FA or in another way (in particualr how it is implemented).Solution
These extended notions of regular expressions capture more than just the regular languages. For example,
"Regular" expressions that don't match regular langauges can't be translated to finite automata, since finite automata match only the regular languages. A side-effect of this is that many libraries don't even try to compile to automata, which can lead to extremely slow matching, even when a "regular" expression is a true regular expression. Russ Cox wrote an excellent article about this, which goes into a lot of the history, too.
([ab])\1 matches the language $\{ww\mid w\in\{a,b\}^\}$, which isn't regular and isn't even context-free (Example 2.38 of Sipser, Introduction to the Theory of Computation, 3rd edition)."Regular" expressions that don't match regular langauges can't be translated to finite automata, since finite automata match only the regular languages. A side-effect of this is that many libraries don't even try to compile to automata, which can lead to extremely slow matching, even when a "regular" expression is a true regular expression. Russ Cox wrote an excellent article about this, which goes into a lot of the history, too.
Context
StackExchange Computer Science Q#69208, answer score: 9
Revisions (0)
No revisions yet.