HiveBrain v1.2.0
Get Started
← Back to all entries
principleMinor

Compare regex in programming languages with regular expression from automata/formal language?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
expressionfromlanguageswithprogrammingregularautomataformallanguageregex

Problem

I'm trying to reconcile the differences/similarities between regular expression from formal language theory and automata, and the "regex" offered by programming languages.

These two differ not just in "syntax" (the union + in regular expressions means | in regex, and there's no such meta-character as . in regular expressions, etc.), but also in "semantics", e.g. the regular expression (a+b)c describes the regular language, L={a,b}{c}, which the string abcd is not a member of, but the equivalent regex (a|b)c would actually find a "match" in the string abcd (unless I do (a|b)c$). Not to mention that "regex" can describe some non-regular languages as well.

Is there a good resource/summary that compares regular expression with "regex"?

Solution

In programming, "regular expressions" are used for searching (if a match happens in the string), not for simple accepting. I.e., it is as if what is written is .your-re-here..

It is often of interest to know not only if it matched, but exactly where. And so the above isn't all the story, most RE machines also look for the longest match. For simplicity, many RE libraries don't handle alternation (e.g. ed(1) in Unix of yore didn't), as characters, character classes, star applied to the last character, and anchors (beginning/end of line) were enough to handle 90% or more of the "text editing" requirements. And that can be written in simple, short code.

Furthermore, depending on the exact machinery used, it is more or less easy to extend the "regular expressions" with non-regular constructs, like "match what matched between parenteses before", like ([a-z]*) \1 to mean "the same word twice, separated by a space". Such constructs are handled e.g. in vi(1), as tasks like the mentioned are useful to catch typos (a common error is to to write the same word twice).

Context

StackExchange Computer Science Q#53397, answer score: 5

Revisions (0)

No revisions yet.