
Trying multiple regexes against a single string


Problem

I have a huge list of regexes (>1,000 but

  • Is there a neater way of finding just the named capture group that matched than what I've done with groupdict()?



  • Are there any more gotchas than the obvious one of two 'rules' each containing the same group name? e.g.:

rules = [("(?Pfoobar)", "Hit a foobar"),
         ("(?Pfoob.z)", "Frobination")]


(The issue of a single syntax error in a single 'rule' killing the whole thing is easy enough to work around by validating the inputs at rule creation time.)
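Validating at rule creation time could look roughly like the minimal sketch below, assuming rules are `(pattern, description)` tuples as in the example above. The helper name `validate_rules` and the sample rules are hypothetical. Each pattern is compiled on its own with `re.compile`, and any `re.error` is collected instead of letting one bad rule break the combined regex. The sketch also flags a named group that an earlier rule already uses, by reading each compiled pattern's `groupindex` mapping (group name to group number).

```python
import re

def validate_rules(rules):
    """Hypothetical helper: check (pattern, description) rules one at a time.

    Returns a list of (index, pattern, problem) tuples; an empty list means
    every rule compiled on its own and no named group is used twice.
    """
    problems = []
    seen_names = {}  # group name -> index of the first rule that used it
    for i, (pattern, _description) in enumerate(rules):
        try:
            compiled = re.compile(pattern)
        except re.error as exc:
            problems.append((i, pattern, f"syntax error: {exc}"))
            continue
        # groupindex maps each named group in the pattern to its number.
        for name in compiled.groupindex:
            if name in seen_names:
                problems.append(
                    (i, pattern, f"group name {name!r} already used by rule {seen_names[name]}"))
            else:
                seen_names[name] = i
    return problems

# Hypothetical sample rules: one valid, one broken, two sharing the name "foo".
rules = [("foobar", "Hit a foobar"),
         ("foob(.z", "Unbalanced parenthesis"),
         ("(?P<foo>x+)", "First use of foo"),
         ("(?P<foo>y+)", "Second use of foo")]

for problem in validate_rules(rules):
    print(problem)
```

On this sample list it reports rule 1 (the unbalanced parenthesis) and rule 3 (the reused name `foo`), and nothing for the two valid rules.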

Solution


  • I think it's a neat idea, because you're indeed using well-tested code, which reduces the chance of errors.



  • Looking at the re API, you do need to retrieve all possible matches using groupdict().



  • Even that one is not a gotcha since you're naming your groups yourself. Right? I can't think of anything else.
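The groupdict() lookup discussed above can be sketched as follows, assuming each rule is wrapped in its own named group inside one big alternation. The rules, the `r0`, `r1`, ... group names and both helper functions are hypothetical. For comparison, the sketch also reads `Match.lastgroup`, a standard attribute holding the name of the last matched capturing group. With one wrapper group per top-level alternative, the wrapper closes last, so `lastgroup` should name the rule that matched; it is still worth testing against rules that contain groups of their own. The last lines show that `re` refuses to compile a pattern defining the same group name twice (the hypothetical name `foo` here), so that case fails loudly at compile time rather than silently at match time.

```python
import re

# Hypothetical rules; the patterns carry no group names of their own.
rules = [("foobar", "Hit a foobar"),
         ("foob.z", "Frobination")]

# Wrap every rule in a named group whose name comes from a counter: r0, r1, ...
combined = re.compile("|".join(
    f"(?P<r{i}>{pattern})" for i, (pattern, _) in enumerate(rules)))

def describe_via_groupdict(text):
    """Scan groupdict() for the one wrapper group that participated."""
    m = combined.search(text)
    if m is None:
        return None
    for name, value in m.groupdict().items():
        if value is not None:
            return rules[int(name[1:])][1]
    return None

def describe_via_lastgroup(text):
    """Read Match.lastgroup, the name of the last matched capturing group."""
    m = combined.search(text)
    if m is None or m.lastgroup is None:
        return None
    return rules[int(m.lastgroup[1:])][1]

print(describe_via_groupdict("a foobaz here"))  # Frobination
print(describe_via_lastgroup("a foobar here"))  # Hit a foobar

# Defining the same group name twice is rejected when the pattern is compiled.
try:
    re.compile("(?P<foo>foobar)|(?P<foo>foob.z)")
except re.error as exc:
    print("rejected:", exc)
```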



Other comments:

  • You have a small bug in your last loop where text and hit need to be exchanged.



  • There's no easy way to be sure that the DFA version will be faster than the normal version. Since you're not producing a factorized DFA but a sum of DFAs, naive code could be as slow as testing each regex one by one. Of course it's possible that it's much faster, but measure it if this is what you want to achieve!



  • You don't need UUIDs; a simple counter would be enough, e.g. group names built from an incrementing index instead of from a UUID.
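Measuring, as suggested above, needs nothing beyond the standard `timeit` module. The sketch below is a minimal comparison under assumed conditions: 1,000 hypothetical literal rules `key0000` ... `key0999`, group names taken from a simple counter (`r0`, `r1`, ...) instead of UUIDs, and one made-up input string. The two approaches are not equivalent in general: the loop returns the lowest-numbered rule that matches anywhere, while the combined pattern returns the first alternative that matches at the leftmost matching position. Timings depend on the machine, the Python version and above all on the real rules, so no numbers are claimed here.

```python
import re
import timeit

# Hypothetical benchmark rules: fixed-width literals, so none is a prefix of another.
patterns = [f"key{i:04d}" for i in range(1000)]

# Approach 1: compile every rule separately and try them one by one.
individual = [re.compile(p) for p in patterns]

# Approach 2: one alternation; group names come from a counter, not UUIDs.
combined = re.compile("|".join(f"(?P<r{i}>{p})" for i, p in enumerate(patterns)))

def first_match_individual(text):
    """Index of the lowest-numbered rule that matches anywhere in text."""
    for i, rx in enumerate(individual):
        if rx.search(text):
            return i
    return None

def first_match_combined(text):
    """Index of the rule whose wrapper group matched in the combined pattern."""
    m = combined.search(text)
    return None if m is None else int(m.lastgroup[1:])

# Made-up input: filler text with the last rule's key at the end.
text = "lorem ipsum " * 50 + "key0999"
assert first_match_individual(text) == first_match_combined(text) == 999

for fn in (first_match_individual, first_match_combined):
    seconds = timeit.timeit(lambda: fn(text), number=100)
    print(f"{fn.__name__}: {seconds:.4f} s for 100 calls")
```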

Context

StackExchange Code Review Q#40607, answer score: 5
