Trying multiple regexes against a single string
Problem
I have a huge list of regexes (>1,000 but
- Is there a neater way of finding just the named capture group that matched than what I've done with groupdict()?
- Are there any more gotchas than the obvious one of two 'rules' each containing the same group name? e.g.:

    rules = [("(?P<dup>foobar)", "Hit a foobar"),  # same (placeholder) group name in both rules
             ("(?P<dup>foob.z)", "Frobination")]

  (The issue of a single syntax error in a single 'rule' killing the whole thing is easy enough to work around by validating the inputs at rule creation time.)
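Both concerns can be checked before the rules are ever combined. Here is a minimal sketch, assuming rules are (pattern, message) pairs. The helper validate_rules and the group name dup are hypothetical and not taken from the question's code. The sketch compiles each pattern on its own, so a syntax error surfaces as an re.error for that one rule. It then counts the named groups each rule defines through the compiled pattern's groupindex, which reports any name used by more than one rule up front. Python's re allows each group name to be defined only once per pattern, so the naively combined alternation is rejected.

```python
import re
from collections import Counter

def validate_rules(rules):
    """Check (pattern, message) rules before combining them into one regex.

    Returns the sorted list of group names defined by more than one rule.
    """
    seen = Counter()
    for pattern, _message in rules:
        compiled = re.compile(pattern)  # raises re.error for a bad rule
        for name in compiled.groupindex:  # iterating groupindex yields group names
            seen[name] += 1
    return sorted(name for name, count in seen.items() if count > 1)

# Hypothetical rules that reuse the group name "dup".
rules = [("(?P<dup>foobar)", "Hit a foobar"),
         ("(?P<dup>foob.z)", "Frobination")]

print(validate_rules(rules))  # ['dup']

# Combining them anyway fails, because re rejects a group name defined twice.
try:
    re.compile("|".join(pattern for pattern, _ in rules))
except re.error as exc:
    print("combined pattern rejected:", exc)
```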
Solution
- I think it's a neat idea, because you're indeed using well-tested code, which reduces the chance of errors.
- Looking at the re API, you do need to retrieve all possible matches using groupdict().
- Even the duplicate group name is not really a gotcha, since you're naming your groups yourself, right? I can't think of anything else.
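Here is a sketch of two ways to identify the rule that matched. It assumes hypothetical rules, each wrapped in a top-level named group rule0, rule1, and so on. hit_via_groupdict scans groupdict() for the wrapper group that is not None, in the spirit of the question. hit_via_lastgroup uses Match.lastgroup instead. That works here because each wrapper group encloses its whole rule and therefore closes last. Note also that one search reports a single rule. The leftmost match position wins, and at that position the earliest rule in list order wins.

```python
import re

# Hypothetical rules; each one gets its own top-level named wrapper group.
rules = [("foobar", "Hit a foobar"),
         ("foob.z", "Frobination"),
         ("(ba)+r", "Baa")]
messages = {f"rule{i}": msg for i, (_, msg) in enumerate(rules)}
combined = re.compile(
    "|".join(f"(?P<rule{i}>{pat})" for i, (pat, _) in enumerate(rules)))

def hit_via_groupdict(text):
    """Scan groupdict() for the wrapper group that took part in the match."""
    m = combined.search(text)
    if m is None:
        return None
    for name, value in m.groupdict().items():
        # Skip named groups inside the rules themselves and non-participating wrappers.
        if name in messages and value is not None:
            return messages[name]
    return None

def hit_via_lastgroup(text):
    """Use Match.lastgroup; the enclosing wrapper group is the last one to close."""
    m = combined.search(text)
    return None if m is None else messages[m.lastgroup]

for text in ["xx foobaz yy", "a foobar", "babar", "nothing"]:
    print(text, "->", hit_via_groupdict(text), hit_via_lastgroup(text))
```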
Other comments:
- You have a small bug in your last loop, where text and hit need to be swapped.
- There's no easy way to be sure that the DFA version will be faster than the normal version. Since you're not producing a factorized DFA but a sum of DFAs, a naive implementation could be as slow as testing each regex one by one. It may well be much faster, but measure it if speed is what you want to achieve!
- You don't need UUIDs; a simple counter would be enough for naming the ?P groups.
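The last two comments can be tried together. This sketch names the wrapper groups with a plain counter via enumerate, then uses timeit to compare the combined pattern against testing each regex one by one. It assumes a hypothetical, generated rule set of literal prefixes word0:, word1:, and so on. It also assumes anchored matching with match, so that both approaches pick the same rule: the first in list order that matches at the start of the string. Python 3.5 and later no longer limit a pattern to 100 groups, so 1,000 wrapper groups compile fine. Timings depend on the machine and on the rules, so any numbers describe that rule set only.

```python
import re
import timeit

def build_combined(rules):
    """Wrap rule i in a group named rule<i>; a plain counter keeps names unique."""
    pattern = "|".join(f"(?P<rule{i}>{pat})" for i, (pat, _) in enumerate(rules))
    messages = {f"rule{i}": msg for i, (_, msg) in enumerate(rules)}
    return re.compile(pattern), messages

def combined_hit(combined, messages, text):
    """Anchored match of the combined pattern; lastgroup names the wrapper that matched."""
    m = combined.match(text)
    return None if m is None else messages[m.lastgroup]

def sequential_hit(compiled_rules, text):
    """Baseline: try each precompiled rule in list order until one matches."""
    for pattern, message in compiled_rules:
        if pattern.match(text):
            return message
    return None

# Hypothetical rule set, generated only for this benchmark.
rules = [(f"word{i}:", f"message {i}") for i in range(1000)]
texts = ["word7: a", "word999: b", "no match here"]

combined, messages = build_combined(rules)
compiled_rules = [(re.compile(pat), msg) for pat, msg in rules]

t_combined = timeit.timeit(
    lambda: [combined_hit(combined, messages, t) for t in texts], number=200)
t_sequential = timeit.timeit(
    lambda: [sequential_hit(compiled_rules, t) for t in texts], number=200)
print(f"combined: {t_combined:.3f}s  one-by-one: {t_sequential:.3f}s")
```

Checking that both functions return the same message for every test string before comparing the timings makes sure the benchmark compares equivalent work.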
Context
StackExchange Code Review Q#40607, answer score: 5