Filtering with multiple inclusion and exclusion patterns
Problem
I have a requirement to be able to filter a list of strings by both inclusion and exclusion patterns (using fnmatch-style wildcards), of which there can be many.

For example, given the set of values:

```python
['a1', 'a2', 'a3', 'b1', 'b2', 'b3', 'c1', 'c2', 'c3']
```

And these inclusion filters:

```python
['a*', 'b*']
```

It should return:

```python
['a1', 'a2', 'a3', 'b1', 'b2', 'b3']
```

It should also support using exclusion filters, e.g. `['c*']` to get the same result.

Lastly, using both inclusion and exclusion filters, the exclusion filters should take precedence in case of a conflict.

Here is my code and tests:

- extrafilters.py

```python
import fnmatch


def superFilter(names, inclusion_patterns=[], exclusion_patterns=[]):
    """Enhanced version of fnmatch.filter() that accepts multiple inclusion and exclusion patterns.
    If only inclusion_patterns is specified, only the names which match one or more patterns are returned.
    If only exclusion_patterns is specified, only the names which do not match any pattern are returned.
    If both are specified, the exclusion patterns take precedence.
    If neither is specified, the input is returned as-is."""
    included = multiFilter(names, inclusion_patterns) if inclusion_patterns else names
    excluded = multiFilter(names, exclusion_patterns) if exclusion_patterns else []
    return set(included) - set(excluded)


def multiFilter(names, patterns):
    """Generator function which yields the names that match one or more of the patterns."""
    for name in names:
        for pattern in patterns:
            if fnmatch.fnmatch(name, pattern):
                yield name
```

- extrafilters_test.py (the listing is cut off in the source)

```python
import unittest
from extrafilters import superFilter, multiFilter


class multiFilterTests(unittest.TestCase):
    def setUp(self):
        self.names = ['a1', 'a2', 'a3', 'b1', 'b2', 'b3', 'c1', 'c2', 'c3']

    def test_patterns(self):
        patterns = ['a', 'b']
        expected = ['a1', 'a2', 'a3', 'b1', 'b2', 'b3
```
Solution
You can use `fnmatch.filter` to simplify `multiFilter`:

```python
for pattern in patterns:
    for name in fnmatch.filter(names, pattern):
        yield name
```

This still allows an element to be yielded several times, so instead you might want to do:

```python
for name in names:
    if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
        yield name
```

You can also avoid creating a second set by using `.difference`:

```python
return set(included).difference(excluded)
```

I would probably stick with the original, though. I only mention it because a fair number of people don't know about it and sometimes it does matter.
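To make the duplicate-yield point concrete, here is a small sketch that runs both loop shapes side by side. The function names `by_pattern` and `by_name` and the sample data are made up for illustration; they are not part of the original code.

```python
import fnmatch


def by_pattern(names, patterns):
    """Pattern-outer loop: a name is yielded once per pattern it matches."""
    for pattern in patterns:
        for name in fnmatch.filter(names, pattern):
            yield name


def by_name(names, patterns):
    """Name-outer loop with any(): each input name is yielded at most once."""
    for name in names:
        if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
            yield name


names = ['a1', 'a2', 'b1']
patterns = ['a*', '*1']  # 'a1' matches both patterns

duplicated = list(by_pattern(names, patterns))  # ['a1', 'a2', 'a1', 'b1']
unique = list(by_name(names, patterns))         # ['a1', 'a2', 'b1']

# set.difference accepts any iterable, so a generator can be passed directly
# without first building a second set.
remaining = set(unique).difference(by_name(names, ['b*']))  # {'a1', 'a2'}
```

Because the final result is a set, the duplicates do not change what `superFilter` returns; the `any()` form mostly matters if `multiFilter` is consumed on its own, and it also preserves the input order.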
Your naming breaks PEP 8. Don't let `unittest` fool you; `unittest` was written by heathens before the style guide was standardized. A quick touch-up is

```python
import fnmatch


def super_filter(names, inclusion_patterns=[], exclusion_patterns=[]):
    """
    Enhanced version of fnmatch.filter() that accepts multiple inclusion and exclusion patterns.
    If only inclusion_patterns is specified, only the names which match one or more patterns are returned.
    If only exclusion_patterns is specified, only the names which do not match any pattern are returned.
    If both are specified, the exclusion patterns take precedence.
    If neither is specified, the input is returned as-is.
    """
    included = multi_filter(names, inclusion_patterns) if inclusion_patterns else names
    excluded = multi_filter(names, exclusion_patterns) if exclusion_patterns else []
    return set(included) - set(excluded)


def multi_filter(names, patterns):
    """Generator function which yields the names that match one or more of the patterns."""
    for name in names:
        if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
            yield name
```

I would also consider changing the description; it's obvious from introspection that `inclusion_patterns` and `exclusion_patterns` default to empty, which is counter-intuitive. I would change it to

```python
def super_filter(names, inclusion_patterns=('*',), exclusion_patterns=()):
    """
    Enhanced version of fnmatch.filter() that accepts multiple inclusion and exclusion patterns.
    Filter the input names by choosing only those that are matched by
    some pattern in inclusion_patterns _and_ not by any in exclusion_patterns.
    """
    included = multi_filter(names, inclusion_patterns)
    excluded = multi_filter(names, exclusion_patterns)
    return set(included) - set(excluded)
```

Your tests cover the general case fine but they don't check edge cases: what happens with 0-length inputs? What about complicated patterns? Do you ever check precedence, despite having mentioned it?
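As a rough sketch of what such edge-case tests could look like, the block below repeats the tuple-default version of `super_filter` so that it runs on its own. The test class name, the test names, and the chosen patterns are illustrative assumptions, not taken from the original test file.

```python
import fnmatch
import unittest


def multi_filter(names, patterns):
    """Yield each name that matches at least one pattern."""
    for name in names:
        if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
            yield name


def super_filter(names, inclusion_patterns=('*',), exclusion_patterns=()):
    """Keep names matched by some inclusion pattern and by no exclusion pattern."""
    included = multi_filter(names, inclusion_patterns)
    excluded = multi_filter(names, exclusion_patterns)
    return set(included) - set(excluded)


class SuperFilterEdgeCaseTests(unittest.TestCase):  # hypothetical test class
    def setUp(self):
        self.names = ['a1', 'a2', 'a3', 'b1', 'b2', 'b3', 'c1', 'c2', 'c3']

    def test_empty_names(self):
        # 0-length input: nothing to include, nothing to exclude.
        self.assertEqual(super_filter([], ['a*'], ['*1']), set())

    def test_defaults_return_everything(self):
        self.assertEqual(super_filter(self.names), set(self.names))

    def test_explicit_empty_inclusion_matches_nothing(self):
        # Unlike the list-default version, an explicitly empty inclusion
        # sequence here means "include nothing" rather than "include all".
        self.assertEqual(super_filter(self.names, inclusion_patterns=()), set())

    def test_exclusion_takes_precedence(self):
        # 'b*' is both included and excluded; exclusion should win.
        result = super_filter(self.names, ['a*', 'b*'], ['*1', 'b*'])
        self.assertEqual(result, {'a2', 'a3'})

    def test_character_classes_and_single_char_wildcard(self):
        # '[ab]?' keeps two-character names starting with a or b;
        # '?[!3]' drops names whose second character is not '3'.
        result = super_filter(self.names, ['[ab]?'], ['?[!3]'])
        self.assertEqual(result, {'a3', 'b3'})
```

Saved in a test module, these can be run with `python -m unittest`. The third test is worth keeping around because it pins down the one behavioural difference between the two versions of `super_filter`.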
Code Snippets

```python
for pattern in patterns:
    for name in fnmatch.filter(names, pattern):
        yield name
```

```python
for name in names:
    if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
        yield name
```

```python
return set(included).difference(excluded)
```

```python
import fnmatch


def super_filter(names, inclusion_patterns=[], exclusion_patterns=[]):
    """
    Enhanced version of fnmatch.filter() that accepts multiple inclusion and exclusion patterns.
    If only inclusion_patterns is specified, only the names which match one or more patterns are returned.
    If only exclusion_patterns is specified, only the names which do not match any pattern are returned.
    If both are specified, the exclusion patterns take precedence.
    If neither is specified, the input is returned as-is.
    """
    included = multi_filter(names, inclusion_patterns) if inclusion_patterns else names
    excluded = multi_filter(names, exclusion_patterns) if exclusion_patterns else []
    return set(included) - set(excluded)


def multi_filter(names, patterns):
    """Generator function which yields the names that match one or more of the patterns."""
    for name in names:
        if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
            yield name
```

```python
def super_filter(names, inclusion_patterns=('*',), exclusion_patterns=()):
    """
    Enhanced version of fnmatch.filter() that accepts multiple inclusion and exclusion patterns.
    Filter the input names by choosing only those that are matched by
    some pattern in inclusion_patterns _and_ not by any in exclusion_patterns.
    """
    included = multi_filter(names, inclusion_patterns)
    excluded = multi_filter(names, exclusion_patterns)
    return set(included) - set(excluded)
```

Context
StackExchange Code Review Q#74713, answer score: 4