Filtering with multiple inclusion and exclusion patterns

Submitted by: @import:stackexchange-codereview

Problem

I have a requirement to be able to filter a list of strings by both inclusion and exclusion patterns (using fnmatch-style wildcards), of which there can be many.

For example, given the set of values:

['a1', 'a2', 'a3', 'b1', 'b2', 'b3', 'c1', 'c2', 'c3']


And these inclusion filters:

['a*', 'b*']


It should return:

['a1', 'a2', 'a3', 'b1', 'b2', 'b3']


It should also support using exclusion filters, e.g. ['c*'] to get the same result.

Lastly, using both inclusion and exclusion filters, the exclusion filters should take precedence in case of a conflict.

Here is my code and tests:

extrafilters.py

import fnmatch

def superFilter(names, inclusion_patterns=[], exclusion_patterns=[]):
    """Enhanced version of fnmatch.filter() that accepts multiple inclusion and exclusion patterns.
    If only inclusion_patterns is specified, only the names which match one or more patterns are returned.
    If only exclusion_patterns is specified, only the names which do not match any pattern are returned.
    If both are specified, the exclusion patterns take precedence.
    If neither is specified, the input is returned as-is."""
    included = multiFilter(names, inclusion_patterns) if inclusion_patterns else names
    excluded = multiFilter(names, exclusion_patterns) if exclusion_patterns else []
    return set(included) - set(excluded)

def multiFilter(names, patterns):
    """Generator function which yields the names that match one or more of the patterns."""
    for name in names:
        for pattern in patterns:
            if fnmatch.fnmatch(name, pattern):
                yield name


extrafilters_test.py

```
import unittest
from extrafilters import superFilter, multiFilter

class multiFilterTests(unittest.TestCase):
    def setUp(self):
        self.names = ['a1', 'a2', 'a3', 'b1', 'b2', 'b3', 'c1', 'c2', 'c3']

    def test_patterns(self):
        patterns = ['a*', 'b*']
        expected = ['a1', 'a2', 'a3', 'b1', 'b2', 'b3']
        # (the rest of the test file is truncated in the source)
```

Solution

You can use fnmatch.filter to simplify multiFilter:

for pattern in patterns:
    for name in fnmatch.filter(names, pattern):
        yield name


This still yields a name once for every pattern it matches, so instead you might want to do:

for name in names:
    if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
        yield name
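
To make the difference between the two versions concrete, here is a small sketch with overlapping patterns. The function names multi_filter_per_pattern and multi_filter_any are made up for this illustration and are not part of the reviewed code.

```python
import fnmatch

def multi_filter_per_pattern(names, patterns):
    # One pass per pattern: a name is yielded once for every pattern it matches.
    for pattern in patterns:
        for name in fnmatch.filter(names, pattern):
            yield name

def multi_filter_any(names, patterns):
    # One pass over names: each entry is yielded at most once, in input order.
    for name in names:
        if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
            yield name

names = ['a1', 'a2', 'b1']
patterns = ['a*', '*1']  # 'a1' matches both patterns

per_pattern = list(multi_filter_per_pattern(names, patterns))
any_match = list(multi_filter_any(names, patterns))

print(per_pattern)  # ['a1', 'a2', 'a1', 'b1']
print(any_match)    # ['a1', 'a2', 'b1']
```

Since super_filter converts the result to a set anyway, the duplicates do not change its output; they only matter if multi_filter is used on its own.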


You can also avoid creating a second set by using .difference:

return set(included).difference(excluded)


I would probably stick with the original - operator, though. I only mention .difference because a fair number of people don't know that it accepts any iterable, and sometimes skipping the extra set does matter.
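
As a rough illustration of that difference, the sketch below passes a generator straight to .difference and shows that the - operator rejects a non-set operand. The sample data is invented for the example.

```python
included = {'a1', 'a2', 'b1'}
excluded = (name for name in ['a2', 'c1'])  # any iterable works, here a generator

# .difference consumes the iterable directly, without wrapping it in set() first.
result = included.difference(excluded)
print(sorted(result))  # ['a1', 'b1']

# The - operator requires a set (or frozenset) on both sides.
try:
    included - ['a2', 'c1']
except TypeError as exc:
    print('TypeError:', exc)
```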

Your naming breaks PEP 8, which asks for snake_case function names (super_filter, multi_filter). Don't let unittest's camelCase fool you; unittest was written by heathens before the style guide was standardized.

A quick touch-up is:

import fnmatch

def super_filter(names, inclusion_patterns=[], exclusion_patterns=[]):
    """
    Enhanced version of fnmatch.filter() that accepts multiple inclusion and exclusion patterns.

    If only inclusion_patterns is specified, only the names which match one or more patterns are returned.
    If only exclusion_patterns is specified, only the names which do not match any pattern are returned.
    If both are specified, the exclusion patterns take precedence.
    If neither is specified, the input is returned as-is.
    """
    included = multi_filter(names, inclusion_patterns) if inclusion_patterns else names
    excluded = multi_filter(names, exclusion_patterns) if exclusion_patterns else []
    return set(included) - set(excluded)

def multi_filter(names, patterns):
    """Generator function which yields the names that match one or more of the patterns."""
    for name in names:
        if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
            yield name


I would also consider changing the signature and the description. Introspection shows that inclusion_patterns and exclusion_patterns default to empty, yet an empty inclusion list means "include everything", which is counter-intuitive. I would change it to

def super_filter(names, inclusion_patterns=('*',), exclusion_patterns=()):
    """
    Enhanced version of fnmatch.filter() that accepts multiple inclusion and exclusion patterns.

    Filter the input names by choosing only those that are matched by
    some pattern in inclusion_patterns _and_ not by any in exclusion_patterns.
    """
    included = multi_filter(names, inclusion_patterns)
    excluded = multi_filter(names, exclusion_patterns)
    return set(included) - set(excluded)
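
A short usage sketch of this revised version, using the sample values from the question, may help. Two things are worth noticing: the result is a set, so it is sorted here for display, and an explicitly empty inclusion list now selects nothing, which differs from the earlier "neither specified" behaviour. The variable names below are only for illustration.

```python
import fnmatch

def multi_filter(names, patterns):
    """Yield the names that match one or more of the patterns."""
    for name in names:
        if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
            yield name

def super_filter(names, inclusion_patterns=('*',), exclusion_patterns=()):
    """Keep names matched by some inclusion pattern and by no exclusion pattern."""
    included = multi_filter(names, inclusion_patterns)
    excluded = multi_filter(names, exclusion_patterns)
    return set(included) - set(excluded)

names = ['a1', 'a2', 'a3', 'b1', 'b2', 'b3', 'c1', 'c2', 'c3']

only_included = sorted(super_filter(names, inclusion_patterns=['a*', 'b*']))
only_excluded = sorted(super_filter(names, exclusion_patterns=['c*']))
both = sorted(super_filter(names, ['a*', 'b*'], ['*1']))
no_patterns = sorted(super_filter(names))
empty_inclusion = sorted(super_filter(names, inclusion_patterns=[]))

print(only_included)    # ['a1', 'a2', 'a3', 'b1', 'b2', 'b3']
print(only_excluded)    # ['a1', 'a2', 'a3', 'b1', 'b2', 'b3']
print(both)             # ['a2', 'a3', 'b2', 'b3']
print(no_patterns)      # all nine names
print(empty_inclusion)  # []
```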


Your tests cover the general case fine, but they don't check edge cases. What happens with 0-length inputs? What about complicated patterns? Do you ever check precedence, despite having mentioned it?
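
A few such edge-case tests might look like the sketch below. It assumes the revised super_filter with tuple defaults shown earlier (repeated here so the example runs on its own); the test class and method names are hypothetical.

```python
import fnmatch
import unittest

def multi_filter(names, patterns):
    for name in names:
        if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
            yield name

def super_filter(names, inclusion_patterns=('*',), exclusion_patterns=()):
    included = multi_filter(names, inclusion_patterns)
    excluded = multi_filter(names, exclusion_patterns)
    return set(included) - set(excluded)

class SuperFilterEdgeCaseTests(unittest.TestCase):
    def setUp(self):
        self.names = ['a1', 'a2', 'a3', 'b1', 'b2', 'b3', 'c1', 'c2', 'c3']

    def test_empty_names(self):
        self.assertEqual(super_filter([], ['a*'], ['*1']), set())

    def test_no_patterns_returns_everything(self):
        self.assertEqual(super_filter(self.names), set(self.names))

    def test_pattern_matching_nothing(self):
        self.assertEqual(super_filter(self.names, ['z*']), set())

    def test_exclusion_takes_precedence(self):
        # 'a1' matches both an inclusion and an exclusion pattern.
        result = super_filter(self.names, ['a*'], ['*1'])
        self.assertEqual(result, {'a2', 'a3'})

    def test_character_class_and_single_char_wildcard(self):
        # [ab] matches one of 'a' or 'b'; ? matches exactly one character.
        result = super_filter(self.names, ['[ab]?'], ['?3'])
        self.assertEqual(result, {'a1', 'a2', 'b1', 'b2'})
```

Saved in a test module, these can be run with python -m unittest.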

Context

StackExchange Code Review Q#74713, answer score: 4
