Check if a file path matches any of the patterns in a blacklist

Problem

I'm processing a list of files and want to check the filenames against a list of regexes, like:

IGNORE_FILES = [
    re.compile('^./Vendor.*'),
    re.compile('^./Pods.*'), 
    …
]

def in_blacklist(file):
    return len(list(filter(lambda r: r.match(file) != None, IGNORE_FILES))) > 0

def in_whitelist(file):
    return SWIFT_FILE_REGEX.match(file) != None

def files():
    valid_files = []
    for root, dirs, files in os.walk('.'):
        for file in files:
            if in_whitelist(file) and not in_blacklist(root):
                valid_files.append("%s/%s" % (root, file))
    return valid_files


I'm looking for a smoother way to write:

len(list(filter(lambda r: r.match(root) != None, IGNORE_FILES))) == 0


I feel that it's not easy to read, especially with the list(...) call.

Solution

You can build one regex out of the list of regexes by

'|'.join(IGNORE_FILES)


(That is, join the pattern strings before running re.compile on them; str.join only accepts strings, not compiled patterns.)

Then your in_blacklist function becomes as easy to write and read as in_whitelist.
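
As a minimal sketch of the joining step, assuming the patterns are kept as plain strings (the name IGNORE_PATTERNS is made up for illustration), each pattern can be wrapped in a non-capturing group so that the | added by the join cannot change what an individual pattern means:

```python
import re

# Hypothetical name: the raw pattern strings, not yet compiled.
IGNORE_PATTERNS = [
    r'^\./Vendor.*',
    r'^\./Pods.*',
]

# Wrap each pattern in (?:...) so any alternation inside one pattern stays local.
IGNORE_FILES = re.compile('|'.join('(?:%s)' % p for p in IGNORE_PATTERNS))

def in_blacklist(path):
    # A single match call replaces the filter/len construct.
    return IGNORE_FILES.match(path) is not None
```

The grouping is not strictly needed for these two patterns, but it keeps the combined regex safe if a later pattern contains its own |.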

I further assume that you don't want to ignore paths that start with any character, but only those starting with a literal dot (which is what os.walk('.') yields). An unescaped . in a regex matches any character, so you need to escape those leading dots.
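
To illustrate the difference, here is a small sketch comparing the two forms; 'a/Vendor' is a made-up path used only to show what the unescaped dot lets through:

```python
import re

unescaped = re.compile('^./Vendor.*')   # '.' matches any character
escaped = re.compile(r'^\./Vendor.*')   # '\.' matches only a literal dot

print(bool(unescaped.match('a/Vendor')))  # True
print(bool(escaped.match('a/Vendor')))    # False
print(bool(escaped.match('./Vendor')))    # True
```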

Also, in your files() function, you should move your in_blacklist check out of the inner loop. No need to check the same root directory multiple times.

So:

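import os
import re

# One combined pattern; the leading dots are escaped so they match literally.
IGNORE_FILES = re.compile('|'.join([
    r'^\./Vendor.*',
    r'^\./Pods.*',
]))

def in_blacklist(file):
    return IGNORE_FILES.match(file) is not None

def in_whitelist(file):
    # SWIFT_FILE_REGEX is assumed to be defined elsewhere, as in the question.
    return SWIFT_FILE_REGEX.match(file) is not None

def files():
    valid_files = []
    for root, dirs, files in os.walk('.'):
        # Check each directory once, not once per file.
        if in_blacklist(root):
            continue
        for file in files:
            if in_whitelist(file):
                valid_files.append("%s/%s" % (root, file))
    return valid_files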
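
A possible further step, not part of the code above: with continue, os.walk still descends into the ignored directories and the check skips them one by one. As a sketch, assuming the default top-down walk and POSIX-style separators, the dirs list can be pruned in place so those directories are never entered (walk_pruned is a hypothetical helper name):

```python
import os
import re

IGNORE_FILES = re.compile(r'^\./Vendor|^\./Pods')

def walk_pruned(top='.'):
    """Yield (root, files) pairs, skipping ignored directories entirely."""
    for root, dirs, files in os.walk(top):
        # Slice assignment edits the very list os.walk uses (top-down mode),
        # so ignored directories are never visited.
        dirs[:] = [d for d in dirs
                   if not IGNORE_FILES.match("%s/%s" % (root, d))]
        yield root, files
```

Whether this is worth it depends on how large the ignored trees are; for small projects the plain continue is just as good.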


And for more portability, you could replace "%s/%s" % (root, file) with os.path.join(root, file), though I wouldn't insist on it.
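
A short sketch of the two spellings side by side; the values of root and file are made up for illustration:

```python
import os

root, file = './Sources', 'App.swift'   # made-up example values

joined = os.path.join(root, file)       # uses the platform's separator
formatted = "%s/%s" % (root, file)      # always uses '/'
```

On POSIX systems both give the same string; they differ only where the platform separator is not /.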

Context

StackExchange Code Review Q#144733, answer score: 5
