Alternate method of iterating than os.walk

Problem

Recently, I became tired of how slowly os.walk seems to run (really, it should be called os.crawl), and wrote something recursive that runs much faster. I was using it to collect the directories I wanted in a list, but this time I need to collect every file that isn't inside one of the folders in an "exclude" list. Does anyone have any advice?

import os

exclude_folders = ['excludefolder1', 'excludefolder2']
product_path_names = ['folder1', 'folder2', 'folder3']
# This function replaces os.walk. It is faster and stops scanning subdirectories
# when it finds a good folder name.
def findDir(path):
    for directory in os.listdir(path):
        # If it's a folder that we want to scan, go for it.
        if(os.path.isdir(os.path.join(path, directory))):
            # If folder is a product name, add it.
            if(directory.lower() in product_path_names):
                print os.path.join(path, directory)
                product_dirs.append(os.path.join(path, directory))
                with open(event_log_path, "a") as log:        
                    log.write(os.path.join(path, directory) + "\n")
                break
            # If folder is not a product name, scan it.
            else:
                #print os.path.join(path, directory)
                if(directory.lower() not in exclude_folders):
                    findDir(os.path.join(path, directory))

Solution

Style

  • According to PEP 8, function names should be lowercase, with words separated by underscores (for example, find_dir rather than findDir).
  • Python code typically doesn't have parentheses around if conditions.
  • You call os.path.join() everywhere. It would be worthwhile to assign the joined path to a variable.
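
The three style points can be seen together in a small sketch. It assumes Python 3, and the name find_product_dirs and its parameters are hypothetical, chosen only for illustration. It checks a single directory level rather than recursing.

```python
import os


def find_product_dirs(path, product_names):
    """Return the immediate subdirectories of path whose lowercased
    name appears in product_names (hypothetical helper)."""
    matches = []
    for entry in os.listdir(path):
        # Join once and reuse the result instead of re-joining everywhere.
        entry_path = os.path.join(path, entry)
        # No parentheses are needed around the condition.
        if os.path.isdir(entry_path) and entry.lower() in product_names:
            matches.append(entry_path)
    return sorted(matches)
```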



Error handling

If any operation fails (for example, if you do not have permission to list some directory), then the whole search aborts with an exception.
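
One way to contain such failures is to catch the error per directory, so that one unreadable directory is reported and skipped rather than ending the search. The following is a minimal sketch assuming Python 3. The helper name list_dir_safely is hypothetical.

```python
import os
import sys


def list_dir_safely(path):
    """Return the entries of path, or an empty list if it cannot be read."""
    try:
        return os.listdir(path)
    except OSError as e:
        # PermissionError and FileNotFoundError are both subclasses of OSError.
        # Report the problem and carry on instead of aborting the whole search.
        print(e, file=sys.stderr)
        return []
```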

Modularity and reusability

The inclusion and exclusion lists should be parameters rather than globals.
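
Applied to the question's stated goal of collecting files outside the excluded folders, a parameterised version might look like the sketch below. It deliberately uses os.walk rather than a hand-written recursion, relying on the documented behaviour that, in top-down mode, removing names from the directory list in place prevents os.walk from descending into them. The function name find_files is hypothetical, and the sorting is added only to make the order predictable. Whether this is fast enough depends on the Python version: since Python 3.5 os.walk is built on os.scandir and is generally much faster than it used to be.

```python
import os


def find_files(root_dir, exclude_folders=()):
    """Yield the paths of all files under root_dir, skipping any folder
    whose lowercased name is in exclude_folders (hypothetical helper)."""
    excluded = {name.lower() for name in exclude_folders}
    for dir_path, dir_names, file_names in os.walk(root_dir):
        # Pruning dir_names in place stops os.walk from entering those folders.
        dir_names[:] = sorted(d for d in dir_names if d.lower() not in excluded)
        for file_name in sorted(file_names):
            yield os.path.join(dir_path, file_name)
```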

By the Single Responsibility Principle, a function should do just one thing. The function you have written not only finds the directories that have the desired names; it also prints their paths to sys.stdout, appends them to a list, and writes them to a log file.

By making the function into a generator, you would give the caller the flexibility to do whatever it wants with the results. The caller can even terminate early after the first result. find_dirs() would then become a more generically useful function.

from __future__ import print_function
import os
import sys

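def find_dirs(root_dir, names, exclude_folders=()):
    """Yield directories under root_dir whose lowercased name is in names.

    Entries of names and exclude_folders must be lowercase. Matching
    directories and excluded folders are not searched further.
    """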
    try:
        for entry in os.listdir(root_dir):
            entry_path = os.path.join(root_dir, entry)
            entry_lowercase = entry.lower()
            if os.path.isdir(entry_path):
                if entry_lowercase in names:
                    yield entry_path
                elif entry_lowercase not in exclude_folders:
                    for result in find_dirs(entry_path, names, exclude_folders):
                        yield result
    except OSError as e:
        print(e, file=sys.stderr)

product_dirs = []
event_log_path = '/tmp/findlog.txt'
with open(event_log_path, 'a') as log:
    for lib in find_dirs('/', ['lib', 'library'], ['user']):
        print(lib)
        product_dirs.append(lib)
        log.write(lib + "\n")
        break      # Stop after finding the first match
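
Because the results come from a generator, the caller can also stop early without writing a loop. The sketch below assumes Python 3 and restates find_dirs so that it is self-contained. This restatement uses yield from, narrows the try block to the os.listdir call, and sorts the entries for a predictable order, all of which are changes from the code above. The helpers first_match and first_n_matches are hypothetical names.

```python
import itertools
import os
import sys


def find_dirs(root_dir, names, exclude_folders=()):
    """Python 3 restatement of the generator above (names must be lowercase)."""
    try:
        entries = os.listdir(root_dir)
    except OSError as e:
        print(e, file=sys.stderr)
        return
    for entry in sorted(entries):
        entry_path = os.path.join(root_dir, entry)
        entry_lowercase = entry.lower()
        if os.path.isdir(entry_path):
            if entry_lowercase in names:
                yield entry_path
            elif entry_lowercase not in exclude_folders:
                yield from find_dirs(entry_path, names, exclude_folders)


def first_match(root_dir, names, exclude_folders=()):
    """Return the first matching directory, or None if there is none."""
    return next(find_dirs(root_dir, names, exclude_folders), None)


def first_n_matches(root_dir, names, n, exclude_folders=()):
    """Return at most n matching directories; scanning stops once n are found."""
    return list(itertools.islice(find_dirs(root_dir, names, exclude_folders), n))
```

Both helpers consume only as much of the generator as they need, so directories after the last requested match are never listed.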

Context

StackExchange Code Review Q#59151, answer score: 4
