patternpythonMinor
Alternate method of iterating than os.walk
Viewed 0 times
iteratingmethodthanwalkalternate
Problem
Recently, I became tired of how slowly
os.walk seems to run (really, it should be called os.crawl), and made something recursive which moves much faster. I was using it to take directories I wanted and add them to an array, but this time I need to add all files to an array that aren't in an 'exclude array.' Does anyone have any advice?import os
exclude_folders = ['excludefolder1', 'excludefolder2']
product_path_names = ['folder1', 'folder2', 'folder3']
# This function replaces the os.walk, it is faster and stops scanning subdirectories
# when it find a good folder name.
def findDir(path):
for directory in os.listdir(path):
# If it's a folder that we want to scan, go for it.
if(os.path.isdir(os.path.join(path, directory))):
# If folder is a product name, add it.
if(directory.lower() in product_path_names):
print os.path.join(path, directory)
product_dirs.append(os.path.join(path, directory))
with open(event_log_path, "a") as log:
log.write(os.path.join(path, directory) + "\n")
break
# If folder is not a product name, scan it.
else:
#print os.path.join(path, directory)
if(directory.lower() not in exclude_folders):
findDir(os.path.join(path, directory))Solution
Style
Error handling
If any operation fails (for example, if you do not have permission to list some directory), then the whole search aborts with an exception.
Modularity and reusability
The inclusion and exclusion lists should be parameters rather than globals.
By the Single Responsibility Principle, a function should do just one thing. The function you have written not only finds the directories that have the desired names, it also prints their path to
By making the function into a generator, you would give the caller the flexibility to do whatever it wants with the results. The caller can even have the option to terminate early after the first result.
- According to PEP 8, functions should be named using
lower_case().
- Python code typically doesn't have parentheses around
ifconditions.
- You call
os.path.join()everywhere. It would be worthwhile to assign the joined path to a variable.
Error handling
If any operation fails (for example, if you do not have permission to list some directory), then the whole search aborts with an exception.
Modularity and reusability
The inclusion and exclusion lists should be parameters rather than globals.
By the Single Responsibility Principle, a function should do just one thing. The function you have written not only finds the directories that have the desired names, it also prints their path to
sys.stdout, appends the results to a list, and logs the occurrence.By making the function into a generator, you would give the caller the flexibility to do whatever it wants with the results. The caller can even have the option to terminate early after the first result.
find_dirs() would then become a more generically useful function.from __future__ import print_function
import os
import sys
def find_dirs(root_dir, names, exclude_folders=[]):
try:
for entry in os.listdir(root_dir):
entry_path = os.path.join(root_dir, entry)
entry_lowercase = entry.lower()
if os.path.isdir(entry_path):
if entry_lowercase in names:
yield entry_path
elif entry_lowercase not in exclude_folders:
for result in find_dirs(entry_path, names, exclude_folders):
yield result
except OSError as e:
print(e, file=sys.stderr)
product_dirs = []
event_log_path = '/tmp/findlog.txt'
with open(event_log_path, 'a') as log:
for lib in find_dirs('/', ['lib', 'library'], ['user']):
print(lib)
product_dirs.append(lib)
log.write(lib + "\n")
break # Stop after finding the first matchCode Snippets
from __future__ import print_function
import os
import sys
def find_dirs(root_dir, names, exclude_folders=[]):
try:
for entry in os.listdir(root_dir):
entry_path = os.path.join(root_dir, entry)
entry_lowercase = entry.lower()
if os.path.isdir(entry_path):
if entry_lowercase in names:
yield entry_path
elif entry_lowercase not in exclude_folders:
for result in find_dirs(entry_path, names, exclude_folders):
yield result
except OSError as e:
print(e, file=sys.stderr)
product_dirs = []
event_log_path = '/tmp/findlog.txt'
with open(event_log_path, 'a') as log:
for lib in find_dirs('/', ['lib', 'library'], ['user']):
print(lib)
product_dirs.append(lib)
log.write(lib + "\n")
break # Stop after finding the first matchContext
StackExchange Code Review Q#59151, answer score: 4
Revisions (0)
No revisions yet.