patternpythonMinor
Tree Utility in Python
Viewed 0 times
utilitypythontree
Problem
I wrote this piece of code to mimic the *nix tree utility, however, I am not happy with the
pad_info=[] part. It's used to make padding according to whether the parents were last child of their parents or not. Is there any alternate (and a more elegant) way to do this?def tree(path, depth=1, max_depth=100, print_hidden=False, pad_info=[]):
'''Print contents of directories in a tree-like format
By default, it prints upto a depth of 100 and doesn't print hidden files,
ie, files whose name begin with a '.'
returns number of files, number of directories it encountered
'''
fileCount, dirCount = 0, 0
files = sorted(os.listdir(path), key=lambda s: s.lower())
if not print_hidden:
files = [os.path.join(path, x) for x in files if not x.startswith('.')]
for i, file in enumerate(files):
padding = ['| ' if x == 'nl' else ' ' for x in pad_info]
padding = ''.join(padding)
filename = os.path.basename(file)
is_last = i == len(files) - 1
prefix = '`-- ' if is_last else '|-- '
print '%s%s%s' % (padding, prefix, filename)
if os.path.isdir(file):
dirCount += 1
new_pad_info = pad_info + (['l'] if is_last else ['nl'])
fc, dc = tree(os.path.join(path, file), depth=depth+1,
max_depth=max_depth, print_hidden=print_hidden,
pad_info=new_pad_info)
fileCount += fc
dirCount += dc
else:
fileCount += 1
return fileCount, dirCountSolution
Some minor style and code comments:
-
Use internal function to hide inner implementation details - It could be argued that this is a good candidate for using a function within the function. This internal function could be the one actually handling the recursion, and only having the parameters
In other words, you could hide some of the inner workings of your function using a inner function, and let the outer function be a simple call to the inner function doing the work for you. Notice also that the inner function has full access to the outer functions variables, so need to pass them around (or treat as globals).
-
BUG: Your code doesnt' respect the
My refactored code
Here is the code when applying most of this comment to it:
``
def tree(path, do_output=True, print_hidden=False, max_depth=100):
"""Print file and directory tree starting at path.
By default, it prints upto a depth of 100 and doesn't print hidden files,
ie. files whose name begin with a '.'. It can be modified to only return
the count of files and directories, and not print anything.
Returns the tuple of number of files and number of directories
"""
def _tree(path, depth):
file_count, directory_count = 0, 0
files = sorted((os.path.join(path, filename)
for filename in os.listdir(path)
if print_hidden or not filename.startswith('.')),
key=lambda s: s.lower())
files_count = len(files)
for i, filepath in enumerate(files, start = 1):
# Print current file, based on previously gathered info
if do_output:
print('{}{}{}'.format(
''.join(FOLDER_PATTERN[folder] for folder in parent_folders),
FILE_PATTERN[i == files_count],
os.path.basename(filepath)))
# Recurse if we find a new subdirectory
if os.path.isdir(filepath) and depth < max_depth:
# Append whether current directory is last in current list or not
parent_folders.append(i == files_count)
# Print subdirectory and get numbers
subdir_file_count, subdir_directory_count = \
_tree(os.path.join(filepath), depth+1)
# Back in current directory, remove the newly added directory
parent_folders.pop()
# Update counters
file_count += subdir_file_count
directory_count += subdir_directory_count + 1
elif os.path.isdir(filepath):
directory_count += 1
else:
file_count += 1
return file_count, directory_c
- Be consistent in using
snake_casefor variables names – Most names are good, but then afileCountanddirCountsneaks in. Also try avoid using abbreviations likefcanddc
- Avoid building lists twice – You first build the list of
filesfromos.listdir(), and then rebuild it usingfilesin combination withprint_hidden. This can be avoided using a list comprehension with an if statement like[a for a in a_list if a == something]
- Avoid one-time intermediate variables – If you use a variable just once, it can be argued that it is better to just use the expression directly. In your code this applies to
padding,filenameandprefix
- Avoid hiding predefined variables and functions – Don't use reservered words like
filein your code, which hides the originalfile.
- When supplying all variables, you don't need to prefix them by name – In your recursive call to
treeyou prefix them, which is not needed as the order and presences is already stated.
- Instead of storing text, store a boolean in your
pad_info– To avoid string comparisons and storage, it makes more sense to store a boolean value whether the director is the last in the current directory or not. This can further simplify your code by using a preset list to choose which text to be displayed directly asTrue = 1andFalse = 0when used in a list context.
- Maybe a personal preferences, but use
"""for docstrings – If you are consistently using"""for docstrings, this allows for intermediate'in the text, as well as enabling you to comment out larger portion of code securely using the other option of'''.
- Add option to remove output – As your code returns the file and directory counts, it could be viable to foresee a situation where you are not interested in actually viewing the output. This could be added as another option to your function.
-
Use internal function to hide inner implementation details - It could be argued that this is a good candidate for using a function within the function. This internal function could be the one actually handling the recursion, and only having the parameters
path and depth.In other words, you could hide some of the inner workings of your function using a inner function, and let the outer function be a simple call to the inner function doing the work for you. Notice also that the inner function has full access to the outer functions variables, so need to pass them around (or treat as globals).
-
BUG: Your code doesnt' respect the
max_depth variable – You don't have anything bailing out if the depth is too high...- Feature: No marking of empty directories – If reducing
max_depthor having empty directories there is no marking to indicate that the current entry is actually a directory.
My refactored code
Here is the code when applying most of this comment to it:
``
import os
FOLDER_PATTERN = ['| ', ' ']
FILE_PATTERN = ['|-- ', '-- ']def tree(path, do_output=True, print_hidden=False, max_depth=100):
"""Print file and directory tree starting at path.
By default, it prints upto a depth of 100 and doesn't print hidden files,
ie. files whose name begin with a '.'. It can be modified to only return
the count of files and directories, and not print anything.
Returns the tuple of number of files and number of directories
"""
def _tree(path, depth):
file_count, directory_count = 0, 0
files = sorted((os.path.join(path, filename)
for filename in os.listdir(path)
if print_hidden or not filename.startswith('.')),
key=lambda s: s.lower())
files_count = len(files)
for i, filepath in enumerate(files, start = 1):
# Print current file, based on previously gathered info
if do_output:
print('{}{}{}'.format(
''.join(FOLDER_PATTERN[folder] for folder in parent_folders),
FILE_PATTERN[i == files_count],
os.path.basename(filepath)))
# Recurse if we find a new subdirectory
if os.path.isdir(filepath) and depth < max_depth:
# Append whether current directory is last in current list or not
parent_folders.append(i == files_count)
# Print subdirectory and get numbers
subdir_file_count, subdir_directory_count = \
_tree(os.path.join(filepath), depth+1)
# Back in current directory, remove the newly added directory
parent_folders.pop()
# Update counters
file_count += subdir_file_count
directory_count += subdir_directory_count + 1
elif os.path.isdir(filepath):
directory_count += 1
else:
file_count += 1
return file_count, directory_c
Code Snippets
import os
FOLDER_PATTERN = ['| ', ' ']
FILE_PATTERN = ['|-- ', '`-- ']
def tree(path, do_output=True, print_hidden=False, max_depth=100):
"""Print file and directory tree starting at path.
By default, it prints upto a depth of 100 and doesn't print hidden files,
ie. files whose name begin with a '.'. It can be modified to only return
the count of files and directories, and not print anything.
Returns the tuple of number of files and number of directories
"""
def _tree(path, depth):
file_count, directory_count = 0, 0
files = sorted((os.path.join(path, filename)
for filename in os.listdir(path)
if print_hidden or not filename.startswith('.')),
key=lambda s: s.lower())
files_count = len(files)
for i, filepath in enumerate(files, start = 1):
# Print current file, based on previously gathered info
if do_output:
print('{}{}{}'.format(
''.join(FOLDER_PATTERN[folder] for folder in parent_folders),
FILE_PATTERN[i == files_count],
os.path.basename(filepath)))
# Recurse if we find a new subdirectory
if os.path.isdir(filepath) and depth < max_depth:
# Append whether current directory is last in current list or not
parent_folders.append(i == files_count)
# Print subdirectory and get numbers
subdir_file_count, subdir_directory_count = \
_tree(os.path.join(filepath), depth+1)
# Back in current directory, remove the newly added directory
parent_folders.pop()
# Update counters
file_count += subdir_file_count
directory_count += subdir_directory_count + 1
elif os.path.isdir(filepath):
directory_count += 1
else:
file_count += 1
return file_count, directory_count
parent_folders = []
return _tree(path, 1)Context
StackExchange Code Review Q#62123, answer score: 4
Revisions (0)
No revisions yet.