patternpythonMinor
Using generators to print the tree-like structure of a project
Viewed 0 times
thelikeprojectprintusingstructuregeneratorstree
Problem
The goal of this code is to print the entity tree of a VHDL project. There are a readme and very minimal tests on the github repo.
I am trying to refactor the code to use generators in order to familiarize myself with
`import re
from sys import argv
from os import walk
from os.path import join as pjoin
EXCLUDES = ["implementation", "testbench"]
BASIC_ID_REGEX = "[a-z][a-z0-9](?:_[a-z0-9]+)"
def _vhdltree(level, filepath, pattern, vhd_files):
for entity, component in find_entities(filepath, pattern):
"""Codereview: I am not specifically interested in feedbacks for the
following snippet (except about the recursive design), but if you
have a particularly elegant
solution which keeps the "UI" part very minimal, suggestions
are welcome"""
path = vhd_files.get(component.lower(), "Not Found")
print(" "*level + entity + " : " + path)
if path != "Not Found":#Probably ugly, but lazy
_vhdltree(level+1, path, pattern, vhd_files)
def find_entities(filepath, pattern):
with open(filepath) as f:
for l in f:
m = pattern.match(l)
if m:
yield m.group('entity'), m.group('component').split(".")[-1]
def find_vhd(proot):
for (dirpath, dirnames, filenames) in walk(proot):
if not isexcluded(dirpath.lower()):
for fn in filenames:
if fn[-4:].lower() == ".vhd":
yield fn[:-4].lower(), pjoin(dirpath, fn)
def isexcluded(path):
for excluder in EXCLUDES:
if excluder in path:
return True
return False
def vhdltree(filepath, proot):
instantiation_regex = ("\s(?P{0})\s:\sentity\s(?P{0}(?:\.{0})*)" # NOQA
.format(BASIC_ID_REGEX))
p = re.compile(instantiation_regex, re.IGNORECASE)
vhd_files = dict(find_vhd(proot))
_vhdl
I am trying to refactor the code to use generators in order to familiarize myself with
yield, and to think in terms of iterators, not raw data structures. Here is my current attempt at doing so:`import re
from sys import argv
from os import walk
from os.path import join as pjoin
EXCLUDES = ["implementation", "testbench"]
BASIC_ID_REGEX = "[a-z][a-z0-9](?:_[a-z0-9]+)"
def _vhdltree(level, filepath, pattern, vhd_files):
for entity, component in find_entities(filepath, pattern):
"""Codereview: I am not specifically interested in feedbacks for the
following snippet (except about the recursive design), but if you
have a particularly elegant
solution which keeps the "UI" part very minimal, suggestions
are welcome"""
path = vhd_files.get(component.lower(), "Not Found")
print(" "*level + entity + " : " + path)
if path != "Not Found":#Probably ugly, but lazy
_vhdltree(level+1, path, pattern, vhd_files)
def find_entities(filepath, pattern):
with open(filepath) as f:
for l in f:
m = pattern.match(l)
if m:
yield m.group('entity'), m.group('component').split(".")[-1]
def find_vhd(proot):
for (dirpath, dirnames, filenames) in walk(proot):
if not isexcluded(dirpath.lower()):
for fn in filenames:
if fn[-4:].lower() == ".vhd":
yield fn[:-4].lower(), pjoin(dirpath, fn)
def isexcluded(path):
for excluder in EXCLUDES:
if excluder in path:
return True
return False
def vhdltree(filepath, proot):
instantiation_regex = ("\s(?P{0})\s:\sentity\s(?P{0}(?:\.{0})*)" # NOQA
.format(BASIC_ID_REGEX))
p = re.compile(instantiation_regex, re.IGNORECASE)
vhd_files = dict(find_vhd(proot))
_vhdl
Solution
Generators
Since you were asking about generators specifically, let’s start with some suggestions on how you’re using them:
-
Your
I often write my generators to take an iterable of lines (which could be an open file object, or something else) – this makes the code easier to unit test, and more generic. I can read the lines from an arbitrary iterator, not just a file. It’s a small change:
-
Your
Something like:
(I’ll talk about the KeyError in a minute.)
-
I really like your
One tweak I would make is to throw away the
General Python style
Generators aside, here are some comments on your general Python style:
-
I have a bit of a hobby horse about use of regular expressions. I think you’re using them for a sensible reason, but you could be better about commenting, and explaining what they’re trying to match.
Breaking it over multiple lines, and explaining exactly what you’re trying to match against, will make it much easier to read. Additionally, when you come back to this code in six months, you’ll thank your past self for those comments.
I would also suggest pre-compiling the
-
Your
This is using a lazy generator expression as the argument, and the nice thing about
I see that you’re already using an equivalent expression with
-
I’m not a super fan of
Characters are cheap – use the full, stdlib name. It will make your code much easier to read for other Python developers, who don’t need to worry about what
-
Rather than using a sentinel value from
And I think using a dictionary here is fine. It’s simple, clean, and it works.
Since you were asking about generators specifically, let’s start with some suggestions on how you’re using them:
-
Your
find_entities() function opens a file, then iterates over the lines in that file. That’s a good use of Python’s ability to iterate directly over the lines in a file, but it’s a bit fiddly to unit test because you have to create files.I often write my generators to take an iterable of lines (which could be an open file object, or something else) – this makes the code easier to unit test, and more generic. I can read the lines from an arbitrary iterator, not just a file. It’s a small change:
def find_entities(lines, pattern):
for line in lines:
m = pattern.match(line)
if m:
yield m.group('entity'), m.group('component').split('.')[-1]-
Your
_vhdltree() function is really doing two things: it’s working out the level, entity and path for an element in the iterator, and then printing it. Problem with printing is that it becomes hard to reuse this computation elsewhere. Better if you split the printing out from this function, and defer that to the caller.Something like:
def parse_vhdltree(level, filepath, pattern, vhd_files):
for entity, component in find_entities(open(file path), vhd_files):
try:
path = vhd_files[component.lower()]
except KeyError:
continue
yield level, entity, path
for entry in parse_vhdltree(level+1, path, pattern, vhd_files):
yield entry
...
for level, entity, path in parse_vhdltree(0, filepath, p, vhd_files):
print(' ' * level + entity + ' : ' + path)(I’ll talk about the KeyError in a minute.)
-
I really like your
find_vhd() generator, and it’s a good example of how generators can clean up code in Python. All the messy logic of deciding which files to count is absorbed in this function.One tweak I would make is to throw away the
dirnames data – you’re not using it. Using an underscore in its place is a common convention to indicate you’re not interested in a particular value:def find_vhd(proot):
for (dirpath, _, filenames) in walk(proot):
if not isexcluded(dirpath.lower()):
for fn in filenames:
if fn[-4:].lower() == ".vhd":
yield fn[:-4].lower(), pjoin(dirpath, fn)General Python style
Generators aside, here are some comments on your general Python style:
-
I have a bit of a hobby horse about use of regular expressions. I think you’re using them for a sensible reason, but you could be better about commenting, and explaining what they’re trying to match.
Breaking it over multiple lines, and explaining exactly what you’re trying to match against, will make it much easier to read. Additionally, when you come back to this code in six months, you’ll thank your past self for those comments.
I would also suggest pre-compiling the
instantiation_regex at the top of the file, alongside BASIC_ID_REGEX.-
Your
isexcluded() function could be replaced with a one-liner: return any(excluder in path for excluder in EXCLUDES)This is using a lazy generator expression as the argument, and the nice thing about
any() is that it exits as soon as it gets a positive element.I see that you’re already using an equivalent expression with
all() in the GitHub version, which is similarly lazy. It exits as soon as it finds a negative element. -
I’m not a super fan of
from foo import bar style imports, especially from the standard library. Even worse if from foo import bar as baz, as you’ve done with os.path.join.Characters are cheap – use the full, stdlib name. It will make your code much easier to read for other Python developers, who don’t need to worry about what
pjoin() does. They can just recognise os.path.join. -
Rather than using a sentinel value from
dict.get() to denote a missing key in the dictionary, do a lookup with square brackets and catch the KeyError. This is more Pythonic, and avoids problems if the sentinel value turns up in the dictionary.And I think using a dictionary here is fine. It’s simple, clean, and it works.
Code Snippets
def find_entities(lines, pattern):
for line in lines:
m = pattern.match(line)
if m:
yield m.group('entity'), m.group('component').split('.')[-1]def parse_vhdltree(level, filepath, pattern, vhd_files):
for entity, component in find_entities(open(file path), vhd_files):
try:
path = vhd_files[component.lower()]
except KeyError:
continue
yield level, entity, path
for entry in parse_vhdltree(level+1, path, pattern, vhd_files):
yield entry
...
for level, entity, path in parse_vhdltree(0, filepath, p, vhd_files):
print(' ' * level + entity + ' : ' + path)def find_vhd(proot):
for (dirpath, _, filenames) in walk(proot):
if not isexcluded(dirpath.lower()):
for fn in filenames:
if fn[-4:].lower() == ".vhd":
yield fn[:-4].lower(), pjoin(dirpath, fn)return any(excluder in path for excluder in EXCLUDES)Context
StackExchange Code Review Q#144105, answer score: 2
Revisions (0)
No revisions yet.