Self-taught Pythonista: Any criticism welcome for this concurrent word count script!
Problem
I've been teaching myself Python - my first programming language - for about two years now.
I recently discovered the `concurrent.futures` module and wanted to do something with it. What do you think about this script?
```
import re
import shutil
import string
from collections import Counter
from concurrent.futures import ProcessPoolExecutor
from itertools import chain, islice, zip_longest
from urllib.request import urlopen

# Regex to use for splitting text into words, dropping everything but
# alphabetic characters.
REGEX = re.compile(r"[{}{}{}]+".format(
    string.whitespace, string.digits, string.punctuation))

# http://docs.python.org/3/library/itertools.html#itertools-recipes
def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

def split_and_clean(line):
    """Returns an iterator of the words in line.

    Example:
        >>> list(split_and_clean("3, Four, five'"))
        ['four', 'five']

    Args:
        line: A string.

    Returns:
        A filter of the alphabetic words in the line.

    Raises:
        TypeError: Your input was of type <>. Must be a string.
    """
    try:
        return filter(None, re.split(REGEX, line.lower()))
    except AttributeError:
        input_type_str = str(type(line))[8:-2]
        error_message = "Your input was of type {}. Must be a string.".format(
            input_type_str)
        raise TypeError(error_message)

def wc_some_lines(lines):
    """Return a Counter containing the word count of several lines.

    Excludes any digital numbers or punctuation.

    Example:
        >>> wc_some_lines(["Line 1.", "Another line."])
        Counter({'line': 2, 'another': 1})

    Args:
        lines: An iterable of strings.

    Returns:
        A collections.Counter mapping words to their word counts.

    Raises:
        TypeError
    """
    # [listing is cut off here in the source]
```
Solution
PEP8 mentions that top-level constructs like functions should be separated by two blank lines. Hanging indents should have only one level of indentation (lines 43 and 61). Be careful about trailing whitespace (lines 61 and 89).
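To make the layout points concrete, here is a minimal sketch of the spacing and hanging-indent style PEP 8 recommends, reusing the error message from the script. The function names `describe_type` and `build_error_message` are hypothetical and do not appear in the original code, and `type(value).__name__` is swapped in for the script's string slicing purely for illustration.

```python
def describe_type(value):
    """Return the bare type name of value, e.g. 'int'."""
    # Assumption: __name__ replaces the script's str(type(...))[8:-2] slice.
    return type(value).__name__


def build_error_message(value):
    """Return the script's TypeError message for a non-string input."""
    # Hanging indent: the continuation line is indented one level (4 spaces).
    return "Your input was of type {}. Must be a string.".format(
        describe_type(value))
```

Two blank lines separate the top-level functions, and the wrapped `format()` call is indented by a single level.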
I love functional style myself but it is often frowned upon in Python, and `Counter(chain(*map(split_and_clean, lines)))` or `filter(None, re.split(REGEX, line.lower()))` will be considered unreadable by some, and elegant by others.
Otherwise your code is awesome, well-written with beautiful docstrings and clever touches (`filter()` to drop empty strings, catching the `AttributeError` raised by the call to `lower()`). Thanks for sharing!
Context
StackExchange Code Review Q#27739, answer score: 2