Finding words that rhyme
Problem
Preface
I was trying to review this question on the same topic, but in the end many points I wanted to make were excellently explained by @ferada so I felt that posting my code and explaining the concepts around such changes would just be repetition.
Instead I am questioning whether my approach is over-engineered considering the simplicity of this task.
Specification
I consider two words to rhyme if they share a certain number of phonemes at the end.
The division of the words into phonemes is already given from input and must only be read into program data.
Data is fetched from the internet on the first run and stored in a local file for successive runs to save internet traffic and execution time.
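To make the rule concrete, here is a minimal sketch of the rhyme test described above. The function name `rhymes`, its `n` parameter, and the hand-written phoneme lists are illustrative assumptions, not taken from the code under review. The sketch also assumes that words with fewer than `n` phonemes never rhyme.

```python
def rhymes(phonemes_a, phonemes_b, n=3):
    """Return True if both phoneme lists end with the same n phonemes.

    Hypothetical helper, not part of the reviewed code. Words with fewer
    than n phonemes are treated as not rhyming (an assumption).
    """
    if len(phonemes_a) < n or len(phonemes_b) < n:
        return False
    return phonemes_a[-n:] == phonemes_b[-n:]


# Phoneme lists written by hand for illustration.
force = ["F", "AO1", "R", "S"]
horse = ["HH", "AO1", "R", "S"]
zurek = ["Z", "UH1", "R", "EH0", "K"]

print(rhymes(force, horse))  # True: both end in AO1 R S
print(rhymes(force, zurek))  # False: the last three phonemes differ
```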
Code
```python
import doctest
import os
import re
import requests

from collections import namedtuple

URL = "http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b"
LIBRARY = "library.txt"
PHONEMES_TO_MATCH = 3

ReadingData = namedtuple("ReadingData", ["word", "phonemes"])


def first_and_others(xs):
    """
    >>> first_and_others([1, 2, 3])
    (1, [2, 3])
    """
    return xs[0], xs[1:]


def format_data(raw_data):
    '''
    Transforms a list of lines of the format

        WORD [SPACE] PHONEME1 [SPACE] PHONEME2 [SPACE] PHONEME3 ...

    into a list of `ReadingData` classes for easier further processing.

    >>> format_data(["ZURCHER Z ER1 K ER0", "ZUREK Z UH1 R EH0 K"])
    [ReadingData(word='ZURCHER', phonemes=['Z', 'ER1', 'K', 'ER0']), ReadingData(word='ZUREK', phonemes=['Z', 'UH1', 'R', 'EH0', 'K'])]
    '''
    return [ReadingData(*first_and_others(x.strip().replace('\n','').split()))\
             for x in raw_data \
              if not x.startswith(';;;')]


def write_url_to_filename(url, filename):
    response = requests.get(URL, stream=True)
    if not response.ok:
        raise Exception("Error writing {} to {}".format(url, filename))
    with open(filename, 'wb+') as f:
        for block in response.iter_content(4096):
            f.write(block)

# ...
```
Solution
Your use of constants is a bit buggy. First off, since you call `write_url_to_filename(URL, LIBRARY)`, you should change

```python
def write_url_to_filename(url, filename):
    response = requests.get(URL, stream=True)
```

into

```python
def write_url_to_filename(url, filename):
    response = requests.get(url, stream=True)
```

i.e. use the parameter instead of the constant.
Secondly, since you only use your constants in `main` and pass them as parameters afterwards, I feel it would be best to have them as parameters with default values instead of constants:

```python
def main(library='library.txt',
         url='http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b',
         phonemes=3):
    if not os.path.isfile(library):
        write_url_to_filename(url, library)
    with open(library) as f:
        data = format_data(f.readlines())
    print(find_rhymes("FORCE", data, phonemes)[:20])
```

I would also have named `write_url_to_filename` simply `download_into`, but it's not guaranteed that it is a better name.

On a stylistic note, you make extensive use of `\` in your list comprehensions where implicit line continuation is already at play. You can drop them; it won't change the meaning. You also seem to indent them one extra space after each line; try to avoid that.
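As a small illustration of the point about backslashes: inside brackets, Python continues the expression across lines on its own, so the comprehension can be laid out without any `\`. The sample lines and the name `kept` below are made up for the example.

```python
# Made-up sample input: one comment line and two dictionary entries.
raw_data = [
    ";;; a comment line",
    "ZURCHER Z ER1 K ER0",
    "ZUREK Z UH1 R EH0 K",
]

# No backslashes needed: the open bracket keeps the expression going,
# and the continuation lines share one indentation level.
kept = [x.split()
        for x in raw_data
        if not x.startswith(';;;')]

print(kept)
```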
`find` is just applying a filter and taking the first element. Since you are using Python 2, you may be interested in `itertools.ifilter` instead of `filter`, the former being an iterator whereas the latter returns a list. Not that your approach is bad per se, but you might want to compare the two, performance-wise:

```python
from itertools import ifilter

def find(predicate, iterable):
    return next(ifilter(predicate, iterable))
```

When building your `ReadingData`, your `x.strip().replace('\n','').split()` can be replaced by `x.split()`, as `split` without arguments has extra logic to clean up a string while performing the split.

I would also change your helper function into:
```python
def clean_line(line):
    data = line.split()
    return ReadingData(data[0], data[1:])
```

so that:

- you can simplify `format_data`:

  ```python
  def format_data(raw_data):
      return [clean_line(x) for x in raw_data if not x.startswith(';;;')]
  ```

- you give the author of the question that led to this code the possibility to write, in Python 3:

  ```python
  def clean_line(line):
      word, *phonemes = line.split()
      return ReadingData(word, phonemes)
  ```

  which is even more explicit;

- you can use it directly in your `with` block without having to read the whole file into memory first:

  ```python
  def main(library='library.txt',
           url='http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b',
           phonemes=3):
      if not os.path.isfile(library):
          write_url_to_filename(url, library)
      with open(library) as f:
          data = [clean_line(line) for line in f if not line.startswith(';;;')]
      print(find_rhymes("FORCE", data, phonemes)[:20])
  ```

Alternatively, you could just pass the file `f` to `format_data` instead of the list of lines. It will work without modifications and will also let you build the data as you read the file instead of reading the file first and then building the data.
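A minimal Python 3 sketch of that last suggestion, using `io.StringIO` as a stand-in for the opened dictionary file so the example runs without downloading anything. The sample text and the name `fake_file` are invented for illustration; `ReadingData`, `clean_line` and `format_data` follow the definitions above.

```python
import io
from collections import namedtuple

ReadingData = namedtuple("ReadingData", ["word", "phonemes"])


def clean_line(line):
    # split() without arguments also discards the trailing newline
    # and collapses runs of whitespace.
    word, *phonemes = line.split()
    return ReadingData(word, phonemes)


def format_data(raw_data):
    # raw_data can be any iterable of lines: a list or an open file.
    return [clean_line(x) for x in raw_data if not x.startswith(';;;')]


# Stand-in for `open(library)`; iterating it yields one line at a time.
fake_file = io.StringIO(
    ";;; comment\n"
    "ZURCHER  Z ER1 K ER0\n"
    "ZUREK  Z UH1 R EH0 K\n"
)

data = format_data(fake_file)
print(data[0])  # ReadingData(word='ZURCHER', phonemes=['Z', 'ER1', 'K', 'ER0'])
```

Because `format_data` only iterates over its argument, the same call works with a real file object, which is what allows dropping `f.readlines()`.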
Context
StackExchange Code Review Q#140066, answer score: 5