Finding words that rhyme

Submitted by: @import:stackexchange-codereview · Mar 10, 2026

Problem

Preface

I was trying to review another question on the same topic, but in the end @ferada had already explained most of the points I wanted to make, and explained them excellently, so posting my code along with the concepts behind those changes would just have been repetition.

Instead, I am asking whether my approach is over-engineered, given how simple this task is.

Specification

I consider two words to rhyme if their last few phonemes are identical; the number of phonemes to compare is a parameter (PHONEMES_TO_MATCH in the code below).
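A minimal sketch of this rule, assuming each word is represented by its list of phonemes, as in the CMU dictionary. The function name `rhymes` and the choice that words with fewer than n phonemes never rhyme are assumptions for illustration, not taken from the code below:

```python
def rhymes(phonemes_a, phonemes_b, n=3):
    """Return True if both phoneme lists end with the same n phonemes.

    Hypothetical helper. Assumption: a word with fewer than n phonemes
    never rhymes, since it cannot supply n trailing phonemes.
    """
    if n <= 0:
        # a[-0:] would be the whole list, so reject non-positive n explicitly
        raise ValueError("n must be positive")
    if len(phonemes_a) < n or len(phonemes_b) < n:
        return False
    # Compare only the trailing n phonemes of each word
    return phonemes_a[-n:] == phonemes_b[-n:]


print(rhymes(['F', 'AO1', 'R', 'S'], ['HH', 'AO1', 'R', 'S']))  # True
print(rhymes(['F', 'AO1', 'R', 'S'], ['F', 'AO1', 'R', 'T']))   # False
```

Stress markers such as the `1` in `AO1` are part of each phoneme string here, so two vowels only match if their stress also matches.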

The input already splits each word into phonemes, so the program only has to read that division into its own data structures.

The data is downloaded on the first run and saved to a local file, so that subsequent runs can skip the download, saving internet traffic and execution time.
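The caching step can be sketched in isolation as below. This is a hypothetical helper, not the code from this question: `load_cached` and its `fetch` argument are illustrative names, and text mode with UTF-8 is an assumption (the code below writes raw bytes instead).

```python
import os


def load_cached(path, fetch):
    """Return the text stored at `path`, creating it with `fetch()` if missing.

    `fetch` is any zero-argument callable returning the text to cache,
    e.g. a function that downloads the dictionary (hypothetical).
    """
    if not os.path.isfile(path):
        # First run: fetch once and store the result locally
        text = fetch()
        with open(path, 'w', encoding='utf-8') as f:
            f.write(text)
    # Every run (including the first) reads from the local copy
    with open(path, encoding='utf-8') as f:
        return f.read()
```

Passing the fetching function in as a parameter keeps the caching logic testable without network access.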

Code

import doctest
import os
import re
import requests
from collections import namedtuple

URL = "http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b"
LIBRARY = "library.txt"
PHONEMES_TO_MATCH = 3

ReadingData = namedtuple("ReadingData", ["word", "phonemes"])

def first_and_others(xs):
    """
    >>> first_and_others([1, 2, 3])
    (1, [2, 3])
    """
    return xs[0], xs[1:]

def format_data(raw_data):
    '''
    Transforms a list of lines of the format
      WORD [SPACE] PHONEME1 [SPACE] PHONEME2 [SPACE] PHONEME3 ...
    into a list of
    `ReadingData` classes for easier further processing.

    >>> format_data(["ZURCHER Z ER1 K ER0", "ZUREK Z UH1 R EH0 K"])
    [ReadingData(word='ZURCHER', phonemes=['Z', 'ER1', 'K', 'ER0']), ReadingData(word='ZUREK', phonemes=['Z', 'UH1', 'R', 'EH0', 'K'])]
    '''
    return [ReadingData(*first_and_others(x.strip().replace('\n','').split()))\
             for x in raw_data \
              if not x.startswith(';;;')]

def write_url_to_filename(url, filename):
    response = requests.get(URL, stream=True)

    if not response.ok:
        raise Exception("Error writing {} to {}".format(url, filename))

    with open(filename, 'wb+') as f:
        for block in response.iter_content(4096):
            f.write(block)

Solution

Your use of constants is a bit buggy. First off, since you call write_url_to_filename(URL, LIBRARY), you should change

def write_url_to_filename(url, filename):
    response = requests.get(URL, stream=True)

into

def write_url_to_filename(url, filename):
    response = requests.get(url, stream=True)

i.e. use the parameter instead of the constant.

Secondly, since you only use your constants in main and then pass them on as arguments, I think it would be best to make them parameters with default values instead of constants:

def main(library='library.txt',
         url='http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b',
         phonemes=3):
    if not os.path.isfile(library):
        write_url_to_filename(url, library)
    with open(library) as f:
        data = format_data(f.readlines())
    print(find_rhymes("FORCE", data, phonemes)[:20])

I would also have named write_url_to_filename simply download_into, but that is a matter of taste rather than a clear improvement.

On a stylistic note, you make extensive use of \ in your list comprehensions, where implicit line continuation already applies (anything between brackets can span several lines). You can drop the backslashes without changing the meaning.

You also seem to indent each continuation line one extra space further than the previous one; try to avoid that.
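For example, format_data could be laid out as below. This is only a sketch of the layout change: the expression itself is the same as in the question, just without backslashes and with the clauses aligned.

```python
from collections import namedtuple

ReadingData = namedtuple("ReadingData", ["word", "phonemes"])


def first_and_others(xs):
    return xs[0], xs[1:]


def format_data(raw_data):
    # Inside the brackets Python continues the expression across lines
    # implicitly, so no trailing backslashes are needed, and the
    # "for" and "if" clauses can share one indentation level.
    return [ReadingData(*first_and_others(x.strip().replace('\n', '').split()))
            for x in raw_data
            if not x.startswith(';;;')]
```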

find just applies a filter and takes the first element. Since you are using Python 2, you may be interested in itertools.ifilter instead of filter: the former returns an iterator, whereas the latter builds a full list.

Not that your approach is bad per se, but you might want to compare the two performance-wise:

from itertools import ifilter

def find(predicate, iterable):
    return next(ifilter(predicate, iterable))
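If the code is ever moved to Python 3, itertools.ifilter no longer exists, because the built-in filter is already lazy. A sketch of the equivalent; the `default` parameter is a hypothetical addition (without it, next raises StopIteration when nothing matches):

```python
def find(predicate, iterable, default=None):
    """Return the first item of `iterable` satisfying `predicate`, or `default`."""
    # Python 3's filter() yields matches lazily, so next() stops at the
    # first match instead of building a list of every match.
    return next(filter(predicate, iterable), default)


print(find(lambda x: x > 2, [1, 2, 3, 4]))  # 3
print(find(lambda x: x > 10, [1, 2, 3]))    # None
```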

When building your ReadingData, x.strip().replace('\n','').split() can be replaced by x.split(), because split without arguments already splits on runs of any whitespace (newlines included) and ignores leading and trailing whitespace.
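A quick check of that equivalence on a sample line, assuming a trailing newline and a double space after the word, as in lines read from the dictionary file:

```python
line = "ZURCHER  Z ER1 K ER0\n"

# The original chain: strip outer whitespace, drop newlines, then split.
verbose = line.strip().replace('\n', '').split()

# split() with no arguments splits on runs of any whitespace and ignores
# leading/trailing whitespace, so it gives the same result on its own.
simple = line.split()

print(verbose == simple)  # True
print(simple)             # ['ZURCHER', 'Z', 'ER1', 'K', 'ER0']
```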

I would also change your helper function into:

def clean_line(line):
    data = line.split()
    return ReadingData(data[0], data[1:])

so that:

- you can simplify format_data:

def format_data(raw_data):
    return [clean_line(x) for x in raw_data if not x.startswith(';;;')]

- you make it possible for the author of the question that inspired this code to write, in Python 3:

def clean_line(line):
    word, *phonemes = line.split()
    return ReadingData(word, phonemes)

which is even more explicit.

- you can use it directly in your with block, without having to read the whole file into memory first:

def main(library='library.txt',
         url='http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b',
         phonemes=3):
    if not os.path.isfile(library):
        write_url_to_filename(url, library)
    with open(library) as f:
        data = [clean_line(line) for line in f if not line.startswith(';;;')]
    print(find_rhymes("FORCE", data, phonemes)[:20])

Alternatively, you could just pass the file object f to format_data instead of the list of lines. Since iterating over a file yields its lines one at a time, this works without modification, and the data is built as the file is read instead of reading the whole file first and building the data afterwards.
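A small sketch of that, using io.StringIO as a stand-in for the file returned by open(library); the sample lines are illustrative:

```python
import io
from collections import namedtuple

ReadingData = namedtuple("ReadingData", ["word", "phonemes"])


def clean_line(line):
    data = line.split()
    return ReadingData(data[0], data[1:])


def format_data(raw_data):
    # raw_data can be any iterable of lines: a list or an open file object.
    return [clean_line(x) for x in raw_data if not x.startswith(';;;')]


# Iterating a file-like object yields one line (with its newline) at a time.
fake_file = io.StringIO(";;; comment line\nZUREK  Z UH1 R EH0 K\n")
print(format_data(fake_file))
# [ReadingData(word='ZUREK', phonemes=['Z', 'UH1', 'R', 'EH0', 'K'])]
```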

Context

StackExchange Code Review Q#140066, answer score: 5