HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Check consistency of a list of statements, with fuzzy rhyme matching

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
withstatementsrhymeconsistencylistcheckmatchingfuzzy

Problem

I made a program today that solves the following problem on a programming competition site (open.kattis.com): Check if all statements with the following format X is Y or X not Y are consistent. X and Y are words of the length 1-20 and a-z.

The catch is that every word that rhymes with X in any provided statement is said to be equivalent to X. For simplicity it is said that two words rhyme if the last min(3, |X|, |Y|) character are the same, so for example foo and boo would be the same word but not skoo and troo). Also words which do not appear in any statement do not exist, and should not be considered rhyming with anything. For example, the two words foo and moo do not rhyme with each other, unless the word oo or o exists in the input.

So to break it down, here are the rules:

  • Check if all statements of the form "X is Y" or "X not Y" is consistent.



  • Two words rhyme if the last min(3, |X|, |Y|) characters are the same. So if this condition holds for two words X and Y that are in the list of statements, it is equivalent to having a statement "X is Y".



  • Words which do not appear in any statement do not exist, and should not be considered rhyming with anything



The first line consists of an integer 0≤N≤100000, the number of statements. The next N lines consists of statements of the two given forms.

Output should the string "yes", if the statements are consistent with each other or if there is a contradiction the output should be "wait what?".

The problem provides 4 sample inputs and corresponding outputs:

4
herp is derp
derp is herp
herp is herp
derp is derp


Output to the input above should be yes.

3
oskar not lukas
oskar is poptart
lukas is smart


… should output wait what?

1
moo not foo


… should output yes

2
moo not foo
oo is blah


… should output wait what?

Here is my program that should hopefully be commented well enough to understand:

```
import sys
# read all input
indata = sys.stdin.readlines()
inlen = in

Solution

General remarks

-
Organizing your code into functions is a good thing. Unfortunately, you only have one function, any_equals(), and its definition is buried in the middle of a sea of sequential statements, which makes it weird and hard to find. As it turns out, your function

# check if any word in eq1 matches any word in eq2
def any_equals(eq1, eq2):
    for i in eq1:
        for j in eq2:
            if i == j:
                return True
    return False


… would be better written as

def any_equals(eq1, eq2):
    """Check if any element of eq1 equals any element of eq2"""
    return not(set.isdisjoint(set(eq1), set(eq2)))


… with a docstring, and a data structure that takes advantage of hashing to avoid O(n2) performance. With an implementation that is so succinct, it might not be worth defining as a function after all.

So, basically, you have no functions in your code. It's hard to analyze eighty-something lines of code all at once without having it broken down into smaller self-contained chunks. There is also no way to get an overview of how your program works.

-
Code should be compartmentalized and composable. This is just a corollary to the first point. Say, hypothetically, that we no longer cared about rhyming. With a well designed program, you should be able to easily disable the code that implements rhyming. That's not the case with your code — your equivalent dictionary is mentioned all over the place.

-
Trivial special cases aren't worth the trouble. Why bother writing this?

# return yes if the first line was 0
if inlen == 0:
    print("yes")
    exit(0)


It's a very unlikely case — basically dead code. Furthermore, the existence of the special case makes it hard for you to verify whether your code works in the general case.

A more interesting special case, if you wanted to code it, is that any ruleset with no "not" rules is automatically consistent.

-
Avoid parsing the input repeatedly. You have not just two, but three loops that look like this:

for i in range(inlen):
    item = indata[i+1]
    parts = item.split(" ")
    X = parts[0].strip()
    Y = parts[2].strip()


That means that you failed to parse the input into a good representation. You should be able to read the input in only one pass.

Algorithms and efficiency

-
Building the equivalent structure for rhymes

This is the code you use to read the rules and note any rhyming equivalence:

# build lists of words in data
for i in range(inlen):
    item = indata[i+1]
    parts = item.split(" ")
    word1 = parts[0].strip()
    word2 = parts[2].strip()
    Xs.append(word1)
    Ys.append(word2)
    words.append(word1)
    words.append(word2)

    # add word in equivalent list if not present
    if word1 not in equivalent:
        equivalent[word1] = [word1]
    if word2 not in equivalent:
        equivalent[word2] = [word2]

# check which words rhyme and add to equivalent list
for i in range(len(words)):
    word1 = words[i].strip()
    for j in range(len(words)):
        word2 = words[j]
        if word1 == word2:
            continue
        endinglen = min(3, len(word1), len(word2)) # min(3, |X|, |Y|)
        ending1 = word1[-endinglen:]
        ending2 = word2[-endinglen:]
        if ending1 == ending2:
            equivalent[word1].append(word2)
            equivalent[word2].append(word1)


The result, I believe, is a dictionary, whose keys are every word encountered, and whose values are a list of words that rhyme with it. That's not what you really want, though. You want a dictionary that maps each word to the shortest word that rhymes with it. In other words, you want one canonical representation of each word; you don't want to expand each word into more alternative representations.

You can actually take a shortcut here. Only the last three letters of each word matter. You might as well immediately ignore the first few characters of any longer word. One way to accomplish would be using a regular expression match, instead of splitting on spaces.

Note that the second loop above, with i and j, is O(N2).

As a nitpick: word1 = words[i].strip() is not appropriate. The time for sanitizing the input is when you read it. You already stripped them earlier anyway. And if you had done item.split(), it would have split on any whitespace, such that you wouldn't even have to strip anything.

-
Building the statements structure

# build the statements data structure
statements = {}
for i in range(inlen):
    item = indata[i+1]
    parts = item.split(" ")
    X = parts[0].strip()
    b = True if parts[1] == "is" else False
    Y = parts[2].strip()
    if X not in statements:
        statements[X] = []

    # this tuiple is of the format (boolean, word) where boolean means if X is equal to or not equal to Y
    statements[X].append((b, Y))


statements[X].append((b, Y)) results in a weird data structure. The most important aspect of a statement is

Code Snippets

# check if any word in eq1 matches any word in eq2
def any_equals(eq1, eq2):
    for i in eq1:
        for j in eq2:
            if i == j:
                return True
    return False
def any_equals(eq1, eq2):
    """Check if any element of eq1 equals any element of eq2"""
    return not(set.isdisjoint(set(eq1), set(eq2)))
# return yes if the first line was 0
if inlen == 0:
    print("yes")
    exit(0)
for i in range(inlen):
    item = indata[i+1]
    parts = item.split(" ")
    X = parts[0].strip()
    Y = parts[2].strip()
# build lists of words in data
for i in range(inlen):
    item = indata[i+1]
    parts = item.split(" ")
    word1 = parts[0].strip()
    word2 = parts[2].strip()
    Xs.append(word1)
    Ys.append(word2)
    words.append(word1)
    words.append(word2)

    # add word in equivalent list if not present
    if word1 not in equivalent:
        equivalent[word1] = [word1]
    if word2 not in equivalent:
        equivalent[word2] = [word2]

# check which words rhyme and add to equivalent list
for i in range(len(words)):
    word1 = words[i].strip()
    for j in range(len(words)):
        word2 = words[j]
        if word1 == word2:
            continue
        endinglen = min(3, len(word1), len(word2)) # min(3, |X|, |Y|)
        ending1 = word1[-endinglen:]
        ending2 = word2[-endinglen:]
        if ending1 == ending2:
            equivalent[word1].append(word2)
            equivalent[word2].append(word1)

Context

StackExchange Code Review Q#156854, answer score: 4

Revisions (0)

No revisions yet.