
Exercism assignment word-count in Python

Submitted by: @import:stackexchange-codereview

Problem

Can someone review the code I submitted to exercism.io for the word-count exercise?

class Phrase:
    def __init__(self, phrase):
        self.phrase = phrase.strip().lower().split()

    def word_count(self):
        word_dict = {}
        for word in self.phrase:
            word_input = ''.join(e for e in word if e.isalnum()).strip()
            if word_input:
                word_dict[word_input] = word_dict.get(word_input, 0) + 1
        return word_dict

Solution

My first piece of advice: stop writing classes when you don't need them. Your class has only two methods, __init__ and one regular method, so it could be written as a simple function.

def word_count(phrase):
    word_dict = {}
    for word in phrase.strip().lower().split():
        word_input = ''.join(e for e in word if e.isalnum()).strip()
        if word_input:
            word_dict[word_input] = word_dict.get(word_input, 0) + 1
    return word_dict


Also, much of the string processing you are doing is unnecessary, overly complicated and/or potentially wrong:

  • On phrase: as noticed by Gareth Rees, there's no point in calling strip() since you call split().



As I wasn't aware that the result would be the same, here is a proof of concept:

a='  this  is    a little      test'
a.split() == a.strip().split()
-> True


See also the documentation for str.strip and str.split.
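The equality above relies on calling split() with no argument. As a small sketch of the difference (the variable names no_arg and explicit are only illustrative), compare it with an explicit separator:

```python
a = '  this  is    a little      test'

# With no argument, split() treats any run of whitespace as one separator
# and ignores leading and trailing whitespace, so strip() adds nothing.
no_arg = a.split()

# With an explicit ' ' separator, every single space is a boundary, so
# leading and repeated spaces produce empty strings in the result.
explicit = a.split(' ')

print(no_arg)
print(explicit)
```

So dropping strip() is only safe as long as split() keeps being called without a separator.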

  • On individual words: the way you get word_input from word looks a bit convoluted to me. Among other things, there's no point in calling strip() on a string that contains only alphanumeric characters. Also, simply removing "special" characters does not sound like the best option: it's and its will be counted as the same word, its annoying :-). Maybe some characters like ' should be taken into account during the splitting. Depending on the language you have to handle, other characters like - may need to be kept as well (for instance, in French "basketball" is "basket-ball", but neither "basket" nor "ball" is a French word, so splitting on the dash would be quite awkward, and so would removing it).
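One possible way to keep apostrophes and hyphens that sit inside a word is to tokenize with a regular expression instead of filtering characters. This is only a sketch: the function name word_count_keep_inner_punct and the choice of ' and - as the only "inner" characters are assumptions made for illustration, not requirements of the exercise.

```python
import re

# One or more alphanumeric characters ([^\W_] is \w without the underscore),
# optionally followed by groups of a single ' or - plus more alphanumerics.
# Punctuation at the start or end of a word is therefore left out.
WORD_RE = re.compile(r"[^\W_]+(?:['-][^\W_]+)*")


def word_count_keep_inner_punct(phrase):
    word_dict = {}
    for word in WORD_RE.findall(phrase.lower()):
        word_dict[word] = word_dict.get(word, 0) + 1
    return word_dict


print(word_count_keep_inner_punct("It's annoying when its basket-ball!"))
```

With this pattern, it's and its are counted separately and basket-ball stays a single word, while quotes or dashes surrounding a word are still discarded.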



Except for that, your code looks good to me!

However, if you wanted to make things even more pythonic, you could use defaultdict. This example will probably look familiar to you.
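A minimal sketch of what the defaultdict variant might look like, keeping the same character filtering as the function above and dropping the redundant strip() calls. The final conversion to a plain dict is a design choice of this sketch, not something the exercise requires.

```python
from collections import defaultdict


def word_count(phrase):
    # defaultdict(int) supplies 0 for missing keys, so .get(word, 0) is not needed.
    word_dict = defaultdict(int)
    for word in phrase.lower().split():
        # Keep only alphanumeric characters, as in the original code.
        word_input = ''.join(e for e in word if e.isalnum())
        if word_input:
            word_dict[word_input] += 1
    # Return a plain dict so later lookups of unknown words do not add keys.
    return dict(word_dict)


print(word_count("one fish two fish"))
```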


Context

StackExchange Code Review Q#41941, answer score: 5
