HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonModerate

Learn Python the Hard Way #48 — lexicon exercise

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
thewayhardexercisepythonlearnlexicon

Problem

I'm trying to get some feedback on my code for the lexicon exercise (#48) in Learn Python the Hard Way. I have only recently started learning Python and want to make sure I have my fundamentals down before progressing. I have some experience coding (only with kdb+/q in my work environment) but none in Python, so all advice is welcomed.

verb=['go','kill','eat']
direction=['north','south','east','west']
noun=['bear','princess']
stop=['the','in','of']
vocab={'verb':verb,'direction':direction,'noun':noun,'stop':stop}

def scan(sentence):
    wordlist=sentence.split()
    result=[] #initialize an empty list
    for word in wordlist:
        found=False
        for key,value in vocab.items():
            if word.lower() in value:  #convert to lower case so that we can handle inputs with both cases
                result.append((key,word))
                found=True
                break
        if not found:
            try:
                word=int(word)
                result.append(('number',word))
            except ValueError:
                result.append(('error',word))
    return result


Python version 2.6.

Solution

verb=['go','kill','eat']
direction=['north','south','east','west']
noun=['bear','princess']
stop=['the','in','of']


There is not a lot of point in storing these in separate variables only to stick them inside vocab just put the list literals inside vocab

vocab={'verb':verb,'direction':direction,'noun':noun,'stop':stop}


Python convention say to make constant ALL_CAPS. I'd also not abbreviate the names. Code is read more then written, so make it easy to read not to write.

def scan(sentence):
    wordlist=sentence.split()
    result=[] #initialize an empty list


Pointless comment. Assume you reader understand the language. You don't need to explain what [] means.

for word in wordlist:


I'd use for word in sentence.split():

found=False


Boolean logic flags are delayed gotos. Avoid them when you can

for key,value in vocab.items():
            if word.lower() in value:  #convert to lower case so that we can handle inputs with both cases


Your dictionary is backwards. It maps from word types to lists of words. It'd make more sense to have a dictionary mapping from word to the word type.

result.append((key,word))
                found=True
                break
        if not found:


Rather then this, use an else block on the for loop. It'll be execute if and only if no break is executed.

try:
                word=int(word)
                result.append(('number',word))


Put this in the else block for the exception. Generally, try to have as little code in try blocks as possible. Also I wouldn't store the result in word again, I'd put in a new local.

except ValueError:
                result.append(('error',word))

    return result


My approach:

WORD_TYPES = {
   'verb' : ['go', 'kill', 'eat'],
   'direction' : ['north', 'south', 'east', 'west'],
   'noun' : ['bear', 'princess'],
   'stop' : ['the','in','of']
}
# invert the dictionary
VOCABULARY = {word: word_type for word_type, words in WORD_TYPES.items() for word in words}

def scan(sentence):
    tokens = []
    for word in sentence.split():
        try:
            word_type = VOCABULAR[word]
        except KeyError:
            try:
                 value = int(word)
            except ValueError:
                 tokens.append( ('error',word) )
            else:
                 tokens.append( ('int', value) )
        else:
            tokens.append( (word_type, word) )
     return tokens


Alternately, using some regular expressions:

classifications = []
for word_type, words in WORD_TYPES.items():
    word_expression = '|'.join("(?:%s)" % re.escape(word) for word in words)
    expression = r"\b(?P%s)\b" % (word_type, word_expression)
    classifications.append(expression)
classifications.append(r"\b(?P\d+)\b")
classifications.append(r"\b(?P\w+)\b")
parser = re.compile('|'.join(classifications))

def scan(sentence):
    return [(match.lastgroup, match.group(0)) for match in parser.finditer(sentence)]


EDIT

If your version of python is to old to use the dict comphrensions you can use:

dict((word, word_type) for word_type, words in WORD_TYPES.items() for word in words)

Code Snippets

verb=['go','kill','eat']
direction=['north','south','east','west']
noun=['bear','princess']
stop=['the','in','of']
vocab={'verb':verb,'direction':direction,'noun':noun,'stop':stop}
def scan(sentence):
    wordlist=sentence.split()
    result=[] #initialize an empty list
for word in wordlist:
found=False

Context

StackExchange Code Review Q#14238, answer score: 10

Revisions (0)

No revisions yet.