HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonModerate

Finding anagrams - Empires strike back

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
strikeempiresanagramsbackfinding

Problem

Recently as part of a job interview I was given the following coding task:


Attached is a gzipped text file of about 100,000 English words. Write a
Python, C or C++ program that finds all the anagrams of the word
"empires" that exist in the words file (just a reminder: an anagram is
the result of rearranging the letters of a word to form a new word).


Your program should ideally take only a few seconds at most to run on modern
hardware. The only inputs to your program should be the "empires" string
and the words file. Send back your Python source files. Your solution
should reflect the quality of work you would deliver to a customer, so
keep in mind things like unit/acceptance tests, comments, PEP8, etc.

I submitted two files; one with the code, and one with tests (code below). I didn't pass on to the next stage of the interview, but can't figure out what, if anything, is wrong with my code. Any critiques would be appreciated.

anagrams.py:

```
#!/usr/bin/env python
"""Find anagrams in a file.

Given a term and a filename, find the anagrams of the term within the file.
An anagram must use each of the letters in the term exactly once.
"""

import sys

def usage():
"""Print a usage message."""
print "Usage:"
print "anagrams.py "

def count_letters(word):
"""Count the occurrances of each letter in a word."""
counts = {}
for letter in word:
if letter in counts.keys():
counts[letter] += 1
else:
counts[letter] = 1
return counts

def main(argv=None):
"""Handle arguments and execute."""
if argv is None:
argv = sys.argv
if len(argv) != 3:
usage()
return 2

term = argv[1]
wordfilename = argv[2]
try:
grams = find_anagrams(term, wordfilename)
report_grams(grams, term, wordfilename)
return 0
except IOError:
print "Unable to open %s" % wordfilename
return 2

def find_anagrams(term, wordfilename):

Solution

You know, Your code is good.... the algorithm is not the best, but the code is well presented, clear, and really, it's good. For an interview I would personally have ranked you as "good attention to detail, well structured, consistent, good".

Then, I would have asked you why you chose the algorithm you used. let's break down your process:

  • pre-process the 'term', in to it's length, and the count of each letter.



  • read the entire words file



  • discard wrong-sized words



  • check that each letter in the right-size word is in the term



  • make sure the letters appear the right number of times



  • report the matches.



There are two things I would point out in that general process, though.

  • you should probably have used a generator (yielded the grams, instead of appending them to a list). A generator would have identified you as a better candidate



  • hmmm.... there really isn't a 2. Let me just repeat 1. Use a generator.



As a general observation, your comments are very comprehensive, and this may sway people in different ways. I looked at them and thought "tl;dr", and read the code instead. I still have not read your comments.
Algorithm

This is likely what would be noticed the most. Your algorithm is both advanced, and also not ideal. For an interview, I really would not hold your answer against you, but, for your interest, if you sort the letters in the term, and then sort the letters in the same-length words, if the resulting strings are the same, they are anagrams. This is much simpler than the dictionary, and set approach.

Note, I would not hold that against you in an interview, but others may. I would look at your implementation and think: huh, they've demonstrated good use of various data structures, and, if I suggested sorting the term/words I am pretty sure they could fix it in no time....
Good things

To expand on the good things:

  • you handle input arguments well



  • you handle exceptions (IO problems), and even have exit codes



  • your documentation is on all methods



  • variables and methods have good names



  • you do File IO properly



  • you have test cases

Context

StackExchange Code Review Q#88132, answer score: 17

Revisions (0)

No revisions yet.