Program to find dollar words
Problem
A dollar word is a word for which the sum of the values of the letters adds up to 100 ($1.00).
"a" has a value of 1 and "z" has a value of 26. Special characters such as apostrophes are ignored.
First try looked like:
import string

valMap = {}
for index, item in enumerate(string.lowercase):
    valMap[item] = index + 1

def isDollarWord(word):
    lowercase = word.lower().strip()
    total = 0
    for letter in lowercase:
        if letter in valMap:
            total += valMap[letter]
    return total == 100

words = open("C:\Users\astroboy\Downloads\UKACD17.TXT")
for line in words:
    if isDollarWord(line):
        print(line)
I started feeling kind of bad that words like "Hälleflinta" and "divorcée" might be denied their rightful place in the dollar word list. There are no known rules for how to handle characters with accents, so I made up my own as below (à would be counted as a). That means replacing all diacritics with the plain letter.
import string
import unicodedata

valMap = {}
for index, item in enumerate(string.lowercase):
    valMap[item] = index + 1

def remove_marks(word):
    unicode_word = word.decode('cp1252')
    return unicodedata.normalize('NFKD', unicode_word).encode('ascii', 'ignore')

def isDollarWord(word):
    lowercase = word.lower().strip()
    normalized = remove_marks(lowercase)
    total = 0
    for n in normalized:
        if n in valMap:
            total += valMap[n]
    return total == 100

words = open("C:\Users\astroboy\Downloads\UKACD17.TXT")
for line in words:
    if isDollarWord(line):
        print(remove_marks(line))
I'm posting for any feedback about this code. See any room for improvement?
Solution
Strategic themes
- Prefer comprehensions to loops: List comprehensions and dict comprehensions let you compress loops into one-liners. It's also a nice feeling to initialize an object "all at once" rather than building it little by little.
- Avoid special cases: if n in valMap is annoying. If valMap were a defaultdict, then a failed lookup would naturally have a value of 0.
- Use built-in functions: To compute a sum, use sum(). Converting the valMap to a defaultdict enables this further simplification.
- Obey the single-responsibility principle: The isDollarWord() function does too much, and should be split up. A word_value() function would be more useful than isDollarWord(); at the least, it allows for more interesting unit tests. Once word_value() has been defined, comparison with 100 is trivial.
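Taken together, those themes suggest a word_value() helper that can be tested in isolation. A minimal sketch (the letter sums in the assertions were computed by hand):

```python
from collections import defaultdict
from string import ascii_lowercase

# Letter values built all at once; the defaultdict scores any
# character outside a-z (apostrophes, digits) as 0.
LETTER_VALUES = defaultdict(int,
    ((letter, index + 1) for index, letter in enumerate(ascii_lowercase))
)

def word_value(word):
    # sum() plus the defaultdict removes the need for an "if letter in ..." check.
    return sum(LETTER_VALUES[c] for c in word.lower())

# Unit tests become straightforward once word_value() exists:
assert word_value("a") == 1
assert word_value("z") == 26
assert word_value("excellent") == 100           # 5+24+3+5+12+12+5+14+20
assert word_value("it's") == word_value("its")  # apostrophe scores 0
```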
Localized issues
- valMap is not descriptive enough for my taste. I suggest LETTER_VALUES, in all caps to identify it as a constant.
- isDollarWord() should be named is_dollar_word(). (You named remove_marks() correctly, though.)
File handling
- You have a file descriptor leak. Opening files using a with block is almost always preferable to a regular open() call.
- Better yet, avoid hard-coding filenames, and let the input be specified on the command line or through standard input. fileinput.input() is useful for this.
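A minimal sketch of both suggestions (the word-list file here is a temporary stand-in invented for the example, not the asker's actual UKACD17.TXT path):

```python
import fileinput
import os
import tempfile

# Stand-in word list, created only so this example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "words.txt")
with open(path, "w") as f:
    f.write("excellent\nzebra\n")

# A with block closes the file automatically, even if an exception
# is raised, so there is no file descriptor leak.
with open(path) as words:
    lines = [line.strip() for line in words]

# fileinput.input() reads files named on the command line, or standard
# input when none are given; files= is passed explicitly only for the demo.
with fileinput.input(files=[path]) as stream:
    streamed = [line.strip() for line in stream]

assert lines == streamed == ["excellent", "zebra"]
```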
Internationalization and Python 3 compatibility
- string.lowercase is locale-dependent, which, according to your specification, you don't want. Furthermore, it has been removed in Python 3. You want string.ascii_lowercase instead.
- If you want to interpret the input as CP1252, specify an encoding when opening the file, so that it is decoded correctly before your application even has a chance to get to the data. If using fileinput.input(), use an openhook parameter (but beware of a bug in Python 2).
- For Python 3 compatibility, remove_marks() should also call .decode('ascii') on its return value, to convert the byte string back into a text string.
- Alternate ways to handle the internationalization problem include transliteration using transliterate or Unidecode. German convention, for example, treats ö as oe and ß as ss.
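To see what the NFKD approach does, and where it differs from transliteration, a short Python 3 demonstration:

```python
import unicodedata

def remove_marks(word):
    # NFKD decomposes 'é' into 'e' plus a combining accent; encoding to
    # ASCII with errors='ignore' drops the accent, and .decode('ascii')
    # turns the byte string back into text (the Python 3 fix noted above).
    return unicodedata.normalize('NFKD', word).encode('ascii', 'ignore').decode('ascii')

print(remove_marks('divorcée'))     # divorcee
print(remove_marks('Hälleflinta'))  # Halleflinta

# The limits of this approach: ß has no decomposition, so it is silently
# dropped rather than transliterated to "ss" as German convention would want.
print(repr(remove_marks('ß')))      # ''
```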
Suggested implementation
from collections import defaultdict
import fileinput
from string import ascii_lowercase
from unicodedata import normalize

LETTER_VALUES = defaultdict(int,
    ((letter, index + 1) for index, letter in enumerate(ascii_lowercase))
)

def remove_marks(word):
    return normalize('NFKD', word).encode('ascii', 'ignore').decode('ascii')

def word_value(word):
    return sum(LETTER_VALUES[c] for c in remove_marks(word.lower()))

for line in fileinput.input(openhook=fileinput.hook_encoded('cp1252')):
    if word_value(line) == 100:
        print(line.strip())
Context
StackExchange Code Review Q#58446, answer score: 3