HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Project Euler # 22: Names scores

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
projecteulerscoresnames

Problem

Problem 22, here, asks the following:


Using names.txt (right click and 'Save Link/Target As...'), a 46K text file containing over five-thousand first names, begin by sorting it into alphabetical order. Then working out the alphabetical value for each name, multiply this value by its alphabetical position in the list to obtain a name score.


For example, when the list is sorted into alphabetical order, COLIN, which is worth 3 + 15 + 12 + 9 + 14 = 53, is the 938th name in the list. So, COLIN would obtain a score of 938 × 53 = 49714.


What is the total of all the name scores in the file?

My solution is as follows:

import re
import string

def create_names_list(file_name):
    #create a names list parsing 
    names = open(file_name).read()
    names = re.findall(r'"(.*?)"', names)
    names.sort()
    return names

def name_weights(file_name):
    #fetch name-list
    names = create_names_list(file_name)

    #create a letters dictionary e.g. {'A' : 1, 'B' : 2, ... 'Z' : 26}
    letters = string.ascii_uppercase
    letters_map_reversed = dict(enumerate(letters))
    letters_map = {value: key+1 for key, value in letters_map_reversed.iteritems()}

    # sum all letters using letter score * index
    return sum(letters_map[char]*(names.index(name)+1) for name in names for char in name)



%timeit name_weights('names.txt')


1 loops, best of 3: 1.18 s per loop

I noticed in the thread for the solutions that many python solutions appeared to be in the 8-12ms range, 10x faster than my solution. As a result I was concerned with the performance of my solution.

Two things I've done to attempt to tweak my solution are changing the letters_map[char] portion to just be ord(char)-64 which I noticed a few people were using. It seems like a clever, certainly shorter, way of getting the number value that I hadn't thought of. It didn't alter performance though, which makes me think my problem is with the final expression that uses a nested for loop

Solution

Here's your problem:

return sum(letters_map[char]*(names.index(name)+1) for name in names for char in name)
                              ^^^^^^^^^^^^^^^^^^^


So as we walk down names, we calculate the value of the current name... and then start over and search for it from scratch to determine the index. That extra lookup turns our algorithm from \$O(n)\$ to \$O(n^2)\$!! And string search is expensive - for the 1000th name, we have to first check 999 other strings that are definitely going to be wrong. We don't need to search for name in names to find the index. Not only that, but then we re-search the index for every char in name! (Maybe Python is smart enough to cache this, maybe not). So really this is more like \$O(n^3)\$...

We can just keep track of the index as we go. That's what enumerate() is for:

return sum(letters_map[char] * idx
           for idx, name in enumerate(names, start=1)
           for char in name)


That one change takes your runtime over 10 iterations from 13.31s to 0.07s. Behold the power of not redoing a bunch of work.

Everything else is basically fine. Creating letters_map could be abridged to just:

letters_map = {letter: idx
               for idx, letter in enumerate(string.ascii_uppercase, start=1)}


That doesn't affect performance, just makes it a little easier to see what letters_map actually is. It turns out that dictionary lookup is fater than doing the name lookup on ord() within the loop is, so this is pretty speedy.

It's also typical to do file operations with a context manager. That is:

with open(file_name) as f:
    return sorted(re.findall(r'"(.*?)"', f.read()))


That is marginally slower, but more Pythonic in my opinion.

Code Snippets

return sum(letters_map[char]*(names.index(name)+1) for name in names for char in name)
                              ^^^^^^^^^^^^^^^^^^^
return sum(letters_map[char] * idx
           for idx, name in enumerate(names, start=1)
           for char in name)
letters_map = {letter: idx
               for idx, letter in enumerate(string.ascii_uppercase, start=1)}
with open(file_name) as f:
    return sorted(re.findall(r'"(.*?)"', f.read()))

Context

StackExchange Code Review Q#106393, answer score: 5

Revisions (0)

No revisions yet.