Read file with over 300k words and filter through appropriate filter to return list of matching words

Submitted by: @import:stackexchange-codereview

Problem

As a puzzle, we were asked to work out the possible solutions to a 4-letter word-lock combination. The solution had to be a four-letter word from the English dictionary. Each of the four positions had 10 possible characters, so there are $10^4$ (10,000) possible letter combinations. I brute-forced a list of candidate words by first reading a file of 300k English words into a list, then applying a filter at each position to narrow the list down to words that could work on the lock.

I know I am brute-forcing the list down to 1089 possibilities. Did I overlook any other possibilities? I am a beginner: what steps can I take to make my code simpler and more efficient?

```
filterlist2 = ['a','r','t','h','i','v','o','y','l','e',]
filterlist3 = ['a','l','t','o','i','n','s','r','m','f',]
filterlist4 = ['a','m','d','k','e','s','x','p','l','y',]

#filterlist1 basically before I realized I can store parameters in a list
lines = tuple(open("words.txt", 'r'))
char1filter=[]
for element in lines:
    if len(element)==5:
        if element.startswith('b') is True:
            char1filter.append(element)
        elif element.startswith('p') is True:
            char1filter.append(element)
        elif element.startswith('t') is True:
            char1filter.append(element)
        elif element.startswith('s') is True:
            char1filter.append(element)
        elif element.startswith('m') is True:
            char1filter.append(element)
        elif element.startswith('d') is True:
            char1filter.append(element)
        elif element.startswith('c') is True:
            char1filter.append(element)
        elif element.startswith('g') is True:
            char1filter.append(element)
        elif element.startswith('f') is True:
            char1filter.append(element)
        elif element.startswith('l') is True:
            char1filter.append(element)

#print(char1filter)

print(len(char1filter))
#returned 393
```

Solution

First off, I'd change how you build char1filter.
Here's what I'd do differently:

  • Remove the is True, as it's implied: startswith already returns a bool.
  • Change element.startswith('b') to element[0] == 'b'.
  • Merge all the element checks into one using in: element[0] in 'bptsmdcgfl'.
  • Use a list comprehension.

Merging all this together we should get:

char1filter = [
    element
    for element in lines
    if len(element) == 5 and element[0] in 'bptsmdcgfl'
]
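
To check that the comprehension selects the same words as a chain of startswith checks, here is a minimal sketch on a handful of made-up lines. The sample_lines tuple is an assumption standing in for the contents of words.txt; each entry keeps its trailing newline, which is why a four-letter word has length 5.

```python
# Hypothetical stand-in for the lines read from words.txt (newline kept).
sample_lines = ("bath\n", "apple\n", "tore\n", "zeal\n", "so\n", "lamp\n")
FIRST_LETTERS = 'bptsmdcgfl'

# Loop version: one startswith() check per allowed first letter.
loop_result = []
for element in sample_lines:
    if len(element) == 5:
        for letter in FIRST_LETTERS:
            if element.startswith(letter):
                loop_result.append(element)
                break

# Comprehension version: a single membership test on the first character.
char1filter = [
    element
    for element in sample_lines
    if len(element) == 5 and element[0] in FIRST_LETTERS
]

print(char1filter)
print(char1filter == loop_result)
```

Both versions keep 'bath', 'tore' and 'lamp' here, and the comprehension needs only one condition to do it.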


I'd, however, store 'bptsmdcgfl' alongside filterlist2, filterlist3 and filterlist4.
As a first step, I'd instead just remove every element that isn't 5 characters long (four letters plus the trailing newline that each line keeps).

five_long = [
    element
    for element in lines
    if len(element) == 5
]


This leaves you with four roughly identical pieces of code.
So I'd put all the filter lists into a single list, filter_lists, and loop through it,
checking each filter list against its own character position.

data = five_long
for index, to_filter in enumerate(filter_lists):
    data = [
        element
        for element in data
        if element[index] in to_filter
    ]
    print(data)
    print(len(data))
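
A small sketch of the looped filter on made-up data may help. It assumes each letter list applies to the character position with the same index, which is why the loop uses enumerate. The filter_lists and five_long values below are hypothetical stand-ins, not the puzzle's real dials or word file.

```python
# Hypothetical letter sets for a 4-position lock (not the real dials).
filter_lists = ['bt', 'ao', 'lr', 'es']
# Stand-in for the five-character lines (four letters plus a newline).
five_long = ['ball\n', 'tore\n', 'bale\n', 'tile\n', 'lamp\n']

data = five_long
for index, to_filter in enumerate(filter_lists):
    # Keep only the words whose letter at this position is on the dial.
    data = [element for element in data if element[index] in to_filter]
    print(index, len(data))

print(data)
```

Each pass narrows the list produced by the previous pass, so the printed counts can only stay the same or shrink.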


If, however, you only want the final filtered list, you can merge all the checks into one condition and make a single pass over five_long.

data = [
    element
    for element in five_long
    if all(
        element[index] in to_filter
        for index, to_filter in enumerate(filter_lists)
    )
]
print(data)
print(len(data))
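
One possible way to express the single-pass version is with a small helper. This is a sketch only: the helper name matches_all and the sample data are assumptions, and it treats each letter list as applying to the position with the same index.

```python
def matches_all(word, filters):
    """Return True if every position of word is allowed by its filter."""
    return all(word[index] in allowed for index, allowed in enumerate(filters))


# Hypothetical letter sets and word lines, for illustration only.
filter_lists = ['bt', 'ao', 'lr', 'es']
five_long = ['ball\n', 'tore\n', 'bale\n', 'tile\n', 'lamp\n']

# One pass over the words, checking all four positions per word.
data = [element for element in five_long if matches_all(element, filter_lists)]
print(data)
print(len(data))
```

Because all() stops at the first position that fails, most words are rejected after a single character comparison.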


You should also read PEP 8 to style your code consistently, as I have above; it makes your code much easier to read.
You should also use with when opening files. tuple(open('words.txt', 'r')) never explicitly closes the file, so it should instead be:

with open('words.txt', 'r') as f:
    lines = tuple(f)
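
To illustrate what the with block buys you, the sketch below writes a tiny throwaway word file (an assumption, so the example is self-contained) and reads it back. It also shows an optional extra step, not part of the code above: stripping the newline so that four-letter words really have length 4.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical miniature words.txt, created only for this example.
    path = os.path.join(tmp_dir, 'words.txt')
    with open(path, 'w') as f:
        f.write('bath\napple\ntore\n')

    # The file is closed automatically when the with block ends.
    with open(path, 'r') as f:
        lines = tuple(f)
    closed_after_block = f.closed

# Optional: drop the trailing newline from every line.
words = [line.rstrip('\n') for line in lines]

print(lines)
print(closed_after_block)
print(words)
```

If you strip the newlines this way, the length check in the filters would become len(element) == 4 rather than 5.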


And so the code could be:

FILTER_LISTS = [
    ['b', 'p', 't', 's', 'm', 'd', 'c', 'g', 'f', 'l'],
    ['a', 'r', 't', 'h', 'i', 'v', 'o', 'y', 'l', 'e'],
    ['a', 'l', 't', 'o', 'i', 'n', 's', 'r', 'm', 'f'],
    ['a', 'm', 'd', 'k', 'e', 's', 'x', 'p', 'l', 'y'],
]

with open('words.txt', 'r') as f:
    lines = tuple(f)

data = [
    element
    for element in lines
    if len(element) == 5
]
for index, to_filter in enumerate(FILTER_LISTS):
    data = [
        element
        for element in data
        if element[index] in to_filter
    ]
    print(data)
    print(len(data))
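
As a cross-check on whether any possibilities were overlooked, a different technique (not the approach used above) is to generate every combination the lock allows with itertools.product and intersect that with the dictionary. The sketch below uses the four letter lists from this entry, written as strings; the words set is a made-up stand-in for the stripped contents of words.txt.

```python
from itertools import product

FILTER_LISTS = [
    'bptsmdcgfl',
    'arthivoyle',
    'altoinsrmf',
    'amdkesxply',
]

# Hypothetical stand-in for the dictionary, with newlines already stripped.
words = {'bath', 'tree', 'pole', 'mild', 'zest', 'bold'}

# Every string the lock can display: one letter from each dial, in order.
combos = {''.join(letters) for letters in product(*FILTER_LISTS)}
print(len(combos))

# Words that are both in the dictionary and reachable on the lock.
matches = sorted(words & combos)
print(matches)
```

With 10 letters on each of 4 dials, the set holds 10,000 strings, so the intersection is cheap even against a 300k-word dictionary.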

Context

StackExchange Code Review Q#137990, answer score: 11
