patternpythonMinor
Permutations program in Python
Viewed 0 times
programpythonpermutations
Problem
This code asks for letters and a number of letters per word and generates all possible permutations. Then compares those permutations with an English dictionary and creates a list with all the words that match with the dictionary. Finally, it prints the list on a new terminal. The problem is that when words are long (about 10 letters each) I run out of memory. Any suggestions to improve this code and to try to solve the memory problem?
EDIT: my aim is to get 'real' words after typing some letters (e.g input: wxtisneo; letters_per_word: 3; output: one, two, six...)
EDIT: my aim is to get 'real' words after typing some letters (e.g input: wxtisneo; letters_per_word: 3; output: one, two, six...)
import os
import itertools
import string
import sys
from sets import Set
def check_files():
if 'all_combinations_list.txt' in os.listdir('/tmp'):
os.remove('/tmp/real_words.txt')
def factorial_formula(v):
a = 1
if v >out, separated_words
os.system("gnome-terminal -e 'bash -c \"cat /tmp/real_words.txt; exec bash\"'")
print ''' FINISHED
'''
if __name__ == "__main__":
os.system('clear')
check_files()
letter_input = raw_input(' Type in letters: ')
number_input = int(raw_input(' Letters per word: '))
split_letters = list(letter_input)
letter_input
number_input
if ' ' in split_letters:
print '''
ERROR: White space typed
'''
sys.exit()
elif len(split_letters) < number_input:
print '''
ERROR: More letters in word than typed
'''
sys.exit()
print '''
Creating %s different combinations...''' % combinations_formula()
print ' Detecting dictionary matches...'
possible_existing_words()
sys.exit()Solution
You're first building a list of all permutations of the input word, and then using that to filter the dictionary. The problem with this approach is that the number of permutations is huge — for 10 distinct letters, there are 10! = 3628800 permutations. In comparison,
There's cheap way to check if two words are a permutation of each other: sort their letters. Two words are permutations of one another if and only if sorting their letters returns identical results.
Here, you only want to see if each dictionary word consists of
I don't think the Python standard library has a subsequence test (only a substring test, i.e. where the elements must be consecutive). A cheap way to implement it for lists of characters is to build a regular expression and let the regular expression engine build an optimized automaton internally.
That's a tiny precomputation, after which you get cheap processing on each dictionary word. You can then process the dictionary as a stream, so your program will use constant memory with respect to the dictionary size (and linear memory in the size of the target permutation, instead of factorial).
Here's a demo program where I don't copy the whole input logic, I just take the letters from the command line and the dictionary from standard input.
There are other ways to test if a word is a permutation of the
/usr/share/dict/american-english is only about 100k words, and /usr/share/dict/american-english-huge is about 340k — less than 10 times the number of permutations. This kind of pretreatment can be useful, but only if the variable data to be processed afterwards is large or if the amount of computation per data pair is very large, neither of which are the case here.There's cheap way to check if two words are a permutation of each other: sort their letters. Two words are permutations of one another if and only if sorting their letters returns identical results.
def is_a_permutation(word1, word2):
return sorted(word1) == sorted(word2)Here, you only want to see if each dictionary word consists of
number_input elements of split_letters. This is the case if the length of the word is number_input and the word is a subsequence (not necessarily consecutive) of split_letters — after sorting both the word and split_letters.I don't think the Python standard library has a subsequence test (only a substring test, i.e. where the elements must be consecutive). A cheap way to implement it for lists of characters is to build a regular expression and let the regular expression engine build an optimized automaton internally.
def subsequence_re(word):
return re.compile('.*'.join([re.escape(letter) for letter in sorted(word)]))That's a tiny precomputation, after which you get cheap processing on each dictionary word. You can then process the dictionary as a stream, so your program will use constant memory with respect to the dictionary size (and linear memory in the size of the target permutation, instead of factorial).
Here's a demo program where I don't copy the whole input logic, I just take the letters from the command line and the dictionary from standard input.
#! /usr/bin/env python
import re
import sys
def subsequence_re(word):
return re.compile('.*'.join([re.escape(letter) for letter in sorted(word)]))
def filter_file(target_length, target_re, data_file):
for line in data_file:
word = line[:-1]
if len(word) == target_length and \
re.search(target_re, ''.join(sorted(word))):
yield word
if __name__ == '__main__':
target = sys.argv[1]
for word in filter_file(len(target), subsequence_re(target), sys.stdin):
print wordThere are other ways to test if a word is a permutation of the
number_input letters among split_letters. One way is to build a dictionary (in the sense of a Python dict) where the keys are the distinct letters in split_letters and the values are the number of occurrences of each letter. To see if a word matches, build a similar dictionary and check if the number is smaller (entries not in the reference dict count as an immediate failure); note that the word must also have the target length. As a follow-up exercise, implement and benchmark this method.Code Snippets
def is_a_permutation(word1, word2):
return sorted(word1) == sorted(word2)def subsequence_re(word):
return re.compile('.*'.join([re.escape(letter) for letter in sorted(word)]))#! /usr/bin/env python
import re
import sys
def subsequence_re(word):
return re.compile('.*'.join([re.escape(letter) for letter in sorted(word)]))
def filter_file(target_length, target_re, data_file):
for line in data_file:
word = line[:-1]
if len(word) == target_length and \
re.search(target_re, ''.join(sorted(word))):
yield word
if __name__ == '__main__':
target = sys.argv[1]
for word in filter_file(len(target), subsequence_re(target), sys.stdin):
print wordContext
StackExchange Code Review Q#54957, answer score: 3
Revisions (0)
No revisions yet.