patternpythonMinor
Use up all characters to form words
Viewed 0 times
allwordscharactersformuse
Problem
Task:
Given a dictionary of words, and a set of characters, judge if all the
characters can form the words from the dictionary, without any
characters left. For example, given the dictionary {hello, world, is,
my, first, program}, if the characters set is "iiifrssst", you should
return 'true' because you can form {is, is, first} from the set; if
the character set is "eiifrsst", you should return 'false' because you
cannot use all the characters from the set.
P.S. there may be tens of thousands of words in the dictionary, and
the chars set length could be up to hundreds, so I really need some
efficient algorithm.
Source: http://www.careercup.com/question?id=5841660122497024
Here is my solution in Python:
```
from collections import Counter
from itertools import combinations_with_replacement
import copy
# Check if word can be made up with the characters in char_count
def valid_word(word):
# build character count for word
word_char_count = Counter(word)
# check if character count for word
for char in word_char_count.keys():
if char not in char_count or word_char_count[char] > char_count[char]:
return False
return True
# Only return words that can be made up from characters with char_set
def prune_words(words):
res = []
for word in words:
if valid_word(word):
res.append(word)
return res
def main(words, char_set):
global char_count
char_count = Counter(char_set) # count character occurrences
# Prune words that cannot possible be made from the set of characters
words = prune_words(words)
words_len = [len(word) for word in words]
# Maximum number of r with generating combinations
max_r = len(char_set) / min(words_len)
# Generate permutations with the valid words
for r in range(1, max_r + 1):
for comb in combinations_with_replacement(words, r):
# If the total length of the combination matches length of char_
Given a dictionary of words, and a set of characters, judge if all the
characters can form the words from the dictionary, without any
characters left. For example, given the dictionary {hello, world, is,
my, first, program}, if the characters set is "iiifrssst", you should
return 'true' because you can form {is, is, first} from the set; if
the character set is "eiifrsst", you should return 'false' because you
cannot use all the characters from the set.
P.S. there may be tens of thousands of words in the dictionary, and
the chars set length could be up to hundreds, so I really need some
efficient algorithm.
Source: http://www.careercup.com/question?id=5841660122497024
Here is my solution in Python:
```
from collections import Counter
from itertools import combinations_with_replacement
import copy
# Check if word can be made up with the characters in char_count
def valid_word(word):
# build character count for word
word_char_count = Counter(word)
# check if character count for word
for char in word_char_count.keys():
if char not in char_count or word_char_count[char] > char_count[char]:
return False
return True
# Only return words that can be made up from characters with char_set
def prune_words(words):
res = []
for word in words:
if valid_word(word):
res.append(word)
return res
def main(words, char_set):
global char_count
char_count = Counter(char_set) # count character occurrences
# Prune words that cannot possible be made from the set of characters
words = prune_words(words)
words_len = [len(word) for word in words]
# Maximum number of r with generating combinations
max_r = len(char_set) / min(words_len)
# Generate permutations with the valid words
for r in range(1, max_r + 1):
for comb in combinations_with_replacement(words, r):
# If the total length of the combination matches length of char_
Solution
- Since "the chars set length could be up to hundreds", it can easily contain enough letters to form any word in an English dictionary. In such a case
prune_wordswould not prune any words. This makes me think the pruning is an example of premature optimization.
- Instead of using
Counterto check a match, I believe it would be faster to compare sorted lists of characters, i.e. comparesorted("".join(comb))to a pre-computedsorted(char_set).
- Copying a Counter by
Counter(char_count)is faster than bycopy.deepcopy.
- To check if all counts are zeros you don't need the list comprehension as you can simply use
if not any(temp_count.itervalues())
- You check lots of combinations that are the wrong length. Even though the length check is the first thing inside the loop, it would be better to generate the combinations in a way that allows you to back off when the combined length exceeds the length of
char_set. I would use recursion to explore the combinations.
- A more clever approach might be found from representing the dictionary as a prefix tree.
Context
StackExchange Code Review Q#45733, answer score: 4
Revisions (0)
No revisions yet.