HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Use up all characters to form words

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
allwordscharactersformuse

Problem

Task:


Given a dictionary of words, and a set of characters, judge if all the
characters can form the words from the dictionary, without any
characters left. For example, given the dictionary {hello, world, is,
my, first, program}, if the characters set is "iiifrssst", you should
return 'true' because you can form {is, is, first} from the set; if
the character set is "eiifrsst", you should return 'false' because you
cannot use all the characters from the set.


P.S. there may be tens of thousands of words in the dictionary, and
the chars set length could be up to hundreds, so I really need some
efficient algorithm.

Source: http://www.careercup.com/question?id=5841660122497024

Here is my solution in Python:

```
from collections import Counter
from itertools import combinations_with_replacement
import copy

# Check if word can be made up with the characters in char_count
def valid_word(word):
# build character count for word
word_char_count = Counter(word)

# check if character count for word
for char in word_char_count.keys():
if char not in char_count or word_char_count[char] > char_count[char]:
return False

return True

# Only return words that can be made up from characters with char_set
def prune_words(words):
res = []
for word in words:
if valid_word(word):
res.append(word)

return res

def main(words, char_set):
global char_count
char_count = Counter(char_set) # count character occurrences

# Prune words that cannot possible be made from the set of characters
words = prune_words(words)
words_len = [len(word) for word in words]

# Maximum number of r with generating combinations
max_r = len(char_set) / min(words_len)

# Generate permutations with the valid words
for r in range(1, max_r + 1):
for comb in combinations_with_replacement(words, r):
# If the total length of the combination matches length of char_

Solution


  • Since "the chars set length could be up to hundreds", it can easily contain enough letters to form any word in an English dictionary. In such a case prune_words would not prune any words. This makes me think the pruning is an example of premature optimization.



  • Instead of using Counter to check a match, I believe it would be faster to compare sorted lists of characters, i.e. compare sorted("".join(comb)) to a pre-computed sorted(char_set).



  • Copying a Counter by Counter(char_count) is faster than by copy.deepcopy.



  • To check if all counts are zeros you don't need the list comprehension as you can simply use if not any(temp_count.itervalues())



  • You check lots of combinations that are the wrong length. Even though the length check is the first thing inside the loop, it would be better to generate the combinations in a way that allows you to back off when the combined length exceeds the length of char_set. I would use recursion to explore the combinations.



  • A more clever approach might be found from representing the dictionary as a prefix tree.

Context

StackExchange Code Review Q#45733, answer score: 4

Revisions (0)

No revisions yet.