HiveBrain v1.2.0
Get Started
← Back to all entries
snippetcsharpMinor

How to improve wordfeud solver algorithm?

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
improvealgorithmhowwordfeudsolver

Problem

I created a simple wordfeud solver that checks what words can be created with the supplied letters.

public void GetResults(string characters)
{
    // place all the characters we have in a dictionary with key=letter, value=amount
    var charCount = SplitCharacters(characters);

    // get all the unique letters (the keys)
    char[] individualChars = charCount.Keys.ToArray();

    // create a regex for these letters, because only those letters are allowed in a word
    string regex = @"^[" + new string(individualChars) + "]+$";

    // get all the words that contain only the letters we have on our board
    IEnumerable possibleWords = _words.Where(x => Regex.IsMatch(x, regex));

    // walk through the words that have only the letters on our board, and check if
    // there are sufficient letters on our board to create that word
    IEnumerable results = possibleWords
        .Where(x => SplitCharacters(x).All(y => charCount[y.Key] >= y.Value) && x.Length > 1)
        .OrderByDescending(x => x.Length)
        .Take(25);

    // convert the words to ViewModel objects to display in the view
    foreach (string result in results)
    {
        Results.Add(new ResultViewModel { Word = result });
    }
}

private Dictionary SplitCharacters(string characters)
{
    char[] chars = characters.ToLower().ToCharArray();

    var charCount = new Dictionary();

    foreach (char c in chars)
    {
        if (!charCount.ContainsKey(c))
        {
            charCount[c] = 1;
        }
        else
        {
            charCount[c]++;
        }
    }

    return charCount;
}


It works, but I think this can be done more efficient. I worked this out on my own, but I feel like I'm creating too many new arrays/enumerables for all the steps in the algorithm. It runs quite fast on my PC (core i7 2600) but it's a little bit slow on my Surface tablet.

Can this be done more efficiently?

Solution

This is a rather interesting problem, first let's analyze a couple of algorithms and their respective algorithmic complexities.

We say that A is the set of characters given to us from the Wordfeud game: the set of characters from which we can form words. B is the lexigraphically sorted set of all valid words we can form, in practise, the words of a dictionary. The set of words in B that we can form from the characters in A is C.

Naive algorithms

The naive algorithm for this problem is to simply loop through all elements of B and check whether these words can be formed from the characters in B, with a regex like proposed here, or with a frequency map. If so, we put the word in C. For an average word length k and n words in B this runs in worst-case O(n k) time.

This is the fastest we can go without doing some kind of precomputation. In my tests, this algorithm runs in about 2s on the dictionary in /usr/share/dict/words in Ruby on 2 Ghz. This is the mean running time from different sets of A of 25 characters of which about 15 letters are distinct.

A simple iteration on this naive algorithm, is the observation that if a character is not in A, then we do not have to loop through all the words that start with this letter in B. E.g. if there's no letter c in A, we can skip to the words beginning with d when we encounter the first word which begins with c in B and so on. We can precompute a table that stores this information, i.e. the intervals at which the words start with a letter k. E.g. words with index 1..3482 in B all start with a. Thus if A does not contain a, we can just jump to the word with index the 3482 + 1.

The speed of this iteration depends on how many words in B start with each character in A. Say that v is the highest number of words that starts with any character in B, and i is the number of distinct characters from which we can form words, then this runs in worst-case O(v i) time. In my tests, this was about twice as fast as the previous algorithm.

Another iteration is threading it. For each segment we check (e.g. the words from a..b) we create a thread that checks this. With native threads, this halves the run time once again. This got me down to about 0.5s with the above setup running on JRuby.

Using a Trie

A trie is a tree data structure to store prefixes of strings. You start with a root node, and from there you have 25 edges, one for each character in the alphabet. If you follow one of these edges, e.g. the edge for the letter 'a' there will be X new edges. For each of those edges the prefix a + that letter exists in B. For instance one of the edges in X could be p and from there we could follow another edge e.

The vertex we landed at after following e is marked, because if we follow the edges back to the root node and save the characters on the way to the root in reverse order, we get a word in B: ape. The edge p existed because ap is a prefix of ape, and thus a prefix in B, as stated in the previous definition.

With this data structure in mind, we can design a new algorithm. Build a freqency map with default value 0 from A, i.e. a map where the key is the letter in A and the value of that key is the amount of times this letter occurs in A.

Traverse the tree with a depth-first search where you recursively traverse all options that you can do while all keys in the frequency map are > 0. Every time you hit a vertex which is marked, you traverse up to the root node to determine the word and add this word to C.

In reality you might want to pay the memory cost of storing the word at the vertex so you avoid traversing back to the root node every time you hit a marked vertex.

I won't get into algorithmic complexity analysis of this one, since it is rather complicated to analyze. However, in my tests, creating the trie from the dictionary takes less than 1s (e.g. for a web-service this penalty can just be paid when starting the app) and querying with a frequency table of A takes about 0.05s.

I do not know C#, however, I implemented this algorithm in Ruby.

Context

StackExchange Code Review Q#19364, answer score: 3

Revisions (0)

No revisions yet.