HiveBrain v1.2.0
Get Started
← Back to all entries
snippetjavascriptMinor

Generate random letter by frequency

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
randomgeneratefrequencyletter

Problem

Which method below should I use to generate a "random letter by frequency" using Javascript?

From https://gist.github.com/furf/2413792

var randomAtoZ = (function (lookup) {
    return function () {
        var random = Math.random() * 100000,
                letter;
        for (letter in lookup) {
            if (random < lookup[letter]) {
                return letter;
            }
        }
    }
})({
    // Ranges calculated from data found at
    // http://en.wikipedia.org/wiki/Letter_frequency
    a: 8167,  b: 9659,  c: 12441, d: 16694,
    e: 29396, f: 31624, g: 33639, h: 39733,
    i: 46699, j: 46852, k: 47624, l: 51649,
    m: 54055, n: 60804, o: 68311, p: 70240,
    q: 70335, r: 76322, s: 82649, t: 91705,
    u: 94463, v: 95441, w: 97801, x: 97951,
    y: 99925, z: 100000
});


My re-write

function randomAtoZ() {
    // Ranges calculated from data found at
    // http://en.wikipedia.org/wiki/Letter_frequency
    var lookup = {
        a: 8167,  b: 9659,  c: 12441, d: 16694,
        e: 29396, f: 31624, g: 33639, h: 39733,
        i: 46699, j: 46852, k: 47624, l: 51649,
        m: 54055, n: 60804, o: 68311, p: 70240,
        q: 70335, r: 76322, s: 82649, t: 91705,
        u: 94463, v: 95441, w: 97801, x: 97951,
        y: 99925, z: 100000
    };

    var random = Math.random() * 100000;

    for (letter in lookup) {
        if (lookup[letter] > random) {
            return letter;
        }
    }
}

Solution

Focusing on performance, I recommend to split the lookup table into an array of letters and an array of cumulative relative frequencies.

Iterating and accessing compact array elements is much faster compared to accessing object properties.

Also, in contrast to array iteration, the for..in iteration order of object properties is not guaranteed and thus less robust (as pointed out by Mike Brant in the comments).

Moving the lookup table out of the function body into either a closure or a module level constant gives you a speed advantage at least for non-optimizing JavaScript compilers.

A binary search strategy would find the letter corresponding to a random frequency value in O(log n) steps compared to O(n) steps for the current linear search. However, the constant binary search overhead is too big and the array small enough that the linear search performs better in practice.

I get the following performance test results in Firefox 53:

Test:                 Ops/sec:
Original               1,787,627   94% slower
Linear array search   31,399,828   fastest
Binary array search   28,691,914   9% slower


See also the performance test given by @hjpotter92 in the comments.

Here is the winning linear array search module implementation:

/** English language alphabet. */
export const letters = [
    'a', 'b', 'c', 'd',
    'e', 'f', 'g', 'h',
    'i', 'j', 'k', 'l',
    'm', 'n', 'o', 'p',
    'q', 'r', 's', 't',
    'u', 'v', 'w', 'x',
    'y', 'z'
];

/** 
 * English language cumulative letter frequencies × 100000.
 * @see http://en.wikipedia.org/wiki/Letter_frequency
 */
export const frequencies = [
    8167,  9659,  12441, 16694,
    29396, 31624, 33639, 39733,
    46699, 46852, 47624, 51649,
    54055, 60804, 68311, 70240,
    70335, 76322, 82649, 91705,
    94463, 95441, 97801, 97951,
    99925, 100000
];

/**
 * Return a random letter from a - z according to their relative
 * frequencies in the English language.
 *
 * @return {string} random letter from a - z.
 */
export default function randomAtoZ() {
  let random = Math.random() * 100000;
  for (let i = 0, length = letters.length; i < length; i++) {
    if (random < frequencies[i]) {
      return letters[i];
    }
  }
}

Code Snippets

Test:                 Ops/sec:
Original               1,787,627   94% slower
Linear array search   31,399,828   fastest
Binary array search   28,691,914   9% slower
/** English language alphabet. */
export const letters = [
    'a', 'b', 'c', 'd',
    'e', 'f', 'g', 'h',
    'i', 'j', 'k', 'l',
    'm', 'n', 'o', 'p',
    'q', 'r', 's', 't',
    'u', 'v', 'w', 'x',
    'y', 'z'
];

/** 
 * English language cumulative letter frequencies × 100000.
 * @see http://en.wikipedia.org/wiki/Letter_frequency
 */
export const frequencies = [
    8167,  9659,  12441, 16694,
    29396, 31624, 33639, 39733,
    46699, 46852, 47624, 51649,
    54055, 60804, 68311, 70240,
    70335, 76322, 82649, 91705,
    94463, 95441, 97801, 97951,
    99925, 100000
];

/**
 * Return a random letter from a - z according to their relative
 * frequencies in the English language.
 *
 * @return {string} random letter from a - z.
 */
export default function randomAtoZ() {
  let random = Math.random() * 100000;
  for (let i = 0, length = letters.length; i < length; i++) {
    if (random < frequencies[i]) {
      return letters[i];
    }
  }
}

Context

StackExchange Code Review Q#162558, answer score: 4

Revisions (0)

No revisions yet.