snippetjavascriptMinor
Generate random letter by frequency
Viewed 0 times
randomgeneratefrequencyletter
Problem
Which method below should I use to generate a "random letter by frequency" using Javascript?
From https://gist.github.com/furf/2413792
My re-write
From https://gist.github.com/furf/2413792
var randomAtoZ = (function (lookup) {
return function () {
var random = Math.random() * 100000,
letter;
for (letter in lookup) {
if (random < lookup[letter]) {
return letter;
}
}
}
})({
// Ranges calculated from data found at
// http://en.wikipedia.org/wiki/Letter_frequency
a: 8167, b: 9659, c: 12441, d: 16694,
e: 29396, f: 31624, g: 33639, h: 39733,
i: 46699, j: 46852, k: 47624, l: 51649,
m: 54055, n: 60804, o: 68311, p: 70240,
q: 70335, r: 76322, s: 82649, t: 91705,
u: 94463, v: 95441, w: 97801, x: 97951,
y: 99925, z: 100000
});My re-write
function randomAtoZ() {
// Ranges calculated from data found at
// http://en.wikipedia.org/wiki/Letter_frequency
var lookup = {
a: 8167, b: 9659, c: 12441, d: 16694,
e: 29396, f: 31624, g: 33639, h: 39733,
i: 46699, j: 46852, k: 47624, l: 51649,
m: 54055, n: 60804, o: 68311, p: 70240,
q: 70335, r: 76322, s: 82649, t: 91705,
u: 94463, v: 95441, w: 97801, x: 97951,
y: 99925, z: 100000
};
var random = Math.random() * 100000;
for (letter in lookup) {
if (lookup[letter] > random) {
return letter;
}
}
}Solution
Focusing on performance, I recommend to split the lookup table into an array of letters and an array of cumulative relative frequencies.
Iterating and accessing compact array elements is much faster compared to accessing object properties.
Also, in contrast to array iteration, the
Moving the lookup table out of the function body into either a closure or a module level constant gives you a speed advantage at least for non-optimizing JavaScript compilers.
A binary search strategy would find the letter corresponding to a random frequency value in O(log n) steps compared to O(n) steps for the current linear search. However, the constant binary search overhead is too big and the array small enough that the linear search performs better in practice.
I get the following performance test results in Firefox 53:
See also the performance test given by @hjpotter92 in the comments.
Here is the winning linear array search module implementation:
Iterating and accessing compact array elements is much faster compared to accessing object properties.
Also, in contrast to array iteration, the
for..in iteration order of object properties is not guaranteed and thus less robust (as pointed out by Mike Brant in the comments).Moving the lookup table out of the function body into either a closure or a module level constant gives you a speed advantage at least for non-optimizing JavaScript compilers.
A binary search strategy would find the letter corresponding to a random frequency value in O(log n) steps compared to O(n) steps for the current linear search. However, the constant binary search overhead is too big and the array small enough that the linear search performs better in practice.
I get the following performance test results in Firefox 53:
Test: Ops/sec:
Original 1,787,627 94% slower
Linear array search 31,399,828 fastest
Binary array search 28,691,914 9% slowerSee also the performance test given by @hjpotter92 in the comments.
Here is the winning linear array search module implementation:
/** English language alphabet. */
export const letters = [
'a', 'b', 'c', 'd',
'e', 'f', 'g', 'h',
'i', 'j', 'k', 'l',
'm', 'n', 'o', 'p',
'q', 'r', 's', 't',
'u', 'v', 'w', 'x',
'y', 'z'
];
/**
* English language cumulative letter frequencies × 100000.
* @see http://en.wikipedia.org/wiki/Letter_frequency
*/
export const frequencies = [
8167, 9659, 12441, 16694,
29396, 31624, 33639, 39733,
46699, 46852, 47624, 51649,
54055, 60804, 68311, 70240,
70335, 76322, 82649, 91705,
94463, 95441, 97801, 97951,
99925, 100000
];
/**
* Return a random letter from a - z according to their relative
* frequencies in the English language.
*
* @return {string} random letter from a - z.
*/
export default function randomAtoZ() {
let random = Math.random() * 100000;
for (let i = 0, length = letters.length; i < length; i++) {
if (random < frequencies[i]) {
return letters[i];
}
}
}Code Snippets
Test: Ops/sec:
Original 1,787,627 94% slower
Linear array search 31,399,828 fastest
Binary array search 28,691,914 9% slower/** English language alphabet. */
export const letters = [
'a', 'b', 'c', 'd',
'e', 'f', 'g', 'h',
'i', 'j', 'k', 'l',
'm', 'n', 'o', 'p',
'q', 'r', 's', 't',
'u', 'v', 'w', 'x',
'y', 'z'
];
/**
* English language cumulative letter frequencies × 100000.
* @see http://en.wikipedia.org/wiki/Letter_frequency
*/
export const frequencies = [
8167, 9659, 12441, 16694,
29396, 31624, 33639, 39733,
46699, 46852, 47624, 51649,
54055, 60804, 68311, 70240,
70335, 76322, 82649, 91705,
94463, 95441, 97801, 97951,
99925, 100000
];
/**
* Return a random letter from a - z according to their relative
* frequencies in the English language.
*
* @return {string} random letter from a - z.
*/
export default function randomAtoZ() {
let random = Math.random() * 100000;
for (let i = 0, length = letters.length; i < length; i++) {
if (random < frequencies[i]) {
return letters[i];
}
}
}Context
StackExchange Code Review Q#162558, answer score: 4
Revisions (0)
No revisions yet.