patternjavascriptTip
Reservoir sampling selects k items uniformly from a stream of unknown size
Viewed 0 times
reservoir samplingrandom samplingstreamuniform distributionalgorithm ronline algorithm
Problem
You need to randomly sample k items from a stream where the total size is unknown or too large to store. You cannot go back to earlier items. Naive approach (buffer everything, then sample) uses O(n) memory.
Solution
Reservoir sampling (Algorithm R): fill the reservoir with the first k items. For each subsequent item i (1-indexed from k+1), generate a random integer j in [1, i]. If j <= k, replace reservoir[j] with the new item. Each item has exactly k/n probability of being in the final reservoir.
Why
The invariant: after processing i items, each of the i items has k/i probability of being in the reservoir. Mathematical induction proves uniformity. This is critical for fair sampling in streaming analytics, A/B testing, and log sampling.
Gotchas
- Algorithm L is faster (O(k + k*log(n/k))) by computing skip distances — use for large streams
- For k=1 (pick one random item from stream), the condition simplifies to replacing with probability 1/i
- Weighted reservoir sampling requires a different algorithm (Efraimidis-Spirakis)
Code Snippets
Reservoir sampling for k items and single item
function reservoirSample(stream, k) {
const reservoir = [];
let i = 0;
for (const item of stream) {
i++;
if (i <= k) {
reservoir.push(item);
} else {
const j = Math.floor(Math.random() * i); // [0, i)
if (j < k) reservoir[j] = item;
}
}
return reservoir;
}
// Single item pick (k=1)
function pickRandom(stream) {
let chosen, i = 0;
for (const item of stream) {
i++;
if (Math.random() < 1/i) chosen = item;
}
return chosen;
}Context
Sampling from data streams, large files, database cursors, or unknown-size sequences
Revisions (0)
No revisions yet.