principlejavascriptTip
Bloom filter: probabilistic set membership with zero false negatives
Viewed 0 times
bloom filterprobabilisticmembership testbit arrayfalse positivespace efficient
Problem
Checking membership in a huge set (billions of URLs, passwords) has prohibitive memory costs with a real Set. Sometimes you only need to know 'definitely not in set' vs 'probably in set'.
Solution
A bloom filter uses a bit array and k independent hash functions. Insert: set k bits. Query: check all k bits — if any is 0, definitely not in set; if all are 1, probably in set (false positive possible). Never deletes. Memory: O(m) bits where m is tunable by desired false positive rate.
Why
A bloom filter with 1% false positive rate uses about 10 bits per element, versus 64 bits for a 64-bit integer in a set. Trade accuracy for 6x+ memory savings. False negatives are impossible by construction.
Gotchas
- Cannot delete elements (use counting bloom filter for deletions, at higher memory cost)
- False positive rate grows as elements are added beyond design capacity
- Hash function quality matters — use MurmurHash or xxHash, not built-in JS hash
Code Snippets
Simple bloom filter with two hash functions
// Minimal bloom filter (uses two simple hashes for illustration)
class BloomFilter {
constructor(size = 1000) {
this.bits = new Uint8Array(size);
this.size = size;
}
_hashes(str) {
let h1 = 0, h2 = 5381;
for (let i = 0; i < str.length; i++) {
const c = str.charCodeAt(i);
h1 = (h1 * 31 + c) % this.size;
h2 = ((h2 << 5) + h2 + c) % this.size;
}
return [Math.abs(h1), Math.abs(h2)];
}
add(str) { this._hashes(str).forEach(i => (this.bits[i] = 1)); }
has(str) { return this._hashes(str).every(i => this.bits[i] === 1); }
}Context
Checking membership in huge datasets where occasional false positives are acceptable
Revisions (0)
No revisions yet.