HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Why do Bloom filters work?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
bloomwhyworkfilters

Problem

Let's say I am using Bloom filters to create a function to check if a word exists in a document or not. If I pick a hash function to fill out a bit bucket for all words in my document. Then if for a given number of words, wouldn't the whole bit bucket be all 1s? If so then checking for any word will return true? What am I missing here?

Solution

Given that you want to insert $n$ words into the Bloom filter, and you want a false positive probability of $p$, the wikipedia page on Bloom filters gives the following formulas for choosing $m$, the number of bits in your table and $k$, the number of hash functions that you are going to use. They give
$
m = - \frac{n \ln p}{(\ln 2)^2}
$
and
$$
k = \frac{m}{n}\ln 2=-\frac{\ln p}{\ln 2}=-\lg_2p,
$$
so you should choose
$$
m=\frac{nk}{\ln 2}.
$$

That actually works out quite nicely. You are going to get a table with about half the bits set and half cleared, so the entropy per bit is going to be maximal, and the probability of a false positive is going to be
$0.5^k$.

Context

StackExchange Computer Science Q#12834, answer score: 6

Revisions (0)

No revisions yet.