HiveBrain v1.2.0
Get Started
← Back to all entries
patternModerate

Why do we need "Bloom Filters" if we can use hash tables?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
whycantablesneedbloomhashfiltersuse

Problem

A Bloom filter is a probabilistic data structure designed to tell, rapidly and memory-efficiently, whether an element is in the set or no.

If we can use hash tables where we have O(1) in best time, O(n) in a very bad situations. Why do we need a new abstract data structure that look-up for an element with less certainty.
What are the scenarios where HashTables fails and demands the use of bloom filters?

When should bloom filters be used over hashtables and vice-versa?

Solution

The second paragraph of the Wikipedia article on Bloom filters says the following, with a citation to Bloom's original 1970 paper.


Bloom proposed the technique for applications where the amount of source data would require an impracticably large hash area in memory if "conventional" error-free hashing techniques were applied. He gave the example of a hyphenation algorithm for a dictionary of 500,000 words, out of which 90% follow simple hyphenation rules, but the remaining 10% require expensive disk accesses to retrieve specific hyphenation patterns. With sufficient core memory, an error-free hash could be used to eliminate all unnecessary disk accesses; on the other hand, with limited core memory, Bloom's technique uses a smaller hash area but still eliminates most unnecessary accesses. For example, a hash area only 15% of the size needed by an ideal error-free hash still eliminates 85% of the disk accesses.

Burton H. Bloom, Space/Time Trade-offs in Hash Coding with Allowable Errors. Communications of the ACM 13(7): 422–426, 1970. (DOI)

Context

StackExchange Computer Science Q#32264, answer score: 10

Revisions (0)

No revisions yet.