patternModerate
Why do we need "Bloom Filters" if we can use hash tables?
Viewed 0 times
whycantablesneedbloomhashfiltersuse
Problem
A Bloom filter is a probabilistic data structure designed to tell, rapidly and memory-efficiently, whether an element is in the set or no.
If we can use hash tables where we have O(1) in best time, O(n) in a very bad situations. Why do we need a new abstract data structure that look-up for an element with less certainty.
What are the scenarios where HashTables fails and demands the use of bloom filters?
When should bloom filters be used over hashtables and vice-versa?
If we can use hash tables where we have O(1) in best time, O(n) in a very bad situations. Why do we need a new abstract data structure that look-up for an element with less certainty.
What are the scenarios where HashTables fails and demands the use of bloom filters?
When should bloom filters be used over hashtables and vice-versa?
Solution
The second paragraph of the Wikipedia article on Bloom filters says the following, with a citation to Bloom's original 1970 paper.
Bloom proposed the technique for applications where the amount of source data would require an impracticably large hash area in memory if "conventional" error-free hashing techniques were applied. He gave the example of a hyphenation algorithm for a dictionary of 500,000 words, out of which 90% follow simple hyphenation rules, but the remaining 10% require expensive disk accesses to retrieve specific hyphenation patterns. With sufficient core memory, an error-free hash could be used to eliminate all unnecessary disk accesses; on the other hand, with limited core memory, Bloom's technique uses a smaller hash area but still eliminates most unnecessary accesses. For example, a hash area only 15% of the size needed by an ideal error-free hash still eliminates 85% of the disk accesses.
Burton H. Bloom, Space/Time Trade-offs in Hash Coding with Allowable Errors. Communications of the ACM 13(7): 422–426, 1970. (DOI)
Bloom proposed the technique for applications where the amount of source data would require an impracticably large hash area in memory if "conventional" error-free hashing techniques were applied. He gave the example of a hyphenation algorithm for a dictionary of 500,000 words, out of which 90% follow simple hyphenation rules, but the remaining 10% require expensive disk accesses to retrieve specific hyphenation patterns. With sufficient core memory, an error-free hash could be used to eliminate all unnecessary disk accesses; on the other hand, with limited core memory, Bloom's technique uses a smaller hash area but still eliminates most unnecessary accesses. For example, a hash area only 15% of the size needed by an ideal error-free hash still eliminates 85% of the disk accesses.
Burton H. Bloom, Space/Time Trade-offs in Hash Coding with Allowable Errors. Communications of the ACM 13(7): 422–426, 1970. (DOI)
Context
StackExchange Computer Science Q#32264, answer score: 10
Revisions (0)
No revisions yet.