For what kind of data are hash table operations O(1)?
Problem
From the answers to (When) is hash table lookup O(1)?, I gather that hash tables have $O(1)$ worst-case behavior, at least amortized, when the data satisfies certain statistical conditions, and there are techniques to help make these conditions broad.
However, from a programmer's perspective, I don't know in advance what my data will be: it often comes from some external source. And I rarely have all the data at once: often insertions and deletions happen at a rate that's not far below the rate of lookups, so preprocessing the data to fine-tune the hash function is out.
So, taking a step out: given some knowledge about data source, how can I determine whether a hash table has a chance of having $O(1)$ operations, and possibly which techniques to use on my hash function?
Solution
Hash table lookup can always be made $O(1)$ worst-case for static sets; see the 2002 paper by Arne Andersson and Mikkel Thorup, Dynamic ordered sets with exponential search trees:
Firstly, we give the first deterministic polynomial-time (in $n$) algorithm for constructing a linear space static dictionary with $O(1)$ worst-case access cost (cf. perfect hashing). As mentioned earlier, a linear space data structure that supports member queries (neighbor queries are not supported) in constant time can be constructed at a worst-case cost $O(n^2 W)$ without division [30]. We show that the dependency of word size can be removed.
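To make the static case concrete, here is a minimal sketch of classic two-level (FKS-style) perfect hashing, which is the standard construction behind "$O(1)$ worst-case lookups for a static set". This is an illustration of the general technique, not the deterministic algorithm from the paper (this sketch uses randomized construction); the class name and the assumption that keys are non-negative integers below the chosen prime are mine:

```python
import random

class StaticPerfectDict:
    """Two-level (FKS-style) perfect hashing for a STATIC key set.

    Lookups are O(1) worst case: exactly two hash evaluations and one
    comparison. Construction is randomized with expected linear time.
    Assumption: keys are non-negative integers smaller than PRIME.
    """
    PRIME = (1 << 31) - 1  # a prime larger than any key we accept

    def __init__(self, keys):
        n = len(keys)
        self.n = max(n, 1)
        # Level 1: distribute keys into n buckets with a random hash
        # from a universal family; retry until the sum of squared
        # bucket sizes is O(n), which happens with constant probability.
        while True:
            self.a1 = random.randrange(1, self.PRIME)
            self.b1 = random.randrange(self.PRIME)
            buckets = [[] for _ in range(self.n)]
            for k in keys:
                buckets[self._h(self.a1, self.b1, k, self.n)].append(k)
            if sum(len(b) ** 2 for b in buckets) <= 4 * self.n:
                break
        # Level 2: each bucket of size s gets a table of size s*s and a
        # collision-free hash (found in expected O(1) retries per bucket).
        self.tables = []
        for b in buckets:
            s = len(b)
            if s == 0:
                self.tables.append((1, 0, [None]))
                continue
            size = s * s
            while True:
                a2 = random.randrange(1, self.PRIME)
                b2 = random.randrange(self.PRIME)
                table = [None] * size
                ok = True
                for k in b:
                    i = self._h(a2, b2, k, size)
                    if table[i] is not None:
                        ok = False  # collision: pick a new hash
                        break
                    table[i] = k
                if ok:
                    break
            self.tables.append((a2, b2, table))

    def _h(self, a, b, k, m):
        # Hash from the universal family ((a*k + b) mod p) mod m.
        return ((a * k + b) % self.PRIME) % m

    def __contains__(self, key):
        a2, b2, table = self.tables[self._h(self.a1, self.b1, key, self.n)]
        return table[self._h(a2, b2, key, len(table))] == key
```

For example, `17 in StaticPerfectDict([3, 17, 42, 1000])` is `True` after two hash evaluations, with no probing or chaining. The price is that the key set must be fixed at construction time, which is exactly the "static sets" restriction in the quoted result.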
In the general case (dynamic sets with insertions and deletions), Andersson et al. provide an algorithm for hash-indexed data structures that supports lookups and updates in $O(\sqrt{\log n / \log \log n})$ time. Furthermore, they prove that this bound is optimal, so we know exactly how close we can get to $O(1)$ in the general case.
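To get a feel for how close to constant that bound is in practice, a quick computation (my own illustration, using base-2 logarithms) compares $\sqrt{\log n / \log \log n}$ with plain $\log n$ for realistic set sizes:

```python
import math

# Growth of the Andersson-Thorup bound sqrt(log n / log log n)
# versus log n, for n = 2^10 .. 2^60 (base-2 logarithms).
for e in (10, 20, 30, 40, 50, 60):
    bound = math.sqrt(e / math.log2(e))  # log2(n) = e for n = 2^e
    print(f"n = 2^{e:2d}: sqrt(log n / log log n) = {bound:5.2f}   log n = {e}")
```

Even at $n = 2^{60}$ the bound is below 4, so while the general dynamic case is provably not $O(1)$, the optimal cost grows extremely slowly.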
Context
StackExchange Computer Science Q#477, answer score: 6