patternMinor
Hash size: Are prime numbers "near" powers of two a poor choice for the modulus?
Viewed 0 times
thearesizenumbershashpoortwochoiceforprime
Problem
Cormen et al.'s "Introduction to Algorithms" says the following about the division method hash function $h(k)=k \text{ mod } m$:
A prime not too close to an exact power of 2 is often a good choice for $m$. For example, suppose we wish to allocate a hash table, with collisions resolved by chaining, to hold roughly $n=2000$ character strings, where a character has 8 bits. We don't mind examining an average of 3 elements in an unsuccesful search, so we allocate a table of size $m=701$. The number $701$ is chose because it is a prime near $2000/3$ but not near any power of 2.
The book, however, does not explain what it means by "near". Is $701$ preferrable over, say, $661$ because $661-512<701-512$? I do not understand how this is relevant to the modulo function involved in the calculation of $h(k)$.
A prime not too close to an exact power of 2 is often a good choice for $m$. For example, suppose we wish to allocate a hash table, with collisions resolved by chaining, to hold roughly $n=2000$ character strings, where a character has 8 bits. We don't mind examining an average of 3 elements in an unsuccesful search, so we allocate a table of size $m=701$. The number $701$ is chose because it is a prime near $2000/3$ but not near any power of 2.
The book, however, does not explain what it means by "near". Is $701$ preferrable over, say, $661$ because $661-512<701-512$? I do not understand how this is relevant to the modulo function involved in the calculation of $h(k)$.
Solution
If a string of 8 bit bytes is interpreted as a large integer, and you calculate the value modulo $2^{24}-1$ then the first, fourth, seventh byte etc. are all hashed to the same value, which means the hash will not be well distributed.
Context
StackExchange Computer Science Q#86237, answer score: 2
Revisions (0)
No revisions yet.