patternMinor
Hash Table: Relation between position of a value and Hash table size
Viewed 0 times
sizepositionhashvaluebetweenandrelationtable
Problem
I need to know whether hash table size, $S$ has any impact on distribution of a value in hash table.
At the moment I'm using SHA512 and I can tolerate up to 50 elements in any bucket, and I have $10^6$ elements; I set hash table length to $S=50000$. This is far more larger than what is suggested in the literature [1]. However, if I reduce the table length I would get overflow (as I have tested and saw it).
To find position of an element in hash table I do $hash(val) mod S$.
Question: Does $S$ play any role in distribution of the value in hash table?
In other words, do I need to set $S$ as a prime number or power of two ?
[1]. http://www14.in.tum.de/personen/raab/publ/balls.pdf
At the moment I'm using SHA512 and I can tolerate up to 50 elements in any bucket, and I have $10^6$ elements; I set hash table length to $S=50000$. This is far more larger than what is suggested in the literature [1]. However, if I reduce the table length I would get overflow (as I have tested and saw it).
To find position of an element in hash table I do $hash(val) mod S$.
Question: Does $S$ play any role in distribution of the value in hash table?
In other words, do I need to set $S$ as a prime number or power of two ?
[1]. http://www14.in.tum.de/personen/raab/publ/balls.pdf
Solution
No. If you use SHA512, you can use any modulus $S$ you want: it doesn't need to be prime. A power of two is fine and is no better or worse than a prime.
The reason is that the cryptographic properties of SHA512 ensure that the output of SHA512 is (for all practical purposes) essentially a uniformly random 512-bit string. Because $2^{512}$ is so much larger than your modulus, after reducing modulo $S$, the result will be approximately uniform. Any non-uniformity will be incredibly, exponentially small: so small that it's not detectable within your lifetime, or the lifetime of the solar system. (I know there's another answer that claims non-uniformity is a problem, but that answer failed to quantify the extent of the non-uniformity and thus came to the wrong conclusion; the amount of non-uniformity is so exponentially small that it's completely irrelevant in practice).
So, go ahead and use any modulus that's convenient. It doesn't need to be a power of two or a prime number, if you're using SHA512 as your hash function.
The reason is that the cryptographic properties of SHA512 ensure that the output of SHA512 is (for all practical purposes) essentially a uniformly random 512-bit string. Because $2^{512}$ is so much larger than your modulus, after reducing modulo $S$, the result will be approximately uniform. Any non-uniformity will be incredibly, exponentially small: so small that it's not detectable within your lifetime, or the lifetime of the solar system. (I know there's another answer that claims non-uniformity is a problem, but that answer failed to quantify the extent of the non-uniformity and thus came to the wrong conclusion; the amount of non-uniformity is so exponentially small that it's completely irrelevant in practice).
So, go ahead and use any modulus that's convenient. It doesn't need to be a power of two or a prime number, if you're using SHA512 as your hash function.
Context
StackExchange Computer Science Q#49640, answer score: 3
Revisions (0)
No revisions yet.