gotchaMinor
Why does this particular hashCode function help decrease collisions?
Viewed 0 times
thiswhyhashcodefunctionhelpdoesparticularcollisionsdecrease
Problem
I just read in Effective Java about the
-
Store some constant nonzero value, say, 17, in an int variable called result.
-
For each significant field f in your object (each field taken into account by the equals method, that is), do the following:
a. Compute an int hash code c for the field.
b. Combine the hash code c computed in step 2.a into result as
follows:
A nonzero initial value is used in step 1 so the hash value will be
affected by initial fields whose hash value, as computed in step 2.a,
is zero.
If zero were used as the initial value in step 1, the overall
hash value would be unaffected by any such initial fields, which could
increase collisions. The value 17 is arbitrary. The multiplication in
step 2.b makes the result depend on the order of the fields, yielding
a much better hash function if the class has multiple similar fields.
And that's their implementation of
My questions are below:
I tried to come up with an example of a collision using a class with two and three integer fields, but didn't succeed. I'd glad, if you gave me a simple example of a collision.
hashCode method:-
Store some constant nonzero value, say, 17, in an int variable called result.
-
For each significant field f in your object (each field taken into account by the equals method, that is), do the following:
a. Compute an int hash code c for the field.
b. Combine the hash code c computed in step 2.a into result as
follows:
result = 31 * result + c;A nonzero initial value is used in step 1 so the hash value will be
affected by initial fields whose hash value, as computed in step 2.a,
is zero.
If zero were used as the initial value in step 1, the overall
hash value would be unaffected by any such initial fields, which could
increase collisions. The value 17 is arbitrary. The multiplication in
step 2.b makes the result depend on the order of the fields, yielding
a much better hash function if the class has multiple similar fields.
And that's their implementation of
hashCode:@Override public int hashCode() {
result = 17;
result = 31 * result + areaCode;
result = 31 * result + prefix;
result = 31 * result + lineNumber;
hashCode = result;
return result;
}My questions are below:
- How does the initial value 17 help us decrease collisions?
- How does multiplying by 31 help us decrease collisions?
- Why do we use 17 and 31? I know they are prime numbers. Is that why we use them?
I tried to come up with an example of a collision using a class with two and three integer fields, but didn't succeed. I'd glad, if you gave me a simple example of a collision.
Solution
If the hashed codes are $x_1,\ldots,x_n$ (in that order), then the resulting hash value is
$$
x_n + 31 x_{n-1} + 31^2 x_{n-2} + \cdots + 31^{n-1} x_1 + 31^n \cdot 17 \pmod{2^{32}},
$$
assuming
-
17 should be non-zero. This way, the hash depends on the input length. Otherwise it is completely arbitrary.
-
31 should be relatively prime to $2^{32}$ (i.e., odd) and have a high order modulo $2^{32}$ (that is, the minimal $m$ such that $31^m \equiv 1 \pmod{2^{32}}$); this way the multiplicands $31,31^2,\ldots$ are all different for the longest time possible. It turns out that the order of $31$ modulo $2^{32}$ is only $2^{27}$, so 31 is not an optimal choice. Its being prime makes no difference – we only care about its oddness and order.
In any case, this hash can be improved on – for example, the
$$
x_n + 31 x_{n-1} + 31^2 x_{n-2} + \cdots + 31^{n-1} x_1 + 31^n \cdot 17 \pmod{2^{32}},
$$
assuming
int is 32 bits, and ignoring signed/unsigned distinctions. In view of this, the integers 17 and 31 should satisfy the following two properties:-
17 should be non-zero. This way, the hash depends on the input length. Otherwise it is completely arbitrary.
-
31 should be relatively prime to $2^{32}$ (i.e., odd) and have a high order modulo $2^{32}$ (that is, the minimal $m$ such that $31^m \equiv 1 \pmod{2^{32}}$); this way the multiplicands $31,31^2,\ldots$ are all different for the longest time possible. It turns out that the order of $31$ modulo $2^{32}$ is only $2^{27}$, so 31 is not an optimal choice. Its being prime makes no difference – we only care about its oddness and order.
In any case, this hash can be improved on – for example, the
python hash function uses XOR instead of addition, thus making the resulting calculation non-algebraic.Context
StackExchange Computer Science Q#45287, answer score: 5
Revisions (0)
No revisions yet.