patternjavaMinor
A better hash function for an Equivalence?
Viewed 0 times
functionhashbetterforequivalence
Problem
The following is an implementation of Guava's Equivalence for Jackson's
So, instead of extending
```
public final class JsonNodeEquivalence
extends Equivalence
{
// snip
@Override
protected boolean doEquivalent(final JsonNode a, final JsonNode b)
{
/*
* If both are numbers, delegate to appropriate method
*/
if (a.isNumber() && b.isNumber())
return numEquals(a, b);
final NodeType typeA = NodeType.getNodeType(a);
final NodeType typeB = NodeType.getNodeType(b);
/*
* If they are of different types, no dice
*/
if (typeA != typeB)
return false;
/*
* For all other primitive types than numbers, trust JsonNode
*/
if (!a.isContainerNode())
return a.equals(b);
/*
* OK, so they are containers (either both arrays or objects due to the
* test on types above). They are obviously not equals if they do not
* have the same number of elements/members.
*/
if (a.size() != b.size())
return false;
/*
* Delegate to the appropriate method according to their type.
*/
return typeA == NodeType.ARRAY ? arrayEquals(a, b) : objectEquals(a, b);
}
@Override
protected int doHash(final JsonNode t)
{
/*
* If this is a numeric node, we want the same hashcode for the same
* mathematical values. Go with double, its range is good enough for
* 99+% of use cases.
*/
if (t.isNumber())
return Double.valueOf(t.doubleValue()).hashCode();
/*
* If this is a primitive type (other than numbers, handled above
JsonNode for the needs of JSON Schema: in certain situations, all JSON numbers need to be compared mathematically (ie, 1.0 is equal to 1) but JsonNode (accurately) considers them non equal.So, instead of extending
JsonNode, I use this code, which works:```
public final class JsonNodeEquivalence
extends Equivalence
{
// snip
@Override
protected boolean doEquivalent(final JsonNode a, final JsonNode b)
{
/*
* If both are numbers, delegate to appropriate method
*/
if (a.isNumber() && b.isNumber())
return numEquals(a, b);
final NodeType typeA = NodeType.getNodeType(a);
final NodeType typeB = NodeType.getNodeType(b);
/*
* If they are of different types, no dice
*/
if (typeA != typeB)
return false;
/*
* For all other primitive types than numbers, trust JsonNode
*/
if (!a.isContainerNode())
return a.equals(b);
/*
* OK, so they are containers (either both arrays or objects due to the
* test on types above). They are obviously not equals if they do not
* have the same number of elements/members.
*/
if (a.size() != b.size())
return false;
/*
* Delegate to the appropriate method according to their type.
*/
return typeA == NodeType.ARRAY ? arrayEquals(a, b) : objectEquals(a, b);
}
@Override
protected int doHash(final JsonNode t)
{
/*
* If this is a numeric node, we want the same hashcode for the same
* mathematical values. Go with double, its range is good enough for
* 99+% of use cases.
*/
if (t.isNumber())
return Double.valueOf(t.doubleValue()).hashCode();
/*
* If this is a primitive type (other than numbers, handled above
Solution
Working through your code i Have found very few things to criticise....
If I have one complaint about your 'style', it is that your comments are too verbose...
As for your question about the hashCode of Collection-like data members....
I have inspected the source code for both ObjectNode and ArrayNode.
In each case they implement reasonable hashes given some constraints.
You need to answer three questions:
If you feel like you should implement your own hash, then consider the following:
Messing with bitwise functions vs.
... to clarify what I mean here, I have one specific example in mind:
consider the difference between
Edit: Discussion about prime-number shifts:
The above code results in:
What this means, is that by shifting with relatively large primes, it takes a while for the same bit-pattern to repeat, and also, the coverage of the entire bit range is comprehensive. This is like the Cicada 17-year cycles and 13-year cycles, etc. making sure that no two species of cicada are (often) emerging in the same years.
It means that the each bit in each input hash is used to affect every other bit in subsequent hash values as much as possible (and it affects itself as little as possible because an XOR with yourself is always 0).
- I like the way you are using
final.
- I like the general layout and structure
- using compareTo on the BigDecimals is the right thing to do.
If I have one complaint about your 'style', it is that your comments are too verbose...
As for your question about the hashCode of Collection-like data members....
I have inspected the source code for both ObjectNode and ArrayNode.
In each case they implement reasonable hashes given some constraints.
You need to answer three questions:
- do you know these are static data members (you are not changing their internal values)?
- The arrays may contain numbers just so long as the result for the array is consistent?
- do you trust the array members to have reasonably-well distributed hash values?
If you feel like you should implement your own hash, then consider the following:
// start the hash off with a value that's unique to the size.
int ret = t.size();
/*
* If the container is empty, just return
*/
if (ret == 0)
return 1;
/*
* Array
*/
if (t.isArray()) {
// prime-number tricks like *31 are to ensure distributions.
// we don't really care too much.... but we can shift easily...
// 13 and 19 are prime numbers that add to 32 -- making a relatively
// long loop-around time.
// rotate the hash 19 bits, and XOR with the next element.
for (final JsonNode element : t)
ret = ((ret >>> 13) | (ret > iterator = t.fields();
while (iterator.hasNext()) {
final Map.Entry entry = iterator.next();
ret ^= entry.getKey().hashCode();
ret = ((ret >>> 13) | (ret << 19)) ^ doHash(entry.getValue()));
}
return ret;Messing with bitwise functions vs.
*31 is a JVM-dependant fix. I have had successes and failures with it....... to clarify what I mean here, I have one specific example in mind:
consider the difference between
(ret * 31) ^ hash and ((ret << 5) - ret) ^ hash. I have known the second one to be both significantly faster, and significantly slower.....Edit: Discussion about prime-number shifts:
public static void main(String[] args) {
// sorted array of 1-bit integer values
int[] bitsets = new int[32];
for (int i = 0; i >> 13) | (val << 19);
int pos = Arrays.binarySearch(bitsets, val);
cnt[pos]++;
}
System.out.println("Bit Distributions: " + Arrays.toString(cnt));
}The above code results in:
Bit Distributions: [32, 31, 31, 31, 31, 31, 32, 32, 31, 31, 31, 31, 31, 32, 31, 31, 31, 31, 31, 32, 32, 31, 31, 31, 31, 32, 32, 31, 31, 31, 31, 31]What this means, is that by shifting with relatively large primes, it takes a while for the same bit-pattern to repeat, and also, the coverage of the entire bit range is comprehensive. This is like the Cicada 17-year cycles and 13-year cycles, etc. making sure that no two species of cicada are (often) emerging in the same years.
It means that the each bit in each input hash is used to affect every other bit in subsequent hash values as much as possible (and it affects itself as little as possible because an XOR with yourself is always 0).
Code Snippets
// start the hash off with a value that's unique to the size.
int ret = t.size();
/*
* If the container is empty, just return
*/
if (ret == 0)
return 1;
/*
* Array
*/
if (t.isArray()) {
// prime-number tricks like *31 are to ensure distributions.
// we don't really care too much.... but we can shift easily...
// 13 and 19 are prime numbers that add to 32 -- making a relatively
// long loop-around time.
// rotate the hash 19 bits, and XOR with the next element.
for (final JsonNode element : t)
ret = ((ret >>> 13) | (ret << 19)) ^ doHash(element);
return ret;
}
/*
* Not an array? An object.
*/
final Iterator<Map.Entry<String, JsonNode>> iterator = t.fields();
while (iterator.hasNext()) {
final Map.Entry<String, JsonNode> entry = iterator.next();
ret ^= entry.getKey().hashCode();
ret = ((ret >>> 13) | (ret << 19)) ^ doHash(entry.getValue()));
}
return ret;public static void main(String[] args) {
// sorted array of 1-bit integer values
int[] bitsets = new int[32];
for (int i = 0; i < bitsets.length; i++) {
bitsets[(i + 1) % 32] = 1 << i;
}
int[] cnt = new int[32];
int val = 1;
for (int i = 0; i < 1000; i++) {
val = (val >>> 13) | (val << 19);
int pos = Arrays.binarySearch(bitsets, val);
cnt[pos]++;
}
System.out.println("Bit Distributions: " + Arrays.toString(cnt));
}Bit Distributions: [32, 31, 31, 31, 31, 31, 32, 32, 31, 31, 31, 31, 31, 32, 31, 31, 31, 31, 31, 32, 32, 31, 31, 31, 31, 32, 32, 31, 31, 31, 31, 31]Context
StackExchange Code Review Q#20970, answer score: 6
Revisions (0)
No revisions yet.