patternswiftModerate
Extending CGPoint to conform to Hashable
Viewed 0 times
conformhashableextendingcgpoint
Problem
For Advent of Code Day 1, I found myself needing to use a tuple as a key in a dictionary. Seeing that I can't extend tuples in Swift, I decided to extend
This is based off of an implementation found on Stack Overflow. Another viable option might be
This works if the
How can I improve this implementation, and get floating point numbers to play nicely with a hash value that is an
Or, would the best approach not even be to extend
CGPoint to conform to Hashable:extension CGPoint: Hashable {
public var hashValue: Int {
return hash()
}
func hash() -> Int {
var hash = 23
hash = hash &* 31 &+ Int(self.x)
return hash &* 31 &+ Int(self.y)
}
}This is based off of an implementation found on Stack Overflow. Another viable option might be
self.x.hashValue << sizeof(CGFloat) ^ self.y.hashValue (from here).This works if the
x and y values of the CGPoint are whole numbers, but the Int truncation gives a lossy conversion from CGFloat to Int which could lead to hash collisions.How can I improve this implementation, and get floating point numbers to play nicely with a hash value that is an
Int?Or, would the best approach not even be to extend
CGPoint at all and to just use a custom Swift container struct or an NSObject subclass?Solution
I can see no advantage of computing the hash value from
numbers to integers loses information and therefore causes
hash collisions.
(as one can see from the implementation). So all possible
It remains the question how to compute the hash value of the point
from the hash values of its coordinates. Of course there can be no
"best hash" which works for all data sets. But in order to get
at least an idea which might be better or worse, I used the
following simple test:
It computes the hash values of 160,000 points where both
how many different hash values we get.
The referenced hash function from here,
translated to Swift 3, is
Your hash function is definitely worse (but we already know why):
Using
But this is still not really satisfying.
The Boost library has a hash_combine()
function for exactly this purpose, and its implementation
has one generic variant
as well as specializations for
Translated to Swift this would be:
It is crucial here to use unsigned integers, otherwise
the right shift
The Swift
we have to convert accordingly:
As one can see, this hash function has only 42 collisions, and is
far better than the previous ones (for this data set). It is also relatively simple to compute, and can easily be adapted
for types with more properties.
If that is not good enough then you can try other "hash combine" methods,
for example the the 32-bit or
64-bit specializations from the Boost library (depending on the
size of
Another one is
here,
and there are probably much more.
Finally you asked:
Or, would the best approach not even be to extend CGPoint at all and to just use a custom Swift container struct or an NSObject subclass?
It depends. If the tuple represents a "point in space" then using (and
extending)
else (and you just chose
properties) then I would prefer to define a custom
clearly shows its purpose. A subclass of
reference type and should only be chosen if you need reference
semantics. Otherwise value types are preferable.
Update: As of Swift 4.1, the compiler can synthesize the
conformance to Equatable/Hashable if all of its members are Equatable/Hashable, see SE-0185 Synthesizing Equatable and Hashable conformance.
Example:
The actual hash function is an implementation detail, but in my test
it worked excellently: The above test code produced no collisions at all
for the 160,000 points.
Int(self.x) and Int(self.y). As you already noticed, truncating the floating pointnumbers to integers loses information and therefore causes
hash collisions.
CGFloat is (like all Swift number point types) Hashable, and itshashValue is just the integer with the same memory representation(as one can see from the implementation). So all possible
CGFloat values have different hash values, which makes x.hashValue, y.hashValue a much better basis for computing a hash value for CGPoint thanInt(self.x) and Int(self.y).It remains the question how to compute the hash value of the point
from the hash values of its coordinates. Of course there can be no
"best hash" which works for all data sets. But in order to get
at least an idea which might be better or worse, I used the
following simple test:
var hv = Set()
var count = 0
for i in -200 ..< 200 {
for j in -200 ..< 200 {
count += 1
let p = CGPoint(x: CGFloat(i)/20, y: CGFloat(j)/20)
hv.insert(p.hashValue)
}
}
print(count, hv.count)It computes the hash values of 160,000 points where both
x and y range from -20 to 19.9 in steps of 0.1, and countshow many different hash values we get.
The referenced hash function from here,
translated to Swift 3, is
public var hashValue: Int {
return (self.x.hashValue .size) ^ self.y.hashValue
}
// # of points: 160000 hash values: 79976Your hash function is definitely worse (but we already know why):
public var hashValue: Int {
var hash = 23
hash = hash &* 31 &+ Int(self.x)
return hash &* 31 &+ Int(self.y)
}
// # of points: 160000 hash values: 1249Using
hashValue instead of Int() improves it considerably:public var hashValue: Int {
var hash = 23
hash = hash &* 31 &+ self.x.hashValue
return hash &* 31 &+ self.y.hashValue
}
// # of points: 160000 hash values: 68827But this is still not really satisfying.
The Boost library has a hash_combine()
function for exactly this purpose, and its implementation
has one generic variant
template
inline void hash_combine_impl(SizeT& seed, SizeT value)
{
seed ^= value + 0x9e3779b9 + (seed>2);
}as well as specializations for
uint32_t and uint64_t.Translated to Swift this would be:
func hash_combine(seed: inout UInt, value: UInt) {
let tmp = value &+ 0x9e3779b9 &+ (seed > 2)
seed ^= tmp
}It is crucial here to use unsigned integers, otherwise
the right shift
seed >> 2 would preserve the sign bit.The Swift
hashValue is a signed integer, however, thereforewe have to convert accordingly:
public var hashValue: Int {
var seed = UInt(0)
hash_combine(seed: &seed, value: UInt(bitPattern: x.hashValue))
hash_combine(seed: &seed, value: UInt(bitPattern: y.hashValue))
return Int(bitPattern: seed)
}
// # of points: 160000 hash values: 159958As one can see, this hash function has only 42 collisions, and is
far better than the previous ones (for this data set). It is also relatively simple to compute, and can easily be adapted
for types with more properties.
If that is not good enough then you can try other "hash combine" methods,
for example the the 32-bit or
64-bit specializations from the Boost library (depending on the
size of
Int on your platform) and check if that works better.Another one is
here,
and there are probably much more.
Finally you asked:
Or, would the best approach not even be to extend CGPoint at all and to just use a custom Swift container struct or an NSObject subclass?
It depends. If the tuple represents a "point in space" then using (and
extending)
CGPoint seems fine to me. If the tuple represents somethingelse (and you just chose
CGPoint because it happens to have twoproperties) then I would prefer to define a custom
struct whichclearly shows its purpose. A subclass of
NSObject would be areference type and should only be chosen if you need reference
semantics. Otherwise value types are preferable.
Update: As of Swift 4.1, the compiler can synthesize the
conformance to Equatable/Hashable if all of its members are Equatable/Hashable, see SE-0185 Synthesizing Equatable and Hashable conformance.
Example:
struct Point : Hashable {
let x: CGFloat
let y: CGFloat
}The actual hash function is an implementation detail, but in my test
it worked excellently: The above test code produced no collisions at all
for the 160,000 points.
Code Snippets
var hv = Set<Int>()
var count = 0
for i in -200 ..< 200 {
for j in -200 ..< 200 {
count += 1
let p = CGPoint(x: CGFloat(i)/20, y: CGFloat(j)/20)
hv.insert(p.hashValue)
}
}
print(count, hv.count)public var hashValue: Int {
return (self.x.hashValue << MemoryLayout<CGFloat>.size) ^ self.y.hashValue
}
// # of points: 160000 hash values: 79976public var hashValue: Int {
var hash = 23
hash = hash &* 31 &+ Int(self.x)
return hash &* 31 &+ Int(self.y)
}
// # of points: 160000 hash values: 1249public var hashValue: Int {
var hash = 23
hash = hash &* 31 &+ self.x.hashValue
return hash &* 31 &+ self.y.hashValue
}
// # of points: 160000 hash values: 68827template <typename SizeT>
inline void hash_combine_impl(SizeT& seed, SizeT value)
{
seed ^= value + 0x9e3779b9 + (seed<<6) + (seed>>2);
}Context
StackExchange Code Review Q#148763, answer score: 14
Revisions (0)
No revisions yet.