HiveBrain v1.2.0
Get Started
← Back to all entries
patternswiftModerate

Extending CGPoint to conform to Hashable

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
conformhashableextendingcgpoint

Problem

For Advent of Code Day 1, I found myself needing to use a tuple as a key in a dictionary. Seeing that I can't extend tuples in Swift, I decided to extend CGPoint to conform to Hashable:

extension CGPoint: Hashable {
    public var hashValue: Int {
        return hash()
    }

    func hash() -> Int {
        var hash = 23
        hash = hash &* 31 &+ Int(self.x)
        return hash &* 31 &+ Int(self.y)
    }
}


This is based off of an implementation found on Stack Overflow. Another viable option might be self.x.hashValue << sizeof(CGFloat) ^ self.y.hashValue (from here).

This works if the x and y values of the CGPoint are whole numbers, but the Int truncation gives a lossy conversion from CGFloat to Int which could lead to hash collisions.

How can I improve this implementation, and get floating point numbers to play nicely with a hash value that is an Int?

Or, would the best approach not even be to extend CGPoint at all and to just use a custom Swift container struct or an NSObject subclass?

Solution

I can see no advantage of computing the hash value from Int(self.x) and Int(self.y). As you already noticed, truncating the floating point
numbers to integers loses information and therefore causes
hash collisions.

CGFloat is (like all Swift number point types) Hashable, and its
hashValue is just the integer with the same memory representation
(as one can see from the implementation). So all possible CGFloat values have different hash values, which makes x.hashValue, y.hashValue a much better basis for computing a hash value for CGPoint than
Int(self.x) and Int(self.y).

It remains the question how to compute the hash value of the point
from the hash values of its coordinates. Of course there can be no
"best hash" which works for all data sets. But in order to get
at least an idea which might be better or worse, I used the
following simple test:

var hv = Set()
var count = 0
for i in -200 ..< 200 {
    for j in -200 ..< 200 {
        count += 1
        let p = CGPoint(x: CGFloat(i)/20, y: CGFloat(j)/20)
        hv.insert(p.hashValue)
    }
}

print(count, hv.count)


It computes the hash values of 160,000 points where both
x and y range from -20 to 19.9 in steps of 0.1, and counts
how many different hash values we get.

The referenced hash function from here,
translated to Swift 3, is

public var hashValue: Int {
    return (self.x.hashValue .size) ^ self.y.hashValue
}
// # of points: 160000 hash values:  79976


Your hash function is definitely worse (but we already know why):

public var hashValue: Int {
    var hash = 23
    hash = hash &* 31 &+ Int(self.x)
    return hash &* 31 &+ Int(self.y)
}
// # of points: 160000 hash values:  1249


Using hashValue instead of Int() improves it considerably:

public var hashValue: Int {
    var hash = 23
    hash = hash &* 31 &+ self.x.hashValue
    return hash &* 31 &+ self.y.hashValue
}
// # of points: 160000 hash values:  68827


But this is still not really satisfying.

The Boost library has a hash_combine()
function for exactly this purpose, and its implementation
has one generic variant

template 
inline void hash_combine_impl(SizeT& seed, SizeT value)
{
    seed ^= value + 0x9e3779b9 + (seed>2);
}


as well as specializations for uint32_t and uint64_t.
Translated to Swift this would be:

func hash_combine(seed: inout UInt, value: UInt) {
    let tmp = value &+ 0x9e3779b9 &+ (seed > 2)
    seed ^= tmp
}


It is crucial here to use unsigned integers, otherwise
the right shift seed >> 2 would preserve the sign bit.

The Swift hashValue is a signed integer, however, therefore
we have to convert accordingly:

public var hashValue: Int {
    var seed = UInt(0)
    hash_combine(seed: &seed, value: UInt(bitPattern: x.hashValue))
    hash_combine(seed: &seed, value: UInt(bitPattern: y.hashValue))
    return Int(bitPattern: seed)
}
// # of points: 160000 hash values:  159958


As one can see, this hash function has only 42 collisions, and is
far better than the previous ones (for this data set). It is also relatively simple to compute, and can easily be adapted
for types with more properties.

If that is not good enough then you can try other "hash combine" methods,
for example the the 32-bit or
64-bit specializations from the Boost library (depending on the
size of Int on your platform) and check if that works better.
Another one is
here,
and there are probably much more.

Finally you asked:

Or, would the best approach not even be to extend CGPoint at all and to just use a custom Swift container struct or an NSObject subclass?

It depends. If the tuple represents a "point in space" then using (and
extending) CGPoint seems fine to me. If the tuple represents something
else (and you just chose CGPoint because it happens to have two
properties) then I would prefer to define a custom struct which
clearly shows its purpose. A subclass of NSObject would be a
reference type and should only be chosen if you need reference
semantics. Otherwise value types are preferable.

Update: As of Swift 4.1, the compiler can synthesize the
conformance to Equatable/Hashable if all of its members are Equatable/Hashable, see SE-0185 Synthesizing Equatable and Hashable conformance.

Example:

struct Point : Hashable {
    let x: CGFloat
    let y: CGFloat
}


The actual hash function is an implementation detail, but in my test
it worked excellently: The above test code produced no collisions at all
for the 160,000 points.

Code Snippets

var hv = Set<Int>()
var count = 0
for i in -200 ..< 200 {
    for j in -200 ..< 200 {
        count += 1
        let p = CGPoint(x: CGFloat(i)/20, y: CGFloat(j)/20)
        hv.insert(p.hashValue)
    }
}

print(count, hv.count)
public var hashValue: Int {
    return (self.x.hashValue << MemoryLayout<CGFloat>.size) ^ self.y.hashValue
}
// # of points: 160000 hash values:  79976
public var hashValue: Int {
    var hash = 23
    hash = hash &* 31 &+ Int(self.x)
    return hash &* 31 &+ Int(self.y)
}
// # of points: 160000 hash values:  1249
public var hashValue: Int {
    var hash = 23
    hash = hash &* 31 &+ self.x.hashValue
    return hash &* 31 &+ self.y.hashValue
}
// # of points: 160000 hash values:  68827
template <typename SizeT>
inline void hash_combine_impl(SizeT& seed, SizeT value)
{
    seed ^= value + 0x9e3779b9 + (seed<<6) + (seed>>2);
}

Context

StackExchange Code Review Q#148763, answer score: 14

Revisions (0)

No revisions yet.