HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Fast hash function for set equality

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
fastequalityfunctionhashforset

Problem

I'm searching an hash function for integer set equality that must be fast.
It must support update (adding an element already in the set must not change the hash) and union.
MinHash has these 2 properties but it is too expensive and I do not need all the feature of MinHash, in fact I don't need set similarity but only equality.
Do you know something that can satisfy my needs?
Thanks :)

P.S. I have to implement it in C for an high performance software

Solution

MinHash already is super-fast. I suggest you check whether you understand correctly how MinHash works.

Take any standard hash function $H$; then all you have to do is compute $H$ once on each element of the set. You can implement $H$ with as fast a hash algorithm as you want. The update operation requires one call to $H$ (namely, if your set's hash is currently $y$, and you add element $x$, the new hash is $\min(y, H(x))$), and the union operation is super fast (namely, the union of two sets with hashes $y_1,y_2$ is $\min(y_1,y_2)$). Notice that these operation have the property you want (adding an element already in the set does not change the hash).

I have a hard time imagining how you could get much faster than that.

If you want this to be faster, a reasonable way is to choose a $H$ that is faster.

I've described a variant where the hash of a set is the smallest of the element-hashes. The only downside is that the number of hash collisions might higher than you like. Measure and see if that is actually a problem in practice. If it is, you can consider variants that keep the $k$ distinct smallest element hashes. If $k$ is large, you can store those $k$ hashes in a heap of size $k$ to make the update and union operations faster.

Context

StackExchange Computer Science Q#109361, answer score: 3

Revisions (0)

No revisions yet.