patternMinor
Building static hash table with particular collisions
Viewed 0 times
withbuildinghashcollisionsparticulartablestatic
Problem
Is there efficient algorithm to encode keys in hash function with provided collisions?
By efficient I mean with low-ish runtime of lookup operation (taking constants into account) and realistic time of finding such function.
Keys are floating point values - I can compare them for equality as they are raw, unprocessed. Equality holds, and there are no round-offs.
If there is better chance for algorithm on integers - I am still interested.
I did not included specific scheme for building perfect hash - it might be the wrong way or different scheme supports such change.
Let $h(x)$ be hash function (the one to be found)
$X$ be set of sets of float values that $\forall x \in X_i, h(x) = c_i$
$\forall X_i, |X_i| \in [2, 20]$
Let $Y$ be set of float values that there are no collisions (it does not collide with any $X_i$)
For example, I have set $X = \{\{1, 5.5, 24\}, \{7, 20\}\}$ and the set $Y = \{4, 25, 31.2, ...\}$
$h(1) = h(5.5) = h(24) = c_0$
$h(7) = h(20) = c_1$
$h(4) = c_2$
$h(25) = c_3$
$h(31.2) = c_4$
$c_i = c_j \iff i = j$
$X$ set containts colliding tuples, $Y$ contains values that collide with nothing.
$Y$ is about 85-95% of input, $X$ occupies the rest.
If for example $X$ has values in tuples of multiplicities {3, 5, 2, 2, 20, 14, 3}, this are 49 values, but 7 distinct buckets and Y has 451 elements, these are 451 distinct buckets.
In the hashtable there will be at least 458 buckets (this is perfect, and preffered), but might be a bit bigger up to 1%.
All possible inputs are given, $h(x)$ will never execute for value that was not in the input.
Order of the values from input does not matter, $h(x)$ does not need to preserve order, does not need to be monotone.
TLDR; Below are previous attempts, some background etc.
Background
In the initial phase I gather data from the device, this phase ends when for given settings I have collected all possible outputs.
This data is collected as floating point numbers. I calculate outputs based on what
By efficient I mean with low-ish runtime of lookup operation (taking constants into account) and realistic time of finding such function.
Keys are floating point values - I can compare them for equality as they are raw, unprocessed. Equality holds, and there are no round-offs.
If there is better chance for algorithm on integers - I am still interested.
I did not included specific scheme for building perfect hash - it might be the wrong way or different scheme supports such change.
Let $h(x)$ be hash function (the one to be found)
$X$ be set of sets of float values that $\forall x \in X_i, h(x) = c_i$
$\forall X_i, |X_i| \in [2, 20]$
Let $Y$ be set of float values that there are no collisions (it does not collide with any $X_i$)
For example, I have set $X = \{\{1, 5.5, 24\}, \{7, 20\}\}$ and the set $Y = \{4, 25, 31.2, ...\}$
$h(1) = h(5.5) = h(24) = c_0$
$h(7) = h(20) = c_1$
$h(4) = c_2$
$h(25) = c_3$
$h(31.2) = c_4$
$c_i = c_j \iff i = j$
$X$ set containts colliding tuples, $Y$ contains values that collide with nothing.
$Y$ is about 85-95% of input, $X$ occupies the rest.
If for example $X$ has values in tuples of multiplicities {3, 5, 2, 2, 20, 14, 3}, this are 49 values, but 7 distinct buckets and Y has 451 elements, these are 451 distinct buckets.
In the hashtable there will be at least 458 buckets (this is perfect, and preffered), but might be a bit bigger up to 1%.
All possible inputs are given, $h(x)$ will never execute for value that was not in the input.
Order of the values from input does not matter, $h(x)$ does not need to preserve order, does not need to be monotone.
TLDR; Below are previous attempts, some background etc.
Background
In the initial phase I gather data from the device, this phase ends when for given settings I have collected all possible outputs.
This data is collected as floating point numbers. I calculate outputs based on what
Solution
The easiest way is to construct a static hash table $T$ containing all the collisions, in the following form: for each set of keys $S$ which are supposed to map to the same value, single out some $x \in S$, and put all other $y \in S$ in the table with an entry stating "$x$".
Now take a good hash function $h$, and construct a new one as follows:
If $h$ were a uniformly random function, then this would result in a uniformly random function conditioned on your collisions.
There are some optimizations possible here:
Now take a good hash function $h$, and construct a new one as follows:
- On input $x$, check whether $x \in T$.
- If $x \notin T$, return $h(x)$.
- If $x \in T$, return $h(T[x])$.
If $h$ were a uniformly random function, then this would result in a uniformly random function conditioned on your collisions.
There are some optimizations possible here:
- The hash function used to access $T$ can be weak.
- The table $T$ might be too large to fit in the cache, so accessing $T$ might be slow. We can add a Bloom filter that does fit in cache to ameliorate this.
Context
StackExchange Computer Science Q#53094, answer score: 7
Revisions (0)
No revisions yet.