patternMinor
Function that spreads input
Viewed 0 times
spreadsinputfunctionthat
Problem
I'd like to know if there is a function $f$ from n-bit numbers to n-bit numbers that has the following characteristics:
The rationale is this:
I want to write a program that operates on data. Some information of the data is stored in a binary search tree where the search key is a symbol of an alphabet. With time, I add further symbols to the alphabet. New symbols simply get the next free number available. Hence, the tree will always have a small bias to smaller keys which causes more rebalancing than I think should be needed.
My idea is to mangle the symbol numbers with $f$ such that they are widely spread over the whole range of $[0,2^{64}-1]$. Since the symbol numbers only matter during input and output which happens only once, applying such a function should not be too expensive.
I thought about one iteration of the Xorshift random number generator, but I don't really know a way to undo it, although it should theoretically be possible.
Does anybody know such a function?
Is this a good idea?
- $f$ should be bijective
- Both $f$ and $f^{-1}$ should be calculable pretty fast
- $f$ should return a number that has no significant correlation to its input.
The rationale is this:
I want to write a program that operates on data. Some information of the data is stored in a binary search tree where the search key is a symbol of an alphabet. With time, I add further symbols to the alphabet. New symbols simply get the next free number available. Hence, the tree will always have a small bias to smaller keys which causes more rebalancing than I think should be needed.
My idea is to mangle the symbol numbers with $f$ such that they are widely spread over the whole range of $[0,2^{64}-1]$. Since the symbol numbers only matter during input and output which happens only once, applying such a function should not be too expensive.
I thought about one iteration of the Xorshift random number generator, but I don't really know a way to undo it, although it should theoretically be possible.
Does anybody know such a function?
Is this a good idea?
Solution
You can use Fibonacci hashing, namely
$\qquad h_F(k) = k \cdot \frac{\sqrt{5} - 1}{2} - \left\lfloor k \cdot \frac{\sqrt{5} - 1}{2} \right\rfloor$.
For $k=1,\dots,n$ you get $n$ pairwise-distinct numbers (about) evenly spread in $[0,1]$. By scaling to $[1..M]$ and rounding (down), you get about evenly spread numbers in that interval.
For example, these are $h_F(1), \dots, h_F(200)$ scaled to $[0..10000]$ (left original sequence, right sorted):
This is an instance of what Knuth calls multiplicative hashing. For $w$ the computer's word size, $A$ some integer relatively prime to $w$ and $M$ the number of addresses needed, we use
$\qquad h(k) = \left\lfloor M \left( \bigl( k \cdot \frac{A}{w}\bigr) \mod 1 \right) \right\rfloor$
as hashing function. The above follows with $A/w = \phi^{-1} = \frac{\sqrt{5}-1}{2}$ (make sure you can compute it with a sufficient precision). While this also works with any other irrational number besides $\phi^{-1}$, it is one of only two numbers that lead to the "most uniformly distributed" numbers.
Find more in The Art of Computer Programming, Volume 3 by Donald Knuth (chapter 6.4 from page 513 in the second edition). In particular you'll find why the resulting numbers are pairwise distinct (at least if $n \ll M$) and how to compute the inverse function if you use natural $A$ and $w$ instead of $\phi^{-1}$.
$\qquad h_F(k) = k \cdot \frac{\sqrt{5} - 1}{2} - \left\lfloor k \cdot \frac{\sqrt{5} - 1}{2} \right\rfloor$.
For $k=1,\dots,n$ you get $n$ pairwise-distinct numbers (about) evenly spread in $[0,1]$. By scaling to $[1..M]$ and rounding (down), you get about evenly spread numbers in that interval.
For example, these are $h_F(1), \dots, h_F(200)$ scaled to $[0..10000]$ (left original sequence, right sorted):
This is an instance of what Knuth calls multiplicative hashing. For $w$ the computer's word size, $A$ some integer relatively prime to $w$ and $M$ the number of addresses needed, we use
$\qquad h(k) = \left\lfloor M \left( \bigl( k \cdot \frac{A}{w}\bigr) \mod 1 \right) \right\rfloor$
as hashing function. The above follows with $A/w = \phi^{-1} = \frac{\sqrt{5}-1}{2}$ (make sure you can compute it with a sufficient precision). While this also works with any other irrational number besides $\phi^{-1}$, it is one of only two numbers that lead to the "most uniformly distributed" numbers.
Find more in The Art of Computer Programming, Volume 3 by Donald Knuth (chapter 6.4 from page 513 in the second edition). In particular you'll find why the resulting numbers are pairwise distinct (at least if $n \ll M$) and how to compute the inverse function if you use natural $A$ and $w$ instead of $\phi^{-1}$.
Context
StackExchange Computer Science Q#6668, answer score: 6
Revisions (0)
No revisions yet.