patternMinor

Function that spreads input

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

spreadsinputfunctionthat

Problem

I'd like to know if there is a function $f$ from n-bit numbers to n-bit numbers that has the following characteristics:

$f$ should be bijective

Both $f$ and $f^{-1}$ should be calculable pretty fast

$f$ should return a number that has no significant correlation to its input.

The rationale is this:

I want to write a program that operates on data. Some information of the data is stored in a binary search tree where the search key is a symbol of an alphabet. With time, I add further symbols to the alphabet. New symbols simply get the next free number available. Hence, the tree will always have a small bias to smaller keys which causes more rebalancing than I think should be needed.

My idea is to mangle the symbol numbers with $f$ such that they are widely spread over the whole range of $[0,2^{64}-1]$. Since the symbol numbers only matter during input and output which happens only once, applying such a function should not be too expensive.

I thought about one iteration of the Xorshift random number generator, but I don't really know a way to undo it, although it should theoretically be possible.

Does anybody know such a function?

Is this a good idea?

Solution

You can use Fibonacci hashing, namely

$\qquad h_F(k) = k \cdot \frac{\sqrt{5} - 1}{2} - \left\lfloor k \cdot \frac{\sqrt{5} - 1}{2} \right\rfloor$.

For $k=1,\dots,n$ you get $n$ pairwise-distinct numbers (about) evenly spread in $[0,1]$. By scaling to $[1..M]$ and rounding (down), you get about evenly spread numbers in that interval.

For example, these are $h_F(1), \dots, h_F(200)$ scaled to $[0..10000]$ (left original sequence, right sorted):

This is an instance of what Knuth calls multiplicative hashing. For $w$ the computer's word size, $A$ some integer relatively prime to $w$ and $M$ the number of addresses needed, we use

$\qquad h(k) = \left\lfloor M \left( \bigl( k \cdot \frac{A}{w}\bigr) \mod 1 \right) \right\rfloor$

as hashing function. The above follows with $A/w = \phi^{-1} = \frac{\sqrt{5}-1}{2}$ (make sure you can compute it with a sufficient precision). While this also works with any other irrational number besides $\phi^{-1}$, it is one of only two numbers that lead to the "most uniformly distributed" numbers.

Find more in The Art of Computer Programming, Volume 3 by Donald Knuth (chapter 6.4 from page 513 in the second edition). In particular you'll find why the resulting numbers are pairwise distinct (at least if $n \ll M$) and how to compute the inverse function if you use natural $A$ and $w$ instead of $\phi^{-1}$.

Context

StackExchange Computer Science Q#6668, answer score: 6

Revisions (0)

No revisions yet.