HiveBrain v1.2.0
Get Started
← Back to all entries
gotchaMinor

Why does the following function distribute things in a binomial distribution?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
distributewhythethingsfunctionbinomialfollowingdoesdistribution

Problem

I was reading the following FDS paper:

https://www.usenix.org/system/files/conference/osdi12/osdi12-final-75.pdf

and it says that the the following hash function does not distribute things uniformly (instead it creates a binomial distribution):

$$i = (hash(g + t)) \pmod n$$

while the following did distribute things uniformly:

$$i = (hash(g) + t) \pmod n$$

Why does the above distribute things evenly while the other one doesn't?

g = is the global UID for a blob

t = tract number. (tract is the measure/units of reads and writes to a blob).

n = is the number of tract servers or a multiple of the tract servers.

hash = SHA-1 (according to the paper)

Paper Reference:

Title: Flat Datacenter Storage

Author(s): Edmund B. Nightingale, Jeremy Elson, Jinliang Fan, Owen Hofmann, Jon Howell, Yutaka Suzue

Institution(s): Microsoft Research, Univeristy of Texas at Austin

Solution

The second function is easier to explain: it sends tract $t$ to position $t + h \pmod{n}$, where $h = hash(g)$. The function $t \mapsto t + h \pmod{n}$ is injective (one-to-one), and so every index $i$ gets mapped exactly once (in fact, by $i - h \pmod{n}$).

For the first function, we can think of $t \mapsto hash(g+t)$ as a random function (this is a common assumption whenever hash functions are involved; the fact that we use SHA-1 is completely irrelevant). Therefore for each fixed index $i$, the probability that $t$ is mapped to $i$ is $1/n$, and the total number of $t$s mapped to $i$ is distributed $\mathrm{Bin}(1/n,1) \approx \mathrm{Po}(1)$, that is it has binomial distribution which is approximated well by a Poisson distribution with mean $1$.

Context

StackExchange Computer Science Q#23153, answer score: 4

Revisions (0)

No revisions yet.