HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Is the following intuition valid for understanding $k$-wise independent hash functions?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
understandingtheindependentwisehashfollowingforvalidfunctionsintuition

Problem

I really enjoy reading up on algorithms and data structures and in the course of doing so often come across ones that rely on $k$-wise independent families of hash functions. I'm perfectly comfortable working with these families of hash functions as black boxes (that is, I understand the relevant definitions and how the algorithms and data structures rely on those definitions), but when I look at specific examples of families of $k$-wise independent families of hash functions, I often have a really hard time understanding why they work. I assume that a lot of this is due to the fact that I don't have a super solid grasp of number theory or finite fields.

One trend I've noticed is that to form a $k$-wise independent family of hash functions whose domain and codomain is a set $\{0, 1, 2, 3, \dots, p-1 \}$, where $p$ is a prime, you can choose a random $(k+1)$-degree polynomial by choosing $k+1$ random elements out of $\{0, 1, 2, \dots, p-1\}$, then to have the hash function be "evaluate the polynomial on the given input value." (I assume that this is equivalent to choosing a random $(k+1)$-degree polynomial over $\mathbb{F}_p$, though I may be mistaken about this.)

My intuition behind why this approach works is that, just like a degree-$(k+1)$ polynomial over the reals cannot be determined by $k$ or fewer points, a degree-$(k+1)$ polynomial over $\mathbb{F}_p$ cannot be determined by $k$ or fewer points. As a result, these polynomials work well as hash functions because, with random coefficients, if you see $k-1$ outputs of the hash function, it's impossible to predict any $k$th output of the hash function better than a random guess because you can't determine what the polynomial is.

My questions are the following:

  • Is this a correct intuition for why these families of hash functions work correctly?



  • I teach undergraduate courses on algorithms and data structures that do not have number theory or abstract algebra as prerequisites. If this intuition is correct,

Solution

Your intuition is exactly right. Yes, that's equivalent to choosing a random polynomial over $\mathbb{F}_p$. The reason why it works is exactly the interpolation theorem for finite fields.

$k$-wise independent basically means "it behaves like a perfect hash function, if you only feed it $k$ inputs" or "it behaves like a perfect hash function, as far as correlations between up to $k$ different hash values appear"; and the interpolation theorem then ensures that this level of perfection is achieved for the construction you outline.

It can be a bit tricky to explain this intuition to students who aren't familiar with anything from number theory, group theory, or finite fields. One approach is to explain the intuition of the interpolation theorem over the real numbers, and then just give a statement that "something similar happens when you work modulo $p$ a prime", without trying to prove it. That's not perfect, though.

Another approach is to not try to teach this topic, at this point in their education. While I was taught universal hashing in my undergraduate algorithms course, I've gradually come to the opinion that the topic is better to omit from the course: it's not important enough, and it relies too much on mathematical background and intuition that students don't have. Why isn't it important? Well, most students won't become theorists. For students who will eventually become practitioners, they probably won't ever need universal hash functions. Instead, ordinary hash functions are often so good that as a first approximation you can just model them as if they were universal (as a heuristic), without making a big deal about it, and do your probability calculations from there. This is enough to let you analyze uses of hash functions in a way that's accurate enough for practical/engineering purposes. And, given that, it's not clear that we need to show them constructions that provably achieve something that ordinary hash functions already heuristically achieve pretty well. That's just a personal opinion, though.

Context

StackExchange Computer Science Q#50922, answer score: 6

Revisions (0)

No revisions yet.