HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Integer set disjointness query on sketches with something like homomorphic hashing

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
somethingwithdisjointnessquerylikehomomorphichashingsketchesintegerset

Problem

Suppose I have two sets of integers $A$ and $B$ and I have a sketch data structure described by a function $\mathsf{sketch}_n : \mathcal{P}(\mathbb{Z}) \to 2^n$ that returns a bitstring of size $n$. These sketches can be checked for disjointness with a function $\mathsf{disjoint}_n : 2^n \times 2^n \to 2$, where if $\mathsf{disjoint}_n(\mathsf{sketch}_n(A), \mathsf{sketch}_n(B)) = 1$, then $A \cap B = \emptyset$ (but $A \cap B = \emptyset$ may also be true when $\mathsf{disjoint}$ returns 0, i.e.: no false positives).

This can easily be accomplished with a bloom filter, where $\mathsf{sketch}_n$ is the result of inserting each element of the set into a bloom filter and $\mathsf{disjoint}_n(x, y) = (x \,\&\, y) \equiv \mathbf{0}$ ($\&$ being bitwise AND).

Now, suppose that I want there to exist a function $\mathsf{shift}_n : 2^n \times \mathbb{Z} \to 2^n$ such that $\mathsf{shift}_n(\mathsf{sketch}_n(X), \delta) = \mathsf{sketch}_n(\{x + \delta \mid x \in X\})$. Is there a data structure that supports this operation, perhaps using some variety of homomorphic hashing?

Alternatively, $\mathsf{shift}_n$ need not exactly return the same sketch as $\{x + \delta \mid x \in X\}$, but rather a different sketch that still has the no-false-positives property when checked for disjointness with another set.

I'm also willing to consider answers where we can't ensure no false positives, but there is a good bound on the probability of a false positive.

Also, wherever I've said $\mathbb{Z}$ here, feel free to interpret that as some finite set of integers (e.g.: 32-bit integers).

Bonus points if there is a way to compute the union of two sketches, and if there is a way to extend this to sets of tuples in $\mathbb{Z} \times 2^p$ where $\mathsf{shift}$ only affects the first element of each tuple.

Solution

Here is one simple idea -- not sure if it's practical.

Define $\mathsf{sketch}_n(X)$ to be a length-$n$ bitstring in which bit $i$ is set iff there exists $x \in X$ such that $x = i \mod n$. In practice, this means that inserting an element $x$ into a set involves turning on bit $x \mod n$. $\mathsf{disjoint}_n(\cdot)$ is defined as before. Then $\mathsf{shift}_n(\cdot, \delta)$ can be found by rotating the sketch right by $\delta$ bits (which can be done implicitly by storing a "rotation amount" in the data structure).

To reduce the false positive rate for membership tests (but increase the false negative rate for disjointness tests), you can store multiple bitstrings using different moduli -- e.g., if the moduli 100, 101 and 105 are used, there would be 306 bits in total. Intuition suggests relatively prime moduli would work best, but I have not tried looking into this in detail.

Context

StackExchange Computer Science Q#134253, answer score: 2

Revisions (0)

No revisions yet.