HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Efficient set data structure supporting insert and set equal

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
insertequalefficientstructuresupportinganddataset

Problem

What's the best way to represent sets that support the following two operations:

  • Insert(s, i) - adds nonnegative integer i to set s



  • Equal(s1, s2) - Tests if s1 and s2 are the same set.



In addition, I have an upper bound on the integers added to each set. That is, I know $N$ such that $0 \le i < N$.

I'm also willing to accept that Equal is only approximate. ie Equal(s1,s2) could be true even if s1 != s2. I guess in that case I'm looking for a hash function. Does anyone know a good hash function?

We can assume that an integer is only inserted once into a set. $N$ can be about 3-5 million, and the number of sets I want to test for equality can be in the thousands.

Solution

Let $F$ be a finite field with efficient procedures for addition and

multiplication and almost-uniform sampling. $\:$ Let $z$ be a field element.

(I use that letter because things will be slightly more convenient if $z$ is

the zero element, although efficient sampling means that even if one

can't find the zero of $F\hspace{-0.03 in}$, one should still be able to find a field element.)

To initialize the data structure, sample a field element $x$.

Empty sets are represented by $z$, Insert(s,i) adds $x^i$ to the representation of s,

and Equal(s1,s2) just tests if the two representing field elements are equal.

Since addition is associative and commutative, there are no false negatives. $\:$ If each integer is only inserted once into a set, then the probability of there being at least one false positive is at most

the distance from $x$'s distribution to uniform $\;+\;$ $((N\cdot \text{number of equality tests})\hspace{.02 in}/($$\hspace{-0.03 in}\operatorname{card}$$(F\hspace{.03 in})) \;\;$.

To efficiently compute $x^i\hspace{-0.04 in}$, see this paper, and either avoid the methods that involve negative

powers or make slightly stronger assumptions on $F$. $\:$ I would recommend 5.1 on page 14.

Context

StackExchange Computer Science Q#24757, answer score: 2

Revisions (0)

No revisions yet.