patternMinor
Efficient set data structure supporting insert and set equal
Viewed 0 times
insertequalefficientstructuresupportinganddataset
Problem
What's the best way to represent sets that support the following two operations:
In addition, I have an upper bound on the integers added to each set. That is, I know $N$ such that $0 \le i < N$.
I'm also willing to accept that Equal is only approximate. ie Equal(s1,s2) could be true even if s1 != s2. I guess in that case I'm looking for a hash function. Does anyone know a good hash function?
We can assume that an integer is only inserted once into a set. $N$ can be about 3-5 million, and the number of sets I want to test for equality can be in the thousands.
- Insert(s, i) - adds nonnegative integer i to set s
- Equal(s1, s2) - Tests if s1 and s2 are the same set.
In addition, I have an upper bound on the integers added to each set. That is, I know $N$ such that $0 \le i < N$.
I'm also willing to accept that Equal is only approximate. ie Equal(s1,s2) could be true even if s1 != s2. I guess in that case I'm looking for a hash function. Does anyone know a good hash function?
We can assume that an integer is only inserted once into a set. $N$ can be about 3-5 million, and the number of sets I want to test for equality can be in the thousands.
Solution
Let $F$ be a finite field with efficient procedures for addition and
multiplication and almost-uniform sampling. $\:$ Let $z$ be a field element.
(I use that letter because things will be slightly more convenient if $z$ is
the zero element, although efficient sampling means that even if one
can't find the zero of $F\hspace{-0.03 in}$, one should still be able to find a field element.)
To initialize the data structure, sample a field element $x$.
Empty sets are represented by $z$,
and
Since addition is associative and commutative, there are no false negatives. $\:$ If each integer is only inserted once into a set, then the probability of there being at least one false positive is at most
the distance from $x$'s distribution to uniform $\;+\;$ $((N\cdot \text{number of equality tests})\hspace{.02 in}/($$\hspace{-0.03 in}\operatorname{card}$$(F\hspace{.03 in})) \;\;$.
To efficiently compute $x^i\hspace{-0.04 in}$, see this paper, and either avoid the methods that involve negative
powers or make slightly stronger assumptions on $F$. $\:$ I would recommend 5.1 on page 14.
multiplication and almost-uniform sampling. $\:$ Let $z$ be a field element.
(I use that letter because things will be slightly more convenient if $z$ is
the zero element, although efficient sampling means that even if one
can't find the zero of $F\hspace{-0.03 in}$, one should still be able to find a field element.)
To initialize the data structure, sample a field element $x$.
Empty sets are represented by $z$,
Insert(s,i) adds $x^i$ to the representation of s,and
Equal(s1,s2) just tests if the two representing field elements are equal.Since addition is associative and commutative, there are no false negatives. $\:$ If each integer is only inserted once into a set, then the probability of there being at least one false positive is at most
the distance from $x$'s distribution to uniform $\;+\;$ $((N\cdot \text{number of equality tests})\hspace{.02 in}/($$\hspace{-0.03 in}\operatorname{card}$$(F\hspace{.03 in})) \;\;$.
To efficiently compute $x^i\hspace{-0.04 in}$, see this paper, and either avoid the methods that involve negative
powers or make slightly stronger assumptions on $F$. $\:$ I would recommend 5.1 on page 14.
Context
StackExchange Computer Science Q#24757, answer score: 2
Revisions (0)
No revisions yet.