patternMinor
Data structure for testing all subsets of a query for membership
Viewed 0 times
allmembershipquerysubsetstestingforstructuredata
Problem
Is there a data structure that efficiently supports the following operations?
This could be implemented with linear overhead by testing every added set during every query. Can it be implemented more efficiently? Small probabilities of false positives/negatives are acceptable (e.g. Bloom-filter style).
- Add set
- Query whether any subset of a set has been added.
This could be implemented with linear overhead by testing every added set during every query. Can it be implemented more efficiently? Small probabilities of false positives/negatives are acceptable (e.g. Bloom-filter style).
Solution
Let's say all your sets are finite subsets of $\mathbb N$. Let $S\subseteq \mathcal P( \mathbb N)$ denote your set of sets.
You want two operations:
-
$O_1(S,s')$: For any $s'\subseteq \mathbb N$, add $s'$ to $S$
-
$O_2(S,s')$: For any $s'\subseteq \mathbb N$, is there some $s\in S$ so that $s\subseteq s'$?
Here are a few ideas to speed things up:
-
You're going to test if a set if a subset of another a lot so you should probably keep the size $|s|$ of each set $s$ available in $O(1)$ so that when you need to test if $s\subseteq s'$, you start by checking if $|s|\le |s'|$ and if not, you can return false right away. And it you indeed have $|s|\le |s'|$, then you just run the normal slow test.
-
Note that if you have $s_1\in S$ and $s_2\in S$, so that $s_1\subseteq s_2$, then if $s_2\subseteq s'$, you also have $s_1\subseteq s'$. So you don't need to keep $s_2$ in $S$ for $O_2$. So you can represent $S$ by a set of sets so that $s\in S$ and $s\subsetneq s'$ implies $s'\not \in S$. In other words, you only need to keep track of the sets in $S$ that are minimal for inclusion. This can be implemented pretty efficiently: When adding a set $s'$, for all sets $s\in S$ so that $|s|\le |s'|$ (ordered by increasing cardinal), if $s\subseteq s'$, then don't add $s'$ because it won't be minimal (or is already in $S$). Otherwise, add $s'$ and then among sets $s\in S$ so that $|s'|
-
Keep a set $t$ that's equal to the union of all sets in $S$. Then, instead of running $O_2(S,s')$, you can run $O_2(S,s'\cap t)$ instead (because if for some $s\in S$, $s\subseteq s'$, then since $s\subseteq t$, $s\subseteq s'\cap t$ and, if $s\subseteq s'\cap t$, then $s\subseteq s'\cap t \subseteq s'$).
With these ideas in mind, I'd represent $S$ by a dictionnary (implemented as a doubly linked list of pairs $(key,value)$ with the keys in increasing order) $d$ so that $d(k)$ is a doubly linked list containing exactly the minimal (for inclusion) sets in $S$ of cardinal $k$.
(Notice that even though I didnd't do it explicitely in the code of
I don't think this improves too much in the worst case but in average it should.
You want two operations:
-
$O_1(S,s')$: For any $s'\subseteq \mathbb N$, add $s'$ to $S$
-
$O_2(S,s')$: For any $s'\subseteq \mathbb N$, is there some $s\in S$ so that $s\subseteq s'$?
Here are a few ideas to speed things up:
-
You're going to test if a set if a subset of another a lot so you should probably keep the size $|s|$ of each set $s$ available in $O(1)$ so that when you need to test if $s\subseteq s'$, you start by checking if $|s|\le |s'|$ and if not, you can return false right away. And it you indeed have $|s|\le |s'|$, then you just run the normal slow test.
-
Note that if you have $s_1\in S$ and $s_2\in S$, so that $s_1\subseteq s_2$, then if $s_2\subseteq s'$, you also have $s_1\subseteq s'$. So you don't need to keep $s_2$ in $S$ for $O_2$. So you can represent $S$ by a set of sets so that $s\in S$ and $s\subsetneq s'$ implies $s'\not \in S$. In other words, you only need to keep track of the sets in $S$ that are minimal for inclusion. This can be implemented pretty efficiently: When adding a set $s'$, for all sets $s\in S$ so that $|s|\le |s'|$ (ordered by increasing cardinal), if $s\subseteq s'$, then don't add $s'$ because it won't be minimal (or is already in $S$). Otherwise, add $s'$ and then among sets $s\in S$ so that $|s'|
-
Keep a set $t$ that's equal to the union of all sets in $S$. Then, instead of running $O_2(S,s')$, you can run $O_2(S,s'\cap t)$ instead (because if for some $s\in S$, $s\subseteq s'$, then since $s\subseteq t$, $s\subseteq s'\cap t$ and, if $s\subseteq s'\cap t$, then $s\subseteq s'\cap t \subseteq s'$).
With these ideas in mind, I'd represent $S$ by a dictionnary (implemented as a doubly linked list of pairs $(key,value)$ with the keys in increasing order) $d$ so that $d(k)$ is a doubly linked list containing exactly the minimal (for inclusion) sets in $S$ of cardinal $k$.
O1(S,s')
if O2(S,s')
return
if d(k) doesn't exist
d(k) := new_doubly_linked_list()
add(d(k),s')
S.t := union(S.t, s')
for each key k of d so that |s'|+1 <= k
for s in d(k)
if subset(s', s)
remove s
_O2(S,s')
for each key k of d so that k <= |s'|
for s in d(k)
if subset(s,s')
return true
return false
O2(S,s')
return _O2(S,inter(S.t,s'))(Notice that even though I didnd't do it explicitely in the code of
O1, you can do a single traversal of the doubly linked list representing d)I don't think this improves too much in the worst case but in average it should.
Code Snippets
O1(S,s')
if O2(S,s')
return
if d(k) doesn't exist
d(k) := new_doubly_linked_list()
add(d(k),s')
S.t := union(S.t, s')
for each key k of d so that |s'|+1 <= k
for s in d(k)
if subset(s', s)
remove s
_O2(S,s')
for each key k of d so that k <= |s'|
for s in d(k)
if subset(s,s')
return true
return false
O2(S,s')
return _O2(S,inter(S.t,s'))Context
StackExchange Computer Science Q#74833, answer score: 2
Revisions (0)
No revisions yet.