HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Data structure for testing all subsets of a query for membership

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
allmembershipquerysubsetstestingforstructuredata

Problem

Is there a data structure that efficiently supports the following operations?

  • Add set



  • Query whether any subset of a set has been added.



This could be implemented with linear overhead by testing every added set during every query. Can it be implemented more efficiently? Small probabilities of false positives/negatives are acceptable (e.g. Bloom-filter style).

Solution

Let's say all your sets are finite subsets of $\mathbb N$. Let $S\subseteq \mathcal P( \mathbb N)$ denote your set of sets.

You want two operations:

-
$O_1(S,s')$: For any $s'\subseteq \mathbb N$, add $s'$ to $S$

-
$O_2(S,s')$: For any $s'\subseteq \mathbb N$, is there some $s\in S$ so that $s\subseteq s'$?

Here are a few ideas to speed things up:

-
You're going to test if a set if a subset of another a lot so you should probably keep the size $|s|$ of each set $s$ available in $O(1)$ so that when you need to test if $s\subseteq s'$, you start by checking if $|s|\le |s'|$ and if not, you can return false right away. And it you indeed have $|s|\le |s'|$, then you just run the normal slow test.

-
Note that if you have $s_1\in S$ and $s_2\in S$, so that $s_1\subseteq s_2$, then if $s_2\subseteq s'$, you also have $s_1\subseteq s'$. So you don't need to keep $s_2$ in $S$ for $O_2$. So you can represent $S$ by a set of sets so that $s\in S$ and $s\subsetneq s'$ implies $s'\not \in S$. In other words, you only need to keep track of the sets in $S$ that are minimal for inclusion. This can be implemented pretty efficiently: When adding a set $s'$, for all sets $s\in S$ so that $|s|\le |s'|$ (ordered by increasing cardinal), if $s\subseteq s'$, then don't add $s'$ because it won't be minimal (or is already in $S$). Otherwise, add $s'$ and then among sets $s\in S$ so that $|s'|

-
Keep a set $t$ that's equal to the union of all sets in $S$. Then, instead of running $O_2(S,s')$, you can run $O_2(S,s'\cap t)$ instead (because if for some $s\in S$, $s\subseteq s'$, then since $s\subseteq t$, $s\subseteq s'\cap t$ and, if $s\subseteq s'\cap t$, then $s\subseteq s'\cap t \subseteq s'$).

With these ideas in mind, I'd represent $S$ by a dictionnary (implemented as a doubly linked list of pairs $(key,value)$ with the keys in increasing order) $d$ so that $d(k)$ is a doubly linked list containing exactly the minimal (for inclusion) sets in $S$ of cardinal $k$.

O1(S,s')
  if O2(S,s')
    return
  if d(k) doesn't exist
    d(k) := new_doubly_linked_list()
  add(d(k),s')
  S.t := union(S.t, s')
  for each key k of d so that |s'|+1 <= k
    for s in d(k)
      if subset(s', s)
        remove s

_O2(S,s')
  for each key k of d so that k <= |s'|
    for s in d(k)
      if subset(s,s')
        return true
  return false

O2(S,s')
  return _O2(S,inter(S.t,s'))


(Notice that even though I didnd't do it explicitely in the code of O1, you can do a single traversal of the doubly linked list representing d)

I don't think this improves too much in the worst case but in average it should.

Code Snippets

O1(S,s')
  if O2(S,s')
    return
  if d(k) doesn't exist
    d(k) := new_doubly_linked_list()
  add(d(k),s')
  S.t := union(S.t, s')
  for each key k of d so that |s'|+1 <= k
    for s in d(k)
      if subset(s', s)
        remove s

_O2(S,s')
  for each key k of d so that k <= |s'|
    for s in d(k)
      if subset(s,s')
        return true
  return false

O2(S,s')
  return _O2(S,inter(S.t,s'))

Context

StackExchange Computer Science Q#74833, answer score: 2

Revisions (0)

No revisions yet.