patternMinor

Efficiently representing a set of sets

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

efficientlysetssetrepresenting

Problem

(The question is an extension of this unanswered question on stackoverflow)

If we have a set of strings, we can efficiently represent them with tries. Common branches can also be merged, resulting in a DAG instead of a tree that is even more compact.

However, if we have a set of sets (i.e. the order does not matter), there are a lot of possible tries that represent the same set of elements.
An example can be found in the stackoverflow question I linked above.

Edit: For example, assume that we are given the following sets of integers.

{1,2,3,4,5}
{1,2,6,7}
{1,2,4,7}
{1,3,5,7}

Two possible representations are shown below (trie on the left, DAG on the right)

My questions are:

How hard is the problem of finding an optimal (i.e minimal) such trie?

Are there any fast algorithms for solving this problem?

If not, are there any fast algorithms that find "good" tries?

What about the DAG case?

In the scenario I have in mind there is an additional constraint that no set is a subset of another set.

Any link/paper that is even slightly relevant to any of the questions is helpful.

Solution

I suggest you start by looking at BDDs and ZDDs.

Your problem is similar to the problem of finding an optimal BDD.

I'll explain. Suppose we have sets $S_1,\dots,S_k$ over a universe $U$ containing $n$ items (so $S_i \subseteq U$). A BDD is a finite-state automaton (a DAG) for recognizing the language of all of the allowed sets, where each set is presented to the finite-state automaton as the $n$-bit string that is the characteristic function of that set. In other words, given a total order on $U$, each set $S \subseteq U$ corresponds to a binary string $[S] \in \{0,1\}^n$: $[S]_i=1$ if $S$ includes the $i$ element of $U$, otherwise not. (This order is known in the BDD literature as a "variable ordering".) Then define the language $L$ by $L=\{[S_1],[S_2],\dots,[S_k]\}$. A BDD is a finite-state automaton that recognizes the language $L$.

Notice that you have the freedom of what "variable ordering" to choose. The choice of the variable ordering determines how each set is represented as a binary function, which can affect the size of the BDD. It is well-known that the problem of choosing the optimal variable ordering (the one that leads to the smallest BDD) is NP-hard. Nonetheless, there are various heuristics for trying to find a good variable ordering, and in real-world problems, they often work reasonably well.

So, your problem can be framed as the problem of choosing the optimal variable ordering to represent your sets as a BDD.

ZDDs are a variant of BDDs that are better in some circumstances. ZDDs are especially oriented at representing a set of sets, so they might be particularly helpful for your needs. ZDDs are especially handy if you need to represent a bunch of sets that are related to each other in some way; ZDDs provide a compact and efficient encoding for that situation.

Context

StackExchange Computer Science Q#22807, answer score: 3

Revisions (0)

No revisions yet.