HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Uniformly Random Nested Subset Pairs

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
randomuniformlynestedsubsetpairs

Problem

Main Question

We can represent subsets of a vector using, say, a bit mask. Let's say a nested subset is a pair of masks, for example

A B C D E F
1 0 0 1 0 1
1 0 0 0 0 1


is a nested subset, where the first mask tells us inclusion in the first set and the second mask tells us inclusion in the first. In this case the first set is {A,D,F} and its subset is {A,F}. I am trying to uniformly generate such subset pairs.

Some Thoughts

One could try naively to generate two random masks, use the first for the union, then the second for the intersection where it overlaps. However, this would generate subsets uniform in the initial subset, and not the pair.

Mathematically speaking this can be done by weighting each mask by the number of possible subsets it has (2^n where n is the size of the subset), randomly drawing a subset, then randomly drawing an intersection from it. However this is very challenging/inefficient to implement in practice

My first idea was to find some (efficient) hash function from integers into subset pairs, however I have not had much success with this. I am wondering if there is a superior way to do this. I don't need code (I don't have implementation issues); I am looking a conceptual approach. Any advice?

Solution

For the sake of simplicity, I will assume that: 1) empty sets are allowed, 2) we count a set as a subset of itself (i.e.: both the big set and small set can represent the same set). If these assumptions don't hold, the math gets a little messier but the basic principles still hold.

First, we observe that there are $2^i{n\choose i}$ pairs of subsets where the bigger subset has cardinality $i$ and that there are a total of $\sum_i 2^i{n\choose i} = 3^n$ such pairs.

We proceed as follows:

1) We choose $k$ (between $0$ and $n$), the size of the bigger subset. This will not be a uniformly random choice but our initial observation lets us know the correct probabilities, i.e.: $P(k = i) = 2^i{n\choose i} / 3^n$.

2) We choose uniformly among the ${n \choose k}$ subsets of cardinality $k$ to be our bigger subset. Reservoir sampling should do the trick.

3) We choose uniformly among the $2^k$ subsets of the bigger subset to be the smaller subset. Simply choosing a random number from $0$ to $2^k-1$ and using the natural correspondence between sets and binary representations of integers should suffice.

EDITED TO ADD:

I just realized an even slicker way to do this but it depends heavily on the first two assumptions I stated. Unlike my originally provided method, I don't see an obvious way to adapt it if either of those assumptions fails to hold.

Recall that we deduced that the total number of eligible subset pairs was $3^n$. We can identify a bijection between each subset pair and $\{0,1,2\}^n$, i.e.: vectors of length $n$ with entries in $\{0,1,2\}$. Let $v$ be such a vector. If the $i^{th}$ entry of $v$ is $2$, then put $i$ in both sets; if it's $1$, then put it only in the bigger set; if it's $0$, ignore it.

So, the problem reduces to generating a number between $0$ and $3^n-1$ and examining its ternary representation.

Context

StackExchange Computer Science Q#77272, answer score: 6

Revisions (0)

No revisions yet.