HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Optimise the size of union of sets

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
thesizeunionoptimisesets

Problem

We consider a set of sets $S = \{S_1, S_2, \dots, S_n\}$, and we want to find a subset $I$ of $S$ that maximises the size of the union of the set $S_i$ included with a penalty for each set included. We can represent this problem as the optimisation of the following quantity:

$I = argmax_{I} |\cup_{S_i \in I} S_i| - |I|$

What would be an efficient way to solve this optimisation problem?

We know from the inclusion-exclusion principle that the union of sets can be expressed as follows:

$|S_{i_1} \cup \dots \cup S_{i_k}| = \sum_{j=1}^{k} (-1)^{j+1} \sum_{1 \leq l_1, \dots, l_j \leq k} |S_{l_1} \cap \dots \cap S_{l_j}|$

Solution

Your problem is NP-hard. Given an instance $T_1,\ldots,T_n$ of Maximum Coverage, let $x_1,\ldots,x_n$ be new variables, and create an instance $S_1,\ldots,S_n$ of your problem, where $S_i = T_i \cup \{x_i\}$. We have
$$
\left| \bigcup_{i \in I} S_i \right| -|I| = \left| \bigcup_{i \in I} T_i \right|.
$$

On the positive side, consider the following greedy algorithm: $I_0 = \emptyset$, and for $k = 1,\ldots,n$, $I_k$ is obtained from $I_{k-1}$ by adding the set which maximizes the total number of elements covered by sets in $I_k$. It is known that for $k \in \{0,\ldots,n\}$, the set $I_k$ found by the greedy algorithm has the following property: for all sets $J_k$ of size $k$ we have
$$
\left| \bigcup_{i \in I_k} S_i \right| \geq
(1-1/e) \left| \bigcup_{i \in J_k} S_i \right|.
$$
In particular, if the optimal solution to your problem has size $k$ and objective value $O$, then
$$
\left| \bigcup_{i \in I_k} S_i \right| \geq (1-1/e)(O + k),
$$
and so
$$
\left| \bigcup_{i \in I_k} S_i \right| - k \geq (1-1/e) O - k/e.
$$
The minimal optimal $k$ satisfies $O \geq k$ (each set adds at least one new element), and so $(1-1/e) O - k/e \geq (1-2/e) O$. This shows that the greedy algorithm (more explicitly, finding $k$ which maximizes $|I_k|-k$) gives a $1-2/e$ approximation to your problem (this can probably be improved).

Context

StackExchange Computer Science Q#131968, answer score: 2

Revisions (0)

No revisions yet.