HiveBrain v1.2.0
Convert a list of sets into the minimum list of non-intersecting sets


Problem

I have a list of sets. The same items may appear in multiple sets.

I want to transform this into a new list of sets where:

  • Each item appears only once in the entire list of sets.

  • For each set in the new list and each set in the old list, the new set is either a subset of the old set or does not intersect it at all.

I want this new list of sets to contain the minimum number of sets it can while still satisfying the above two requirements.

For example, this list

  • abcd

  • befg

  • cehi

would turn into

  • ad

  • b

  • c

  • e

  • fg

  • hi
This seems like a pretty generic problem. Is there a name for it?
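One way to see why this is the expected output: two items must end up in the same output set exactly when they occur in the same collection of input sets, so the answer is the grouping of items by that membership "signature". A quick Python sketch of this cross-check (function name is mine, not from the question):

```python
from collections import defaultdict

def expected_partition(sets):
    # Group items by the exact collection of input sets they occur in;
    # two items may share an output set only when these signatures match.
    groups = defaultdict(set)
    for item in set().union(*sets):
        signature = frozenset(i for i, s in enumerate(sets) if item in s)
        groups[signature].add(item)
    return list(groups.values())

result = expected_partition([set("abcd"), set("befg"), set("cehi")])
print(sorted("".join(sorted(g)) for g in result))
# prints ['ad', 'b', 'c', 'e', 'fg', 'hi']
```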

Here's my implementation. It requires the comparison of each set with each other set, so I guess this has a performance \$O(k \cdot n^2)\$ where \$n\$ is the count of lists, and \$k\$ is the average length of each set.

Are there any efficiency issues that could be fixed easily?

```
public static class SetUtilities
{
    public static IEnumerable<IEnumerable<T>> MinimumNonIntersectingSets<T>(this IEnumerable<IEnumerable<T>> sets)
    {
        List<List<T>> disjointSets = new List<List<T>>();
        // Special case: if there is a null set we have to add it first
        if (sets.Any(s => !(s.Any())))
        {
            disjointSets.Add(new List<T>());
            sets = sets.Where(s => s.Any());
        }

        foreach (IEnumerable<T> newSet in sets)
        {
            List<T> disjointCopy = newSet.ToList();
            int i = 0;
            while (i < disjointSets.Count())
            {
                var intersection = disjointSets[i].Intersect(disjointCopy).ToList();

                if ((intersection.Count() == disjointCopy.Count()) && (intersection.Count() == disjointSets[i].Count())) // They are equal
                {
                    disjointCopy.Clear();
                    break;
                }
                if (intersection.Any())
                {
                    disjointSets[i] = disjointSets[i].Except(intersection).ToList();
                    disjointSets.Insert(++i, intersection);
                    disjointCopy = disjointCopy.Except(intersection).ToList();
                }
                i++;
            }
            if (disjointCopy.Any())
            {
                disjointSets.Add(disjointCopy);
            }
        }
        return disjointSets;
    }
}
```

Solution

I guess this has a performance \$O(k \cdot n^2)\$ where \$n\$ is the count of lists, and \$k\$ is the average length of each set.

You shouldn't guess the time complexity; you should carefully analyze your code. So let's do that:

- The empty-set check is \$\mathcal{O}(n)\$.

- The foreach is over all sets in the input; that's \$n\$.

- The while loop is over all sets in the output so far. The number of iterations forms something similar to an arithmetic progression, with maximum \$m\$, the number of sets in the output. The average is \$\approx m/2\$, so this part contributes \$\mathcal{O}(m)\$.

  (A formal proof would require showing that "something similar to an arithmetic progression" actually contributes that, but this is not a formal proof.)

- Calculating intersections and complements of sets is \$\mathcal{O}(k)\$.

- Inserting into the middle of a List of length \$\mathcal{O}(m)\$ is \$\mathcal{O}(m)\$.
Put together, we have \$\mathcal{O}(n + n \cdot m \cdot (k + m)) = \mathcal{O}(n \cdot m \cdot (k + m))\$.

That Insert annoyingly increases the complexity, but we can easily get rid of it, either by using a LinkedList (and walking it instead of indexing) or by adding at the end of the List (and making sure we don't iterate over the sets that were just added). This will bring us to \$\mathcal{O}(n \cdot m \cdot k)\$.
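To illustrate the add-at-the-end variant, here is a Python sketch (names are mine): the length of the output list is snapshotted before the scan, pieces split off an existing set are appended past that snapshot, and so the scan never revisits them and no mid-list insert is needed.

```python
def minimum_non_intersecting_sets(sets):
    disjoint = []
    for new_set in sets:
        remaining = set(new_set)
        end = len(disjoint)  # snapshot: skip the pieces appended below
        for i in range(end):
            common = disjoint[i] & remaining
            if not common:
                continue
            if common == disjoint[i]:
                # The existing piece lies entirely inside the new set: keep it.
                remaining -= common
            else:
                # Split the existing piece; the shared part goes at the end,
                # an O(1) append instead of an O(m) mid-list insert.
                disjoint[i] -= common
                remaining -= common
                disjoint.append(common)
        if remaining:
            disjoint.append(remaining)
    return disjoint

result = minimum_non_intersecting_sets([set("abcd"), set("befg"), set("cehi")])
print(sorted("".join(sorted(s)) for s in result))
# prints ['ad', 'b', 'c', 'e', 'fg', 'hi']
```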

So, how big is \$m\$? Your estimate indicates you assumed it's \$\mathcal{O}(n)\$, but even your example shows that it can be larger than \$n\$. So, can it actually grow beyond \$\mathcal{O}(n)\$? Turns out, it can: when each of the \$n\$ sets has exactly one element in common with each other set*, then in the output, each of the \$n(n-1)/2\$ elements has to be in its own set, so the length of the output is \$\mathcal{O}(n^2)\$.

This means that the total worst-case complexity of your algorithm is \$\mathcal{O}(n^3 \cdot k)\$.

* Or, in the language of graph theory: the sets are the sets of edges incident to each vertex of a complete graph on \$n\$ vertices.
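This worst case is easy to check numerically. The Python sketch below (names are mine; grouping items by which input sets contain them stands in for the full algorithm) builds the incident-edge sets of \$K_n\$ and counts the output sets:

```python
from itertools import combinations
from collections import defaultdict

n = 8
# Vertex i's input set is the edges of K_n incident to i; any two
# input sets share exactly one element (the edge joining the vertices).
sets = [{frozenset(e) for e in combinations(range(n), 2) if i in e}
        for i in range(n)]

# Group edges by which input sets contain them (the coarsest valid
# refinement): every edge has a unique signature, so it sits alone.
groups = defaultdict(set)
for edge in set().union(*sets):
    groups[frozenset(i for i, s in enumerate(sets) if edge in s)].add(edge)

print(len(sets), len(groups))  # prints 8 28 -- n inputs, n(n-1)/2 outputs
```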

For some inputs, your output contains lots of empty sets that don't belong there. For example, try it on `new[] { new[] { 1 }, new[] { 1, 2 }, new[] { 1, 2, 3 } }`.
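To reproduce the stray empty sets without a C# toolchain, here is a rough Python port of the question's method (lists stand in for `List<T>`, comprehensions for `Intersect`/`Except`; the function name is mine):

```python
def min_non_intersecting_port(sets):
    sets = list(sets)
    disjoint_sets = []
    # Special case from the original: an empty input set is added first.
    if any(not s for s in sets):
        disjoint_sets.append([])
        sets = [s for s in sets if s]

    for new_set in sets:
        disjoint_copy = list(new_set)
        i = 0
        while i < len(disjoint_sets):
            intersection = [x for x in disjoint_sets[i] if x in disjoint_copy]
            if len(intersection) == len(disjoint_copy) == len(disjoint_sets[i]):
                # The two sets are equal: drop the new copy entirely.
                disjoint_copy = []
                break
            if intersection:
                # Split the existing set, inserting the shared part after it;
                # the leftover on the left may become empty -- the bug.
                disjoint_sets[i] = [x for x in disjoint_sets[i] if x not in intersection]
                i += 1
                disjoint_sets.insert(i, intersection)
                disjoint_copy = [x for x in disjoint_copy if x not in intersection]
            i += 1
        if disjoint_copy:
            disjoint_sets.append(disjoint_copy)
    return disjoint_sets

print(min_non_intersecting_port([[1], [1, 2], [1, 2, 3]]))
# prints [[], [], [1], [], [2], [3]]
```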

// Special case: if there is a null set we have to add it first


No, you don't need a special case for the empty set. It doesn't intersect with anything, so both of your requirements will still be fulfilled even if you don't include it in the output.

.Count()


I think it's better to use the Count property instead of the Count() extension method, to make it clear that it's an \$\mathcal{O}(1)\$ operation, not \$\mathcal{O}(n)\$.

Context

StackExchange Code Review Q#59876, answer score: 4
