patternMinor

Disjoint sets implementation

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

implementationsetsdisjoint

Problem

I would like to get some feedback on the following implementation of disjoint sets, based on disjoint-sets forests (Cormen, et.al., Introduction to Algorithms, 2nd ed., p.505ff). It should have the following properties:

Abstracts away all internal data structures and can be used to create disjoint sets of any type T

Supports the traditional disjoint set operations: add, union, find

Performs union-by-rank and path compression optimizations

Any general advice is welcome, but I have my doubts in particular about the following points:

Mixture of object-oriented and functional programming - it feels like best of both worlds, but I'm always unsure on which side to lean stronger.

Thread-safety: is it sufficient to synchronize the public methods and initialization? is it necessary? (not sure about add and size)

The MatchError: what is a best practice to deal with this sort of thing?

```
import scala.annotation.tailrec

/**
* This class implements a disjoint-sets algorithm with
* union-by-rank and path compression. The forest/sets/etc. are
* internal data structures not exposed to the outside. Instead,
* it is possible to add new elements (which end up in a newly
* created set), union two elements, and hence, their sets, and
* find the representative of a disjoint-set by a given element of
* the set.
*/
class DisjointSetsT {

/**
* Add a new single-node forest to the disjoint-set forests. It will
* be placed into its own set.
*/
def add(elem : T) : Unit = synchronized {
nodes += (elem -> DisjointSets.Node(elem, 0, None))
}

/**
* Union the disjoint-sets of which elem1
* and elem2 are members of.
* @return the representative of the unioned set
* @precondition elem1 and elem2 must have been added before
*/
def union(elem1 : T, elem2 : T) : T = synchronized {
// lookup elements
require(nodes.contains(elem1) && nodes.contains(elem2),
"Only elements

Solution

I have a feeling that your find is suboptimal. I'd say you should compress the whole path up to the root, not just the parent of the single node. So you should find the root node first and then traverse the path again and set parent to Some(rootNode) for all of the nodes.

Concerning MatchErrors, I'd perhaps use

def union(elem1 : T, elem2 : T) : T =
  // retrieve representative nodes
  (nodes.get(elem1).map(_.getRepresentative), 
   nodes.get(elem2).map(_.getRepresentative)) match {
    case (Some(n1), Some(n2)) if n1 == n2 => // ...

    // ...

    case _ => // one of the values is None
        require(false,
            "Only elements previously added to the disjoint-sets can be unioned")
  }

Concerning synchronization, I'd let the user pick if (s)he wants a synchronized implementation or not. I'd define a trait with all the operations (that's always a good idea to separate the contract):

trait DisjointSets[T] {
    def add(elem : T) : Unit;
    def union(elem1 : T, elem2 : T) : T;
    def find(elem : T) : Option[T];
    def size : Int;
}

and then something like

class DisjointSetsImpl[T] private (initialElements : Seq[T] = Nil)
    extends DisjointSets[T]
{
    // ...
}

and then a wrapper (either explicit or just within a method on the companion object)

class DisjointSetSync[T](val underlying: DisjointSet[T])
    extends DisjointSet[T]
{
    override def add(elem: T) : Unit = synchronized {
        underlying.add(elem);
    }

    // etc.
}

You will need to synchronize all public operations, including add and size (consider one thread is reading the size while the other one is adding a new element to the set using add).

This way users can use faster, default, non-synchronized implementation, and wrap it into a synchronized one, if needed. In most cases, users will want their own synchronization anyway, because they often use other synchronization primitives (such as locks or semaphores) or need to synchronize several operations at once.

Also, you don't need to synchronize initializing code such as

// Initialization
synchronized { initialElements map (add _) }

because at the time of the initialization of an object, nobody has a reference to it yet, so it's not possible that other threads access it concurrently.

I'd perhaps use a different name for size, something like setCount, because it can be easily confused with the number of elements, not the number of disjoint sets. You could also make size faster by keeping count of the number of distinct sets, and just decrementing it after each successful union. (I vaguely recall that this operation was useful for some algorithm, but I can't remember which one.)

You could also consider extending Scala's standard traits (after renaming size) to make your class even more useful:

Growable (this would replace add with +=).

Traversable which would traverse all elements in all sets.

PartialFunction[T,T] to find a representative for a node (if it's in the set - this would be a synonym for find).

This won't cost you anything and can be useful sometimes.

There has been a request for an implementation like yours, perhaps you could consider publishing it as a small library somewhere.

Code Snippets

def union(elem1 : T, elem2 : T) : T =
  // retrieve representative nodes
  (nodes.get(elem1).map(_.getRepresentative), 
   nodes.get(elem2).map(_.getRepresentative)) match {
    case (Some(n1), Some(n2)) if n1 == n2 => // ...

    // ...

    case _ => // one of the values is None
        require(false,
            "Only elements previously added to the disjoint-sets can be unioned")
  }

trait DisjointSets[T] {
    def add(elem : T) : Unit;
    def union(elem1 : T, elem2 : T) : T;
    def find(elem : T) : Option[T];
    def size : Int;
}

class DisjointSetsImpl[T] private (initialElements : Seq[T] = Nil)
    extends DisjointSets[T]
{
    // ...
}

class DisjointSetSync[T](val underlying: DisjointSet[T])
    extends DisjointSet[T]
{
    override def add(elem: T) : Unit = synchronized {
        underlying.add(elem);
    }

    // etc.
}

// Initialization
synchronized { initialElements map (add _) }

Context

StackExchange Code Review Q#17621, answer score: 2

Revisions (0)

No revisions yet.