HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Disjoint-set data structure in Python 3

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
structuredisjointpythondataset

Problem

Inspired by this question, I decided to implement a Disjoint-set data structure in Python 3. I mainly followed this description for understanding the algorithm (but did not do the optimizations for path compression, union by rank etc. -- so mine is a "naive" implementation).
Here is the Wikipedia article for reference.

In short, the data structure can hold n disjoint sets, and do two operations on them:

  • merge any two sets into a single one



  • for any element, tell to which set it belongs



My implementation is as follows:

#!/usr/bin/python

class Disjoint:

    def __init__(self):
        self.sets = []

    def createSet(self, repr):
        self.sets.append([repr])

    def mergeSets(self, repr1, repr2):
        set1 = self.findSet(repr1);
        set2 = self.findSet(repr2);
        if set1 != set2:
            set1.extend(set2);
            self.sets.remove(set2);

    def findSet(self, repr1):
        for oneSet in self.sets:
            if repr1 in oneSet:
                return oneSet

    def getSets(self):
        return self.sets;


The test class:

```
import unittest
import disjoint

class TestSequenceFunctions(unittest.TestCase):

def setUp(self):
pass;

def test_empty(self):
dis = disjoint.Disjoint();
self.assertEqual([], dis.getSets())
self.assertEqual(None, dis.findSet(1))

def test_init(self):
dis = disjoint.Disjoint();
for i in range(1, 6):
dis.createSet(i);
for i in range(1, 6):
found = dis.findSet(i);
self.assertEqual(1, len(found))
self.assertEqual(i, found[0])
expected = [[i] for i in range(1, 6)]
self.assertEqual(expected, dis.getSets())

def test_simple(self):
dis = disjoint.Disjoint();
for i in range(1, 6):
dis.createSet(i);
pairs = [[1, 2], [2, 4], [4, 5]]
for p in pairs:
p1 = p[0];
p2 = p[1];

if dis.findSet(p1) != d

Solution

There's a couple of points that are a possible C or Java background leaking through:

  • You end a lot of lines with an unnecessarry semicolon



  • You've named your methods and some variables using camelCase, whereas the prefered convention in Python is usually lowercase_with_underscores (the unittest module is an anomaly here, being one of a very few isolated stdlib modules using camelCase)



  • The getSets method is unnecessarry. Python convention is to expose attributes directly because if the implementation changes later, you can just make it a property. If you really don't want to do that, then you should rename self.sets to self._sets to emphasise that it isn't part of the documented API



I have a feeling that the following line:

self.sets.append([repr])


isn't actually doing what you want. It looks like you mean self.sets.append(repr), otherwise self.sets is full of single-length lists whose only item is a collection of some kind whereas it would make more sense for those collections to be in self.sets directly. You would then make a corresponding change down in findSet so that the test there reads:

if repr1 == oneSet:


Also think about what should happen if findSet fails to find the set. As written, it will fall off the end and impliiclty return None, which will almost certaintly lead to errors in short order. Instead, you should probably immediately raise ValueError with an appropriate error message. You may also want to use list.find to help with this.

You could then add that case to your unit tests.

repr is the name of a builtin function, so calling your arguments by that name shadows it. A better name might be subset, for example. You may also want to document whether it should be a set, a list or a tuple, especially if that ends up mattering to any of the optimisations you write later.

In test_simple, this looks like a typo:

expetctedSets = [[1, 2, 4, 5], [3]];


should be 'expected'.

Finally, you say that this class models disjoint sets, but you never enforce that. Indeed, this would work:

ds = Disjoint()
subset = {1,2,3}
ds.add(subset)
ds.add(subset)
ds.add(subset)


If you are assuming clients won't do things like that, that is OK (although it may make your class less useful), but you should document that. In fact, some docstrings in general wouldn't go astray.

Code Snippets

self.sets.append([repr])
if repr1 == oneSet:
expetctedSets = [[1, 2, 4, 5], [3]];
ds = Disjoint()
subset = {1,2,3}
ds.add(subset)
ds.add(subset)
ds.add(subset)

Context

StackExchange Code Review Q#79208, answer score: 8

Revisions (0)

No revisions yet.