patternpythonMinor
Disjoint-set data structure in Python 3
Viewed 0 times
structuredisjointpythondataset
Problem
Inspired by this question, I decided to implement a Disjoint-set data structure in Python 3. I mainly followed this description for understanding the algorithm (but did not do the optimizations for path compression, union by rank etc. -- so mine is a "naive" implementation).
Here is the Wikipedia article for reference.
In short, the data structure can hold n disjoint sets, and do two operations on them:
My implementation is as follows:
The test class:
```
import unittest
import disjoint
class TestSequenceFunctions(unittest.TestCase):
def setUp(self):
pass;
def test_empty(self):
dis = disjoint.Disjoint();
self.assertEqual([], dis.getSets())
self.assertEqual(None, dis.findSet(1))
def test_init(self):
dis = disjoint.Disjoint();
for i in range(1, 6):
dis.createSet(i);
for i in range(1, 6):
found = dis.findSet(i);
self.assertEqual(1, len(found))
self.assertEqual(i, found[0])
expected = [[i] for i in range(1, 6)]
self.assertEqual(expected, dis.getSets())
def test_simple(self):
dis = disjoint.Disjoint();
for i in range(1, 6):
dis.createSet(i);
pairs = [[1, 2], [2, 4], [4, 5]]
for p in pairs:
p1 = p[0];
p2 = p[1];
if dis.findSet(p1) != d
Here is the Wikipedia article for reference.
In short, the data structure can hold n disjoint sets, and do two operations on them:
- merge any two sets into a single one
- for any element, tell to which set it belongs
My implementation is as follows:
#!/usr/bin/python
class Disjoint:
def __init__(self):
self.sets = []
def createSet(self, repr):
self.sets.append([repr])
def mergeSets(self, repr1, repr2):
set1 = self.findSet(repr1);
set2 = self.findSet(repr2);
if set1 != set2:
set1.extend(set2);
self.sets.remove(set2);
def findSet(self, repr1):
for oneSet in self.sets:
if repr1 in oneSet:
return oneSet
def getSets(self):
return self.sets;The test class:
```
import unittest
import disjoint
class TestSequenceFunctions(unittest.TestCase):
def setUp(self):
pass;
def test_empty(self):
dis = disjoint.Disjoint();
self.assertEqual([], dis.getSets())
self.assertEqual(None, dis.findSet(1))
def test_init(self):
dis = disjoint.Disjoint();
for i in range(1, 6):
dis.createSet(i);
for i in range(1, 6):
found = dis.findSet(i);
self.assertEqual(1, len(found))
self.assertEqual(i, found[0])
expected = [[i] for i in range(1, 6)]
self.assertEqual(expected, dis.getSets())
def test_simple(self):
dis = disjoint.Disjoint();
for i in range(1, 6):
dis.createSet(i);
pairs = [[1, 2], [2, 4], [4, 5]]
for p in pairs:
p1 = p[0];
p2 = p[1];
if dis.findSet(p1) != d
Solution
There's a couple of points that are a possible C or Java background leaking through:
I have a feeling that the following line:
isn't actually doing what you want. It looks like you mean
Also think about what should happen if
You could then add that case to your unit tests.
In
should be 'expected'.
Finally, you say that this class models disjoint sets, but you never enforce that. Indeed, this would work:
If you are assuming clients won't do things like that, that is OK (although it may make your class less useful), but you should document that. In fact, some docstrings in general wouldn't go astray.
- You end a lot of lines with an unnecessarry semicolon
- You've named your methods and some variables using
camelCase, whereas the prefered convention in Python is usuallylowercase_with_underscores(the unittest module is an anomaly here, being one of a very few isolated stdlib modules using camelCase)
- The
getSetsmethod is unnecessarry. Python convention is to expose attributes directly because if the implementation changes later, you can just make it a property. If you really don't want to do that, then you should renameself.setstoself._setsto emphasise that it isn't part of the documented API
I have a feeling that the following line:
self.sets.append([repr])isn't actually doing what you want. It looks like you mean
self.sets.append(repr), otherwise self.sets is full of single-length lists whose only item is a collection of some kind whereas it would make more sense for those collections to be in self.sets directly. You would then make a corresponding change down in findSet so that the test there reads:if repr1 == oneSet:Also think about what should happen if
findSet fails to find the set. As written, it will fall off the end and impliiclty return None, which will almost certaintly lead to errors in short order. Instead, you should probably immediately raise ValueError with an appropriate error message. You may also want to use list.find to help with this.You could then add that case to your unit tests.
repr is the name of a builtin function, so calling your arguments by that name shadows it. A better name might be subset, for example. You may also want to document whether it should be a set, a list or a tuple, especially if that ends up mattering to any of the optimisations you write later.In
test_simple, this looks like a typo:expetctedSets = [[1, 2, 4, 5], [3]];should be 'expected'.
Finally, you say that this class models disjoint sets, but you never enforce that. Indeed, this would work:
ds = Disjoint()
subset = {1,2,3}
ds.add(subset)
ds.add(subset)
ds.add(subset)If you are assuming clients won't do things like that, that is OK (although it may make your class less useful), but you should document that. In fact, some docstrings in general wouldn't go astray.
Code Snippets
self.sets.append([repr])if repr1 == oneSet:expetctedSets = [[1, 2, 4, 5], [3]];ds = Disjoint()
subset = {1,2,3}
ds.add(subset)
ds.add(subset)
ds.add(subset)Context
StackExchange Code Review Q#79208, answer score: 8
Revisions (0)
No revisions yet.