patternpythonMinor
Finding a loop in a directed graph in Python
Viewed 0 times
graphloopdirectedpythonfinding
Problem
I wrote this code to find if a directed graph contains a loop or not. I am pretty new to graphs and I saw some examples on the internet but couldn't actually understand their implementations. So I read somewhere that you can colour mark the nodes. So this is my iterative implementation of that method. Is it an okay'ish' approach and how can I improve on it if it's correct?
These are the test cases
```
def dfs(graph, start):
color = {i : 'white' for i in graph}
stack = [start]
visited = []
while stack:
vertex = stack.pop()
if color[vertex] == 'grey':
return True
color[vertex] = 'grey'
visited.append(vertex)
stack.extend((graph[vertex]))
if len(graph[vertex]) == 0:
color[vertex] = 'black'
return False
def cycle_exists(graph):
for vertex in graph:
if(dfs(graph, vertex)):
return True
return False
# connected graph with cycle
graph1 = { 0 : [1, 2],
1 : [],
2 : [3],
3 : [4],
4 : [2] }
assert(cycle_exists(graph1) == True)
print("Graph1 has a cycle.")
#----------------------------------------------------
# disconnected graph with cycle
graph2 = { 0 : [],
1 : [2],
2 : [],
3 : [4],
4 : [5],
5 : [3] }
assert(cycle_exists(graph2) == True)
print("Graph2 has a cycle.")
#----------------------------------------------------
# disconnected graph without a cycle
graph3 = { 0 : [],
1 : [],
2 : [],
3 : [] }
assert(cycle_exists(graph3) == False)
print("Graph3 has no cycle.")
#----------------------------------------------------
# disconnected graph without a cycle
graph4 = { 0 : [1, 2],
1 : [3, 4],
2 : [],
3 : [],
4 : [],
5 : [6, 7],
6 : [],
7 : [] }
assert(cycle_exists(graph4) == False)
print("Graph4 has no cycle.")
#------------------------------
These are the test cases
```
def dfs(graph, start):
color = {i : 'white' for i in graph}
stack = [start]
visited = []
while stack:
vertex = stack.pop()
if color[vertex] == 'grey':
return True
color[vertex] = 'grey'
visited.append(vertex)
stack.extend((graph[vertex]))
if len(graph[vertex]) == 0:
color[vertex] = 'black'
return False
def cycle_exists(graph):
for vertex in graph:
if(dfs(graph, vertex)):
return True
return False
# connected graph with cycle
graph1 = { 0 : [1, 2],
1 : [],
2 : [3],
3 : [4],
4 : [2] }
assert(cycle_exists(graph1) == True)
print("Graph1 has a cycle.")
#----------------------------------------------------
# disconnected graph with cycle
graph2 = { 0 : [],
1 : [2],
2 : [],
3 : [4],
4 : [5],
5 : [3] }
assert(cycle_exists(graph2) == True)
print("Graph2 has a cycle.")
#----------------------------------------------------
# disconnected graph without a cycle
graph3 = { 0 : [],
1 : [],
2 : [],
3 : [] }
assert(cycle_exists(graph3) == False)
print("Graph3 has no cycle.")
#----------------------------------------------------
# disconnected graph without a cycle
graph4 = { 0 : [1, 2],
1 : [3, 4],
2 : [],
3 : [],
4 : [],
5 : [6, 7],
6 : [],
7 : [] }
assert(cycle_exists(graph4) == False)
print("Graph4 has no cycle.")
#------------------------------
Solution
Lets first review your code, and then introduce you to one alternative on how to do unit tests in Python.
-
Adding comments or docstrings is good – In
I should have to guess that, and it should be written in a docstring related to the function. This applies for all similar coded segments where it is not intuitive what the variable, function or variable values means.
-
Add blank lines to enhance readability – The common guidelines says two blank lines between functions, but I would like to extend that to have a blank line around most
You've done this to some extent, but there are two places I really would like to do this and that after the
-
Avoid loose code at top level – In general only imports, constants, functions and classes are to be at the top level of your file. This makes it, amongst others, easy to use your file as a module, and it helps organise your code. In your code all of your tests are at the top level, and should have been within some test scope. I'll come back to this below.
In general you could/should put all this code within one or more functions, like
Another nice options, when entering the world of unit tests, is to execute the unit tests when you execute the python file directly, and then otherwise use the exposed functions, i.e.
-
Remove the
-
Similarily, remove
-
Add more error handling – Your code doesn't do much error handling or validation of input, which some like and some don't like. I think I would have added at least a test that the graph exists at some level.
Refactored code
Here is my refactored code, renaming
One thing I haven't addressed if whether this is the most effective way to check for cycles. In fact I know it isn't as for each vertex we traverse the graph over and over again.
This implies that the current algorithm has an \$O(n^2)\$ complexity, which most likely could be simplified utilising another list of visited vertexes, and instead of looping through all vertexes, one could visit only those not already visited. This is left as an exercise for the reader... :-)
Unit tests
Multiple options exists for doing unit tests in Python, and it comes down to personal preferences. Here I'll present you for the internal version, namely unittest, which suffices for this code.
A common pattern for a unit test is the following:
In your case the tests are rather simple, so I've done some according to this scheme, and for so
-
Adding comments or docstrings is good – In
dfs(), not the best named function by the way, you use colors to mark something, but you don't explain what the different colors mean. I had to guess that:whitemeans unvisited vertex
greymeans visited vertex
blackmeans disconnected vertex
I should have to guess that, and it should be written in a docstring related to the function. This applies for all similar coded segments where it is not intuitive what the variable, function or variable values means.
-
Add blank lines to enhance readability – The common guidelines says two blank lines between functions, but I would like to extend that to have a blank line around most
if, for and while blocks, and the occasional block of code.You've done this to some extent, but there are two places I really would like to do this and that after the
while and for loop where you do an immmediate return statement, which now seems connected to the inner loop code.-
Avoid loose code at top level – In general only imports, constants, functions and classes are to be at the top level of your file. This makes it, amongst others, easy to use your file as a module, and it helps organise your code. In your code all of your tests are at the top level, and should have been within some test scope. I'll come back to this below.
In general you could/should put all this code within one or more functions, like
main(), and call it from the following top level code at the end of your file:if __name__ == '__main__':
main()Another nice options, when entering the world of unit tests, is to execute the unit tests when you execute the python file directly, and then otherwise use the exposed functions, i.e.
cycle_exists() and dfs().-
Remove the
visited list in dfs() – It seems like you've forgotten to remove the visited list in dfs(), as it isn't currently used for anything besides being initialised and added to. You don't do any checks against it. -
Similarily, remove
black from the palette – In your current implementation you don't use the color black for anything, and as such it can be removed, which also would allow for a simpler dict, as the only thing you seem to be interested in is whether it has been visited or not. This can be achieved using a boolean.-
Add more error handling – Your code doesn't do much error handling or validation of input, which some like and some don't like. I think I would have added at least a test that the graph exists at some level.
- No need for the inner parentheses in
dfs()when adding to stack – At least in my tests you can remove the inner set of parentheses, and no change happens.
Refactored code
Here is my refactored code, renaming
dfs() to detect_cycle() and adding some comments and docstrings:def detect_cycle(graph, start):
"""Traverse the graph, and see if we come back to a earlier visited vertex."""
if graph is None:
raise ValueError("We need a graph to detect cycles in")
# Set all vertexes to None, as we don't know the status of it
visited = {v : False for v in graph}
stack = [start]
# Traverse from start, adding connected nodes to the stack as we go
while stack:
vertex = stack.pop()
# If we hit a vertex we've seen before, it is a cycle
if visited[vertex]:
return True
# Mark this vertex as visited
visited[vertex] = True
# Add connected nodes to stack, if any
stack.extend(graph[vertex])
# If stack is empty, that means no cycle for this start vertex
return False
def cycle_exists(graph):
"""Return whether the graph has cycles or not."""
# Loop through each vertex, and check if it has cycles or not
for vertex in graph:
if detect_cycle(graph, vertex):
return True
return FalseOne thing I haven't addressed if whether this is the most effective way to check for cycles. In fact I know it isn't as for each vertex we traverse the graph over and over again.
This implies that the current algorithm has an \$O(n^2)\$ complexity, which most likely could be simplified utilising another list of visited vertexes, and instead of looping through all vertexes, one could visit only those not already visited. This is left as an exercise for the reader... :-)
Unit tests
Multiple options exists for doing unit tests in Python, and it comes down to personal preferences. Here I'll present you for the internal version, namely unittest, which suffices for this code.
A common pattern for a unit test is the following:
- Arrange – Set up all needed variables, you need to execute the test
- Act – Execute the actual test
- Assert – Assert that the test result are as expected
In your case the tests are rather simple, so I've done some according to this scheme, and for so
Code Snippets
if __name__ == '__main__':
main()def detect_cycle(graph, start):
"""Traverse the graph, and see if we come back to a earlier visited vertex."""
if graph is None:
raise ValueError("We need a graph to detect cycles in")
# Set all vertexes to None, as we don't know the status of it
visited = {v : False for v in graph}
stack = [start]
# Traverse from start, adding connected nodes to the stack as we go
while stack:
vertex = stack.pop()
# If we hit a vertex we've seen before, it is a cycle
if visited[vertex]:
return True
# Mark this vertex as visited
visited[vertex] = True
# Add connected nodes to stack, if any
stack.extend(graph[vertex])
# If stack is empty, that means no cycle for this start vertex
return False
def cycle_exists(graph):
"""Return whether the graph has cycles or not."""
# Loop through each vertex, and check if it has cycles or not
for vertex in graph:
if detect_cycle(graph, vertex):
return True
return Falseimport unittest # This line on the very top
class test__cycle_exits(unittest.TestCase):
""" Helper class to test the cycle_exists function
This class test the main method cycle_exists, which heavily
depends on the detect_cycle method, to find whether cycles exists
in the connected graphs.
"""
def test_connected_graph_with_cycle(self):
self.assertTrue(cycle_exists({
0 : [1, 2],
1 : [],
2 : [3],
3 : [4],
4 : [2]
}))
def test_disconnected_graph_with_cycle(self):
self.assertTrue(cycle_exists({
0 : [],
1 : [2],
2 : [],
3 : [4],
4 : [5],
5 : [3]
}))
def test_empty_graph_without_cycle(self):
# Arrange
graph = {
0 : [],
1 : [],
2 : [],
3 : []
}
# Act
has_cycle = cycle_exists(graph)
# Arrange
self.assertFalse(has_cycle)
def test_disconnected_graph_without_a_cycle(self):
self.assertFalse(cycle_exists({
0 : [1, 2],
1 : [3, 4],
2 : [],
3 : [],
4 : [],
5 : [6, 7],
6 : [],
7 : []
}))
def test_confused_graph_result(self):
# Arrange
graph = {
0: [1],
1: [2],
2: [0]
}
# Act
has_cycle = cycle_exists(graph)
# Arrange
self.assertFalse(has_cycle, "Didn't expect cycles for this one! Or did I?") # Dead wrong!!!
class test__detect_cycle(unittest.TestCase):
"""Helper class to test the detect_cycle function.
Using some simple graphs, test the various parts of this functions
"""
def test_single_connected_graph(self):
self.assertTrue(detect_cycle({ 0: [0]}, 0))
def test_simple_connected_graph(self):
self.assertTrue(detect_cycle({ 0: [1], 1 : [0]}, 0))
def test_single_disconnected_graph(self):
self.assertFalse(detect_cycle({ 0: []}, 0))
def test_simple_disconnected_graph(self):
self.assertFalse(detect_cycle({ 0: [1], 1 : []}, 0))
def test_graph_multiple_connections(self):
self.assertTrue(detect_cycle({
0: [1, 2, 3],
1: [],
2: [],
3: [0]
}, 0))
def test_multipaths(self):
self.assertTrue(detect_cycle({
0: [1, 2, 3],
1: [2, 3],
2: [3],
3: [0]}, 0))
def test_nonexistent_start(self):
# Arrange
graph = { 0: [1], 1: [0] }
# Act and Assert an exception
with self.assertRaises(KeyError):
detect_cycle(graph, 2)
def test_nonexistent_graph(self):
# Arrange
graph = None
# Act and Assert an exception
with self.assertRaises(ValueError):
detect_cycle(graContext
StackExchange Code Review Q#111737, answer score: 4
Revisions (0)
No revisions yet.