HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Strongly connected component algorithm in Python 2.7

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
stronglyalgorithmconnectedpythoncomponent

Problem

This is my implementation of Kosaraju's algorithm for detecting strongly connected components, post here for advice. Some special areas looking for advice,

  • Not sure if my current implementation for adjacent table of graph is efficient?



  • Any better ideas to reverse the graph to make it more efficient?



  • I use preVisited and visited to track what nodes are already reached by a strongly connected components, it there a better way?



  • Any other areas my code could improve, especially for performnace.



```
from collections import defaultdict
visitTimeCounter = 0

class GraphNode:
def __init__(self, name):
self.name=name
self.connections = [] # list of GraphNode, directional
self.visitTime = -1

def DFS(self, visited):
global visitTimeCounter
if self.name in visited:
return
visited.append(self.name)

for node in self.connections:
if node.name in visited:
pass
else:
node.DFS(visited)
visitTimeCounter += 1
self.visitTime = visitTimeCounter

if __name__ == "__main__":
nodeA = GraphNode('A')
nodeB = GraphNode('B')
nodeC = GraphNode('C')
nodeD = GraphNode('D')

allNodes = [nodeA, nodeB, nodeC, nodeD]

nodeA.connections.append(nodeB)
nodeA.connections.append(nodeC)
nodeB.connections.append(nodeA)
nodeC.connections.append(nodeD)
nodeD.connections.append(nodeC)

visited = []

for node in allNodes:
if node.name not in visited:
node.DFS(visited)

for node in allNodes:
print node.name, node.visitTime

allNodesReverse=[]
nodeIndex = defaultdict(GraphNode)
# fork new nodes
for node in allNodes:
newNode=GraphNode(node.name)
newNode.visitTime = node.visitTime
allNodesReverse.append(newNode)
nodeIndex[newNode.name]=newNode

# build conections
for node in allNodes:
for neighbour in node.conne

Solution


  1. Review



-
The code is not portable to Python 3 due to the use of the print statement.

-
The code is not organized into functions. This makes it hard to understand and hard to test.

-
Using a global variable (namely visitTimeCounter) makes code fragile (if the global variable starts out with the wrong value then it goes wrong) and hard to test (because you have to remember to reset the global variable each time you call it).

-
A graph node isn't the best choice of data structure here (it only has one attribute that you actually use, namely the list of its out-neighbours). A better data structure for this algorithm is a directed graph. A directed graph supports several methods that can be used by this algorithm: adding nodes and edges to the graph; iterating over the nodes; and getting the in- or out-neighbours of a node.

class Graph(object):
    """A directed graph. Nodes may be any hashable values."""

    def __init__(self, edges=()):
        """Initialize graph from optional iterable of edges."""
        self._out = defaultdict(set) # Map from node to its out-neighbours
        self._in = defaultdict(set) # Map from node to its in-neighbours
        for u, v in edges:
            self.add_edge(u, v)

    def add_node(self, u):
        """Add a node u to the graph."""
        self._out[u]
        self._in[u]

    def add_edge(self, u, v):
        """Add a directed edge to the graph, from node u to node v."""
        self.add_node(u)
        self.add_node(v)
        self._out[u].add(v)
        self._in[v].add(u)

    def in_neighbours(self, u):
        """Return the set of in-neighbours of the node u."""
        return self._in[u]

    def out_neighbours(self, u):
        """Return the set of out-neighbours of the node u."""
        return self._out[u]

    def __iter__(self):
        """Return iterator over the nodes of the graph."""
        return iter(self._out)


The constructor makes it very easy to create example graphs:

Graph('AB BA CD DC AC'.split())


(Compare this to the ten lines of code needed to create the graph in the post.)

-
To make the code for the algorithm easy to understand, it would make sense to closely follow the algorithm description. Wikipedia describes the algorithm like this:



  • For each vertex \$u\$ of the graph, mark \$u\$ as unvisited. Let \$L\$ be empty.



  • For each vertex \$u\$ of the graph do \$Visit(u)\$, where \$Visit(u)\$ is the recursive subroutine:


If \$u\$ is unvisited then:



  • Mark \$u\$ as visited.



  • For each out-neighbour \$v\$ of \$u\$, do \$Visit(v)\$.



  • Prepend \$u\$ to \$L\$.




  • For each element \$u\$ of \$L\$ in order, do \$Assign(u,u)\$ where \$Assign(u,root)\$ is the recursive subroutine:


If \$u\$ has not been assigned to a component then:



  • Assign \$u\$ as belonging to the component whose root is \$root\$.



  • For each in-neighbour \$v\$ of \$u\$, do \$Assign(v,root)\$.





which can be translated directly into Python like this:

def components(self):
    """Return list of strongly connected components."""
    visited = set()         # Set of visited nodes.
    L = []                  # Nodes in topological order.
    def visit(u):
        if u not in visited:
            visited.add(u)
            for v in self.out_neighbours(u):
                visit(v)
            L.insert(0, u)
    for u in self:
        visit(u)
    component = defaultdict(set) # Map from root to its component.
    assigned = set()       # Set of nodes assigned to a component.
    def assign(u, root):
        if u not in assigned:
            component[root].add(u)
            assigned.add(u)
            for v in self.in_neighbours(u):
                assign(v, root)
    for u in L:
        assign(u, u)
    return list(component.values())


It's much easier to compare and check this kind of directly translated code against the algorithm description than it is to check the more complicated translation in the post.

(Note that the directly translated code above has the correct functionality, but not the right complexity: inserting an item at the beginning of a list with L.insert(0, u) is inefficient (it takes time proportional to the length of the list). We can fix this by adding nodes to the end of the list instead (which is efficient). This requires three changes. First, the comment becomes:

L = []           # Nodes in reverse topological order.


Second, L.insert(0, u) becomes L.append(u). Third, the iteration over the list L proceeds in reverse order:

for u in reversed(L):


If you don't like the idea of maintaining this list in reverse order, then an alternative would be to switch L from a list to a collections.deque, which has an efficient appendleft method.)

Some quick tests:

```
>>> Graph('AB BA CD DC AC'.split()).components()
[{'D', 'C'}, {'B', 'A'}]
>>> Graph('AB BC CD DA AE FB DG'.split()).components()
[{'D', 'B', 'C', 'A'}, {'E'}, {'F'}, {'G'}]
>>> Graph('AB BC CD

Code Snippets

class Graph(object):
    """A directed graph. Nodes may be any hashable values."""

    def __init__(self, edges=()):
        """Initialize graph from optional iterable of edges."""
        self._out = defaultdict(set) # Map from node to its out-neighbours
        self._in = defaultdict(set) # Map from node to its in-neighbours
        for u, v in edges:
            self.add_edge(u, v)

    def add_node(self, u):
        """Add a node u to the graph."""
        self._out[u]
        self._in[u]

    def add_edge(self, u, v):
        """Add a directed edge to the graph, from node u to node v."""
        self.add_node(u)
        self.add_node(v)
        self._out[u].add(v)
        self._in[v].add(u)

    def in_neighbours(self, u):
        """Return the set of in-neighbours of the node u."""
        return self._in[u]

    def out_neighbours(self, u):
        """Return the set of out-neighbours of the node u."""
        return self._out[u]

    def __iter__(self):
        """Return iterator over the nodes of the graph."""
        return iter(self._out)
Graph('AB BA CD DC AC'.split())
def components(self):
    """Return list of strongly connected components."""
    visited = set()         # Set of visited nodes.
    L = []                  # Nodes in topological order.
    def visit(u):
        if u not in visited:
            visited.add(u)
            for v in self.out_neighbours(u):
                visit(v)
            L.insert(0, u)
    for u in self:
        visit(u)
    component = defaultdict(set) # Map from root to its component.
    assigned = set()       # Set of nodes assigned to a component.
    def assign(u, root):
        if u not in assigned:
            component[root].add(u)
            assigned.add(u)
            for v in self.in_neighbours(u):
                assign(v, root)
    for u in L:
        assign(u, u)
    return list(component.values())
L = []           # Nodes in reverse topological order.
for u in reversed(L):

Context

StackExchange Code Review Q#143265, answer score: 4

Revisions (0)

No revisions yet.