HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonModerate

Finding all paths from a given graph

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
graphallpathsfindingfromgiven

Problem

I need to find all paths from a given graph. I can do that for now, however my recursive code is not efficient and my graphs are very complicated, hence I need a better algorithm.

def findLeaves(gdict):
    # takes graph and find its leaf nodes
    leaves = []
    for endNode in gdict.iterkeys():
        if not gdict[endNode]:
            leaves.append(endNode)
    return leaves

graphPaths = {}

def findPaths(gname, gdict, leave, path):
    # finds all noncycle paths
    if not gdict:
        return []
    temp = [node for node in gdict.iterkeys() if leave in gdict[node].keys() and node not in path] 
    if temp:
        for node in temp:
            findPaths(gname, gdict, node, [node] + path) 
    else:
        graphPaths[gname].append(path)   

graph = {}
graph['graph'] = {0: {1: {}}, 1: {2: {}, 3: {}}, 2: {3: {}}, 3: {4: {}}, 4: {5: {}}, 5: {6: {}}, 6: {7: {}}, 7: {6: {}, 5: {}, 8: {}}, 8: {}}
graph['name'] = 'nameofthegraph'

# main
leaves = findLeaves(graph['graph'])
graphPaths['nameofthegraph'] = []

seenNodes = []
for leave in leaves:
    findPaths(graph['name'], graph['graph'], leave, [leave])

for path in graphPaths.values()[0]:
    print path


There is only one starting node, which makes things easier for recursive function. Every leaf needs to arrive there if a leaf is followed in reverse order. The starting node is the one which does not have an incoming edge.

These structures are taken from pygraphviz, which simply shows the outgoing edges from any nodes. Keys are the nodes and values are outgoing edges to the nodes. However, when I have very complicated graphs as below, this code could not find all paths.

Is there any better algorithm that you can suggest? Or is there any way to optimize my algorithm for complicated graphs?

Solution


  1. Introduction



Your goal (if I understand correctly, of generating all the execution paths in a state transition graph for a Java thread and then using those execution paths to test the code) seems unrealistic to me.

Even in an acyclic graph (or with the condition that you must not revisit a state) there may be exponentially many paths. In a graph with cycles (like any realistic state transition graph) there are infinitely many paths.

You cannot afford the time to generate all these path, let alone the time to run the test cases based on the paths: the best you can hope for is to intelligently (or randomly) sample the space of paths.

  1. Detailed review



-
There are no docstrings. What do your functions do, what data structures do they act on, and how do I call them?

-
The singular of leaves is leaf in English (not leave).

-
Your program has code at the top level of the module. This makes it inconvenient to test the code from the interactive interpreter: each time you reload the module, the program runs. It would be better to put the top-level code into a function and call this function inside the guard if __name__ == '__main__':.

-
Your program relies on a global variable graphPaths. This makes it inconvenient to test the program repeatedly, as you have to ensure that this variable is reinitialized each time you want to run the test case. Instead of passing gname and appending to graphPaths[gname] inside findPaths, it would be better to pass the list as an argument to findPaths.

-
But it would be even better not to append paths to a list, but instead to generate them one at a time using Python's yield instruction. That would allow you to interleave the generation of the paths with their further processing, and would avoid attempting to store exponentially many paths in memory.

-
There's a global variable seenNodes which is not used.

-
There's no point in writing:

for endNode in gdict.iterkeys():


when you could achieve the same thing by writing:

for endNode in gdict:


This also has the advantage of being portable to Python 3.

-
The lines:

if not gdict:
    return []


are unnecessary. You can't call findPaths unless you can give it a vertex for the leave argument, which means that the graph must be non-empty.

-
The argument leave to findPaths is poorly named. On the first call to the function it is (what you call) a leaf, but thereafter it can be any vertex.

-
The leave argument to findPaths is always the first element of the argument path. So why not get it from there and avoid repeating yourself?

-
Your algorithm searches backwards through the graph from its sinks. This is inconvenient because in order to prepare the list temp of vertices that might appear on the path before the current vertex, you have to search the whole graph. It would be simpler to search forwards because then you could look up the neighbours directly.

-
Your choice of graph data structure is not one of the standard representations such as adjacency list (a map from a vertex to a list or set of its neighbouring vertices). It looks to me as though your representation is "map from vertex to map from neighbouring vertex to the empty map". Why did you choose this representation?

Instead of the representation:

{0: {1: {}}, 1: {2: {}, 3: {}}, 2: {3: {}}, 3: {4: {}}, 4: {5: {}}, 5: {6: {}}, 6: {7: {}}, 7: {6: {}, 5: {}, 8: {}}, 8: {}}


I would use:

{0: [1], 1: [2, 3], 2: [3], 3: [4], 4: [5], 5: [6], 6: [7], 7: [6, 5, 8], 8: []}


or the similar alternative with sets instead of lists.

-
Your test case is very small: it only has two paths. This means that it does not really exercise the code. See §4 below for some more thorough test cases.

  1. Revised code



def paths(graph, v):
    """Generate the maximal cycle-free paths in graph starting at v.
    graph must be a mapping from vertices to collections of
    neighbouring vertices.

    >>> g = {1: [2, 3], 2: [3, 4], 3: [1], 4: []}
    >>> sorted(paths(g, 1))
    [[1, 2, 3], [1, 2, 4], [1, 3]]
    >>> sorted(paths(g, 3))
    [[3, 1, 2, 4]]

    """
    path = [v]                  # path traversed so far
    seen = {v}                  # set of vertices in path
    def search():
        dead_end = True
        for neighbour in graph[path[-1]]:
            if neighbour not in seen:
                dead_end = False
                seen.add(neighbour)
                path.append(neighbour)
                yield from search()
                path.pop()
                seen.remove(neighbour)
        if dead_end:
            yield list(path)
    yield from search()


Notes:

-
This uses the yield from statement and so requires Python 3.3 or later. If you have to use an earlier Python, you can rewrite yield from search() as:

for p in search():
    yield p


-
seen is a collection containing the same vertices as path, but in a set for efficient computation of `neighbour not

Code Snippets

for endNode in gdict.iterkeys():
for endNode in gdict:
if not gdict:
    return []
{0: {1: {}}, 1: {2: {}, 3: {}}, 2: {3: {}}, 3: {4: {}}, 4: {5: {}}, 5: {6: {}}, 6: {7: {}}, 7: {6: {}, 5: {}, 8: {}}, 8: {}}
{0: [1], 1: [2, 3], 2: [3], 3: [4], 4: [5], 5: [6], 6: [7], 7: [6, 5, 8], 8: []}

Context

StackExchange Code Review Q#55767, answer score: 14

Revisions (0)

No revisions yet.