HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Sampling maximal shortest paths in a graph?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
graphshortestsamplingpathsmaximal

Problem

Let S be the set of all possible shortest paths in a directed graph. A path s in S is said to be maximal if it is not a subpath of another path in S i.e. it cannot be extended to another shortest path.

Let M be the set of all maximal paths in S.  

Does anyone know how to efficiently sample paths from  M. Ideally, I would like the probability of sampling the longer paths in M to be higher. 

Example: In the directed triangle, (1,2,3), we have 3 maximal paths, 1->2->3, 2->3->1, and 3->1->2

Solution

Non-maximal paths are prefixes or suffixes of shortest paths

So a maximal path is just a path that begins at a vertex $u$ with no in-edge and ends at a vertex $v$ with no out-edge.

It's straightforward to see that if a shortest path $P$ from $u$ to $v$ is a subpath of some shortest path $Q$ from $x$ to $y$, then $P$ is also a subpath of a shortest path $R$ from $x$ to $v$, and a subpath of a shortest path $S$ from $u$ to $y$. Thus the following statements are equivalent:

  • $P$ is not maximal;



  • There exists a shortest path $Q \ne P$ that contains $P$ as a subpath;



  • There exists a shortest path $X \ne P$ that contains $P$ as a subpath, is longer than $P$ and either begins at $u$ or ends at $v$.



(1) and (2) are equivalent by definition; (2) follows from (3) trivially; (3) follows from (2) by the reasoning above and the fact that, since $Q \ne P$, at least one of $R$ and $S$ must be longer than $P$ and end at one of $P$'s two endpoints; $X$ can be whichever one this is. We will use (3) to efficiently determine whether a given shortest path is maximal.

Maximality is determined by endpoints and length

If some shortest path from $u$ to $v$ could be extended to a longer shortest path, then all shortest paths from $u$ to $v$ could be so extended -- since the only property we care about is the length, and that doesn't change. So the question of whether a given path $P$ from $u$ to $v$ is maximal is entirely determined by the length of $P$, and its start and end points $u$ and $v$ -- it is not necessary to know what specific internal vertices $P$ uses. Thus, for any pair of vertices $(u, v)$, we can determine in a preprocessing step whether this vertex pair is "maxable"; a path $P = (u, \dots, v)$ is then maximal exactly when the pair $(u, v)$ is maxable and $P$ has length equal to the distance between $u$ and $v$.

The idea is to build a complete list of all maxable pairs $(u, v)$ and the number $n_{uv}$ of shortest paths between them. Once these are known, paths can be uniformly sampled using a simple recursive procedure.

Determining maxable pairs

Let $A$ be the set of all vertex pairs $(u, v)$ such that $v$ is reachable from $u$. This can easily be determined in $O(n^2+nm)$ time by starting a DFS from each vertex in turn.

We determine the set of maxable pairs by initially declaring that every vertex pair $(u, v)$ such that $v$ is reachable from $u$ is maxable; we then "cross out" all pairs $(u, v)$ for which some shortest path from $u$ to some $x$ has a shortest path from $u$ to $v$ as an initial subpath; finally, we "cross out" all pairs $(u, v)$ for which some shortest path from some $x$ to $v$ has a shortest path from $u$ to $v$ as a final subpath. By (3) above, all pairs $(u, v)$ that have not been crossed out are maxable.

I'll now describe the first "crossing out" step. The second is identical, except that we first reverse all edge directions, and when we are about to cross out some pair $(u, v)$, we instead cross out $(v, u)$. The following assumes an unweighted graph; if the edges are weighted, you can replace the BFS with Dijkstra's algorithm.

From each vertex $u$ in turn, perform BFS on $G$, using $u$ as the root. For each vertex $v$ visited at level $L$ during a particular BFS, if $v$ is adjacent (in $G$) to any vertex on level $L+1$ of the BFS tree, cross out the pair $(u, v)$. This is because the paths in $G$ whose vertices start at the root $u$ of a BFS tree and thereafter occupy strictly increasing levels in that tree are exactly the shortest paths starting at $u$, and the "downward" (w.r.t. the BFS tree) edge leaving $v$ indicates that there is some $x$ for which a shortest path from $u$ to $x$ uses a shortest path from $u$ to $v$ as a subpath. This can be performed in $O(n+m)$ time per root vertex, for $O(n^2+nm)$ time overall.

Let $Z$ be the set of maxable vertex pairs that remain after the above has run.

Counting shortest paths

As a further preprocessing step, we would like to count the number of shortest paths between any pair of vertices $(u, v)$.

For any vertex $u$, the number of shortest paths from $u$ to every other vertex $v$ can be calculated in $O(n+m)$ time with BFS. This means that the total time needed to compute the number of paths between every pair $(u, v)$ of vertices is $O(n^2+nm)$. Let $n_{uv}$ be the number of paths from $u$ to $v$. Discard any pairs $(u, v)$ for which $n_{uv} = 0$, i.e., for which $v$ is not reachable from $u$.

Let $n_z = \Sigma_{(u, v) \in Z} n_{uv}$.

Sampling algorithm

The input is a graph $G$. If $G$ has no vertices, return an empty path. (This will arise as a base case later on.) If it has at least one vertex then it has at least one maximal path (even if that consists of a single vertex).

The next step is to choose a start and an end vertex for the path. This works differently depending on whether we are at the top level of the call stack or not:

  • If we are at the top level of the call stack, choose an endpoint pa

Context

StackExchange Computer Science Q#65849, answer score: 2

Revisions (0)

No revisions yet.