HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Number of descendants of each node in a DAG

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
descendantsnumbereachnodedag

Problem

1) Is there a better algorithm than the naive O(|E|.|V|) to compute the number of descendants of each vertex in a DAG?

2) Is there an online algorithm to do so, assuming that nodes are added one by one and connect to a non empty subset of the existing nodes?

Context: I'm interested in the case where m = O(n), millions of vertices, tens of millions of edges typically.
Alternatively, counting the number of descendants which are also sinks would be useful.

A probabilistic approach would be min-hashing, as a way to represent the set of descendants of every node. The union of the min-hash structure is trivial, and the cardinality of the union can be estimated from the number of coincidences in the min-hashes.

However, I'm not sure how well behaved that would be when propagating up the DAG, intuitively it looks like errors would compound pretty fast.

Very related: https://cstheory.stackexchange.com/questions/553/what-bounds-can-be-put-on-counting-reachable-nodes-in-a-dag
And actually a duplicate of: https://cstheory.stackexchange.com/questions/18787/what-is-the-fastest-deterministic-algorithm-for-incremental-dag-reachability

Solution


  • Topologically sort the nodes in your DAG.



  • For each node N, set N.QueryCount = 0.



  • For each node N, in reverse topological order:



  • Set N.Descendants = {N} U {C.Descendants | C in N.Children}.



  • Yield (N, N.Descendants.Count) from the algorithm.



  • If N.Parents is empty, you can dispose of N.Descendants.



  • For each C in N.Children, increment C.QueryCount. If C.QueryCount == C.Parents.Count, you can dispose of C.Descendants.



This of course is expensive if your node degrees are large. Worst case complexity may not be significantly better than your unspecified "naive algorithm."

The issue is that this is a very difficult problem to solve. Suppose there is a DAG with millions of nodes, millions of edges, etc. I show you this portion of the graph:

A--> B
 \-> C


How many descendants does A have? The number of descendants of B plus the number of descendants of C minus the number of common descendants of B and C. It's the third term that creates the difficulty. You can't know just the number of descendants of B and C - you need to also know what the descendants are.

Code Snippets

A--> B
 \-> C

Context

StackExchange Computer Science Q#23455, answer score: 5

Revisions (0)

No revisions yet.