patternMinor
Number of descendants of each node in a DAG
Viewed 0 times
descendantsnumbereachnodedag
Problem
1) Is there a better algorithm than the naive O(|E|.|V|) to compute the number of descendants of each vertex in a DAG?
2) Is there an online algorithm to do so, assuming that nodes are added one by one and connect to a non empty subset of the existing nodes?
Context: I'm interested in the case where m = O(n), millions of vertices, tens of millions of edges typically.
Alternatively, counting the number of descendants which are also sinks would be useful.
A probabilistic approach would be min-hashing, as a way to represent the set of descendants of every node. The union of the min-hash structure is trivial, and the cardinality of the union can be estimated from the number of coincidences in the min-hashes.
However, I'm not sure how well behaved that would be when propagating up the DAG, intuitively it looks like errors would compound pretty fast.
Very related: https://cstheory.stackexchange.com/questions/553/what-bounds-can-be-put-on-counting-reachable-nodes-in-a-dag
And actually a duplicate of: https://cstheory.stackexchange.com/questions/18787/what-is-the-fastest-deterministic-algorithm-for-incremental-dag-reachability
2) Is there an online algorithm to do so, assuming that nodes are added one by one and connect to a non empty subset of the existing nodes?
Context: I'm interested in the case where m = O(n), millions of vertices, tens of millions of edges typically.
Alternatively, counting the number of descendants which are also sinks would be useful.
A probabilistic approach would be min-hashing, as a way to represent the set of descendants of every node. The union of the min-hash structure is trivial, and the cardinality of the union can be estimated from the number of coincidences in the min-hashes.
However, I'm not sure how well behaved that would be when propagating up the DAG, intuitively it looks like errors would compound pretty fast.
Very related: https://cstheory.stackexchange.com/questions/553/what-bounds-can-be-put-on-counting-reachable-nodes-in-a-dag
And actually a duplicate of: https://cstheory.stackexchange.com/questions/18787/what-is-the-fastest-deterministic-algorithm-for-incremental-dag-reachability
Solution
- Topologically sort the nodes in your DAG.
- For each node
N, setN.QueryCount = 0.
- For each node
N, in reverse topological order:
- Set
N.Descendants = {N} U {C.Descendants | C in N.Children}.
- Yield
(N, N.Descendants.Count)from the algorithm.
- If
N.Parentsis empty, you can dispose ofN.Descendants.
- For each
CinN.Children, incrementC.QueryCount. IfC.QueryCount == C.Parents.Count, you can dispose ofC.Descendants.
This of course is expensive if your node degrees are large. Worst case complexity may not be significantly better than your unspecified "naive algorithm."
The issue is that this is a very difficult problem to solve. Suppose there is a DAG with millions of nodes, millions of edges, etc. I show you this portion of the graph:
A--> B
\-> CHow many descendants does
A have? The number of descendants of B plus the number of descendants of C minus the number of common descendants of B and C. It's the third term that creates the difficulty. You can't know just the number of descendants of B and C - you need to also know what the descendants are.Code Snippets
A--> B
\-> CContext
StackExchange Computer Science Q#23455, answer score: 5
Revisions (0)
No revisions yet.