patterncppModerate
Sum of all paths between all pairs of nodes in a tree
Viewed 0 times
nodesallpathsbetweensumpairstree
Problem
I'm trying to solve a competitive programming problem. It basically gives a undirected graph (tree-like: no multiple paths between two nodes...) and asks for the sum of all possible paths between any pair of nodes in the graph (each path must be counted only once, in other words, if you have already counted the path from A to B, you shouldn't count the path from B to A).
To solve this, I've tried to remove each edge (separating the tree in 2 components) and count the number of vertices in one of the components (the count of nodes of the second component can be derived from the count found in the first).
The total weight that a edge will contribute is: (the number of paths that pass by it = first_component_size second_component_size) (the weight of the edge).
Input description:
The input is composed of several instances (the number of instances is given on the first line of input).
The first input row of each instance contains an integer N (1 ≤ N ≤ 10000), representing the number of nodes. The nodes are enumerated from 1 to N.
Each one of the following N-1 rows contains three integers A, B and C (1 ≤ A, B ≤ N, 1 ≤ C ≤ 50), indicating that the edge that connects the nodes A and B has length C.
Output description:
For each instance print the sum of lenghts of the paths that connects all the pairs of nodes. The answer must be in MOD 1300031.
```
#include
#include
#include
const int MOD = 1300031;
using edge = std::pair;
using adj_list = std::vector;
using graph = std::vector ;
struct myEdge {
int from, to, weight;
};
std::vector visited;
int count_nodes_on_component(graph &g, int v) {
int count_of_nodes = 1; //1 for itself
adj_list &adj = g[v];
for(auto &e : adj) {
if(!visited[e.first]) {
visited[e.first] = true;
count_of_nodes += count_nodes_on_component(g, e.first);
visited[e.first] = false;
}
}
return count_of_nodes;
}
int main ( void ) {
int test_cases;
scanf
To solve this, I've tried to remove each edge (separating the tree in 2 components) and count the number of vertices in one of the components (the count of nodes of the second component can be derived from the count found in the first).
The total weight that a edge will contribute is: (the number of paths that pass by it = first_component_size second_component_size) (the weight of the edge).
Input description:
The input is composed of several instances (the number of instances is given on the first line of input).
The first input row of each instance contains an integer N (1 ≤ N ≤ 10000), representing the number of nodes. The nodes are enumerated from 1 to N.
Each one of the following N-1 rows contains three integers A, B and C (1 ≤ A, B ≤ N, 1 ≤ C ≤ 50), indicating that the edge that connects the nodes A and B has length C.
Output description:
For each instance print the sum of lenghts of the paths that connects all the pairs of nodes. The answer must be in MOD 1300031.
```
#include
#include
#include
const int MOD = 1300031;
using edge = std::pair;
using adj_list = std::vector;
using graph = std::vector ;
struct myEdge {
int from, to, weight;
};
std::vector visited;
int count_nodes_on_component(graph &g, int v) {
int count_of_nodes = 1; //1 for itself
adj_list &adj = g[v];
for(auto &e : adj) {
if(!visited[e.first]) {
visited[e.first] = true;
count_of_nodes += count_nodes_on_component(g, e.first);
visited[e.first] = false;
}
}
return count_of_nodes;
}
int main ( void ) {
int test_cases;
scanf
Solution
\$O(n^2)\$ time complexity
Your current algorithm has \$O(n^2)\$ time complexity because for each edge, you count the number of vertices on one side of the edge, which is a linear time operation.
However, the program is pretty fast. I ran it on three large test cases. The first was 10000 nodes arranged in a line, the second was 10000 nodes arranged in a star, and the third was 10000 nodes arranged in a star, but with the edges listed with reverse direction. Here were the results:
The first star was quick because the program always counted the outside (i.e. 1 node always). The second star was slow because the program always counted the inside (n-1 nodes always). So you could consider these the best and worst possible test cases.
Alternative strategy
The problem can be solved in \$O(n)\$ time. Instead of picking a random edge and having to count one side, if you picked an edge connected to a leaf node, you automatically know that the count on the leaf side is 1. There is always at least 1 leaf node in the graph, because the graph is a spanning tree with no cycles. If you then remove the leaf node from the graph, there must still be at least 1 leaf node, so you can keep finding a leaf node and removing it until there are no nodes left.
A small complication arises after you find the first leaf node and remove it from the tree. The node it was attached to must now remember that it was connected to the leaf, in order to keep the distance calculations correct. In other words, if leaf A is attached to node B and leaf A is removed, then node B must now count as 2 nodes. If node B is later removed, it must add its 2 nodes to some other node, etc.
So the pseudocode is this:
You can find all the leaf nodes in \$O(n)\$ time because:
Similarly, finding and removing edges can be done in amortized \$O(n)\$ time if you handle the edge lists correctly. One key point is that even though you need to search through edge lists, there are only \$n\$ total edges, so that total time spent searching through edge lists is limited to \$O(n)\$.
Sample \$O(n)\$ program
Here is the program I wrote in C with the alternative strategy. It solved all three test cases in 0.01 seconds.
Note: I did edge removal in a somewhat nonstandard way, where I marked the edge invalid instead of removing it from the edge list. I wrote another version where I kept the edges in a doubly linked circular list and I actually removed the edge from the list when needed. Both versions are just as fast, so if anyone wants to see the other version, just ask.
```
#include
#include
#include
const int MOD = 1300031;
typedef struct Edge {
struct Edge *next;
int to;
int weight;
} Edge;
typedef struct Node {
int numConnected;
int numEdges;
Edge *edges;
} Node;
int main(void)
{
int numTests = 0;
Node *nodes = NULL;
Edge *edges = NULL;
scanf("%d", &numTests);
for (int i=0; i 0; j--) {
int from = 0;
int to = 0;
int weight = 0;
scanf("%d %d %d", &from, &to, &weight);
from--;
to--;
edges[e].to = to;
edges[e].next = nodes[from].edges;
edges[e].weight = weight;
nodes[from].edges = &edges[e];
nodes[from].numEdges++;
e++;
edges[e].to = from;
edges[e].next = nodes[to].edges;
edges[e].weight = weight;
nodes[to].edges = &edges[e];
nodes[to].numEdges++;
e++;
}
int search = 0;
int cur = -1;
uint64_t dist = 0;
for (int nodesLeft = numNodes-1; nodesLeft > 0; nodesLeft--) {
Edge *edge;
// Either use the node known to have 1 edge, or find the next
// node that has 1 edge.
if (cur == -1) {
while (nodes[search].numEdges != 1) {
search++;
if (search >= numNodes) {
printf("ERROR, reached end of nodes\n");
exit(1);
}
}
cur = search++;
}
// Find the one valid edge in the list of edges.
for (edg
Your current algorithm has \$O(n^2)\$ time complexity because for each edge, you count the number of vertices on one side of the edge, which is a linear time operation.
However, the program is pretty fast. I ran it on three large test cases. The first was 10000 nodes arranged in a line, the second was 10000 nodes arranged in a star, and the third was 10000 nodes arranged in a star, but with the edges listed with reverse direction. Here were the results:
Line : 1.15 sec
Star : 0.01 sec
Star2: 2.30 sec
The first star was quick because the program always counted the outside (i.e. 1 node always). The second star was slow because the program always counted the inside (n-1 nodes always). So you could consider these the best and worst possible test cases.
Alternative strategy
The problem can be solved in \$O(n)\$ time. Instead of picking a random edge and having to count one side, if you picked an edge connected to a leaf node, you automatically know that the count on the leaf side is 1. There is always at least 1 leaf node in the graph, because the graph is a spanning tree with no cycles. If you then remove the leaf node from the graph, there must still be at least 1 leaf node, so you can keep finding a leaf node and removing it until there are no nodes left.
A small complication arises after you find the first leaf node and remove it from the tree. The node it was attached to must now remember that it was connected to the leaf, in order to keep the distance calculations correct. In other words, if leaf A is attached to node B and leaf A is removed, then node B must now count as 2 nodes. If node B is later removed, it must add its 2 nodes to some other node, etc.
So the pseudocode is this:
- Find a node with 1 edge only. Call this node
Aand the node it is attached toB.
- Add to the total distance this amount:
A.count (totalNodes - A.count) edge.weight.
- Add
A.counttoB.count.
- Remove the edge from
AandB.
- Go back to step 1. If
Bonly has 1 edge, useBas the next node.
You can find all the leaf nodes in \$O(n)\$ time because:
- In the beginning there are a fixed number of leaf nodes.
- If you scan the nodes array from beginning to end, you will find these initial leaf nodes in \$O(n)\$ time.
- Any other nodes that become leaf nodes are found in step 5 above and can be handled without scanning.
Similarly, finding and removing edges can be done in amortized \$O(n)\$ time if you handle the edge lists correctly. One key point is that even though you need to search through edge lists, there are only \$n\$ total edges, so that total time spent searching through edge lists is limited to \$O(n)\$.
Sample \$O(n)\$ program
Here is the program I wrote in C with the alternative strategy. It solved all three test cases in 0.01 seconds.
Note: I did edge removal in a somewhat nonstandard way, where I marked the edge invalid instead of removing it from the edge list. I wrote another version where I kept the edges in a doubly linked circular list and I actually removed the edge from the list when needed. Both versions are just as fast, so if anyone wants to see the other version, just ask.
```
#include
#include
#include
const int MOD = 1300031;
typedef struct Edge {
struct Edge *next;
int to;
int weight;
} Edge;
typedef struct Node {
int numConnected;
int numEdges;
Edge *edges;
} Node;
int main(void)
{
int numTests = 0;
Node *nodes = NULL;
Edge *edges = NULL;
scanf("%d", &numTests);
for (int i=0; i 0; j--) {
int from = 0;
int to = 0;
int weight = 0;
scanf("%d %d %d", &from, &to, &weight);
from--;
to--;
edges[e].to = to;
edges[e].next = nodes[from].edges;
edges[e].weight = weight;
nodes[from].edges = &edges[e];
nodes[from].numEdges++;
e++;
edges[e].to = from;
edges[e].next = nodes[to].edges;
edges[e].weight = weight;
nodes[to].edges = &edges[e];
nodes[to].numEdges++;
e++;
}
int search = 0;
int cur = -1;
uint64_t dist = 0;
for (int nodesLeft = numNodes-1; nodesLeft > 0; nodesLeft--) {
Edge *edge;
// Either use the node known to have 1 edge, or find the next
// node that has 1 edge.
if (cur == -1) {
while (nodes[search].numEdges != 1) {
search++;
if (search >= numNodes) {
printf("ERROR, reached end of nodes\n");
exit(1);
}
}
cur = search++;
}
// Find the one valid edge in the list of edges.
for (edg
Code Snippets
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
const int MOD = 1300031;
typedef struct Edge {
struct Edge *next;
int to;
int weight;
} Edge;
typedef struct Node {
int numConnected;
int numEdges;
Edge *edges;
} Node;
int main(void)
{
int numTests = 0;
Node *nodes = NULL;
Edge *edges = NULL;
scanf("%d", &numTests);
for (int i=0; i<numTests; i++) {
int numNodes = 0;
int e = 0;
scanf("%d", &numNodes);
nodes = (Node *) calloc(numNodes, sizeof(*nodes));
edges = (Edge *) calloc(2*numNodes, sizeof(*edges));
// Read in list of edges. For each edge, we create two Edge
// structures and add one to each endpoint Node.
for (int j = numNodes - 1; j > 0; j--) {
int from = 0;
int to = 0;
int weight = 0;
scanf("%d %d %d", &from, &to, &weight);
from--;
to--;
edges[e].to = to;
edges[e].next = nodes[from].edges;
edges[e].weight = weight;
nodes[from].edges = &edges[e];
nodes[from].numEdges++;
e++;
edges[e].to = from;
edges[e].next = nodes[to].edges;
edges[e].weight = weight;
nodes[to].edges = &edges[e];
nodes[to].numEdges++;
e++;
}
int search = 0;
int cur = -1;
uint64_t dist = 0;
for (int nodesLeft = numNodes-1; nodesLeft > 0; nodesLeft--) {
Edge *edge;
// Either use the node known to have 1 edge, or find the next
// node that has 1 edge.
if (cur == -1) {
while (nodes[search].numEdges != 1) {
search++;
if (search >= numNodes) {
printf("ERROR, reached end of nodes\n");
exit(1);
}
}
cur = search++;
}
// Find the one valid edge in the list of edges.
for (edge = nodes[cur].edges; edge != NULL; edge = edge->next) {
if (edge->to != -1)
break;
}
if (edge == NULL) {
printf("ERROR, reached end of edges\n");
exit(1);
}
// The edge is going from cur <-> to, where cur is a leaf node.
// We can calculate the distance used by the edge because we know
// how many nodes are attached to cur. That number is 1 for cur
// itself plus nodes[cur].numConnected which is the number of
// previous leaf nodes that we removed that are connected to cur.
int to = edge->to;
int curNodes = nodes[cur].numConnected + 1;
dist += (numNodes - curNodes) * curNodes * edge->weight;
Context
StackExchange Code Review Q#135915, answer score: 12
Revisions (0)
No revisions yet.