patternsqlMinor
Determine Nodes in Network with PostgreSQL
Viewed 0 times
postgresqlnodeswithdeterminenetwork
Problem
I have a table where every entry is a node and the table contains the direct connections of each node to other nodes. I am looking to create a view with a column for each node containing all the nodes in the chain, not just the nodes the node itself is connected to.
An example would be generating the Nodes in Chain column from the first two columns of the following table:
That is a small simplified version of the real problem. If I can solve the example, the full table should be no problem.
The data of this table can be visualized in the following way:
I have looked into several different methods to solve this problem. I have looked into recursive CTEs but I have not managed to make them work.
Each node is connected to itself currently in the database. Not a problem to remove the connection to itself in the database if that is necessary.
Probably unnecessary background to the problem:
The origin of this problem comes from attempting to identify vehicles in traffic. The original database contains a vehicles location and speed at every time step t in a given area. The goal is to determine the time spent at a traffic light. To solve this problem a stopping area for the traffic light was identified. Each vehicle in this area with a speed below a certain threshold is considered to be waiting for the traffic light. Due to a long queue line, vehicles may however be queuing outside this area.
An example would be generating the Nodes in Chain column from the first two columns of the following table:
CREATE TABLE example
(
node text,
connections text[],
nodes_in_chain text[]
)
INSERT INTO example VALUES
('a', ARRAY['a','b'], null),
('b', ARRAY['a','b','c','d'], null),
('c', ARRAY['b','c'], null),
('d', ARRAY['b','d'], null),
('e', ARRAY['e','f'], null),
('f', ARRAY['e','f'], null);Node Connections Nodes in Chain
"a" "{a,b}" "{a,b,c,d}"
"b" "{a,b,c,d}" "{a,b,c,d}"
"c" "{b,c}" "{a,b,c,d}"
"d" "{b,d}" "{a,b,c,d}"
"e" "{e,f}" "{e,f}"
"f" "{e,f}" "{e,f}"
That is a small simplified version of the real problem. If I can solve the example, the full table should be no problem.
The data of this table can be visualized in the following way:
I have looked into several different methods to solve this problem. I have looked into recursive CTEs but I have not managed to make them work.
Each node is connected to itself currently in the database. Not a problem to remove the connection to itself in the database if that is necessary.
Probably unnecessary background to the problem:
The origin of this problem comes from attempting to identify vehicles in traffic. The original database contains a vehicles location and speed at every time step t in a given area. The goal is to determine the time spent at a traffic light. To solve this problem a stopping area for the traffic light was identified. Each vehicle in this area with a speed below a certain threshold is considered to be waiting for the traffic light. Due to a long queue line, vehicles may however be queuing outside this area.
Solution
Approach 1:
At first sight, it seems that I could apply a basic solution because, according to your sample data, each single connection is included in another connection.
node | connections | nodes_in_chain
:--- | :---------- | :-------------
a | {a,b} | {a,b,c,d}
b | {a,b,c,d} | {a,b,c,d}
c | {b,c} | {a,b,c,d}
d | {b,d} | {a,b,c,d}
e | {e,f} | {e,f}
f | {e,f} | {e,f}
Approach 2:
But, as @ypercube pointed out, this solution doesn't work if there are 3 or more linear points in a row.
Ex: e -> f -> g -> h
As a reference to solve this question I used the answers in another related question:
It uses a method called transitive closure to solve the problem.
Transitive closure
In mathematics, the transitive closure of a binary relation R on a set X is the smallest relation on X that contains R and is transitive.
For example, if X is a set of airports and xRy means "there is a direct flight from airport x to airport y" (for x and y in X), then the transitive closure of R on X is the relation R+ such that x R+ y means "it is possible to fly from x to y in one or more flights". Informally, the transitive closure gives you the set of all places you can get to from any starting place.
First let me change your sample data by adding a linear connection of 4 nodes.
Now applying the math:
Give us this result:
db<>fiddle here
NOTE: The original relation has only the immediate connections, which can be seen as paths of length 1 (2 nodes each). Approach 1 finds all paths of length 2 (3 nodes) as it applies a method of connecting once. To find paths of length N, you have to apply the methods N-1 times. To find all paths of arbitrary lengths (alas the transitive closure), you need a recursive solution, or a while loop.
It can't be done with simple SQL. (ie: one query without CTEs.)
@ypercube
At first sight, it seems that I could apply a basic solution because, according to your sample data, each single connection is included in another connection.
SELECT
e1.node,
e1.connections,
COALESCE(e2.connections, e1.connections) nodes_in_chain
FROM
example e1
LEFT JOIN
example e2
ON e2.node <> e1.node
AND e1.connections <@ e2.connections;node | connections | nodes_in_chain
:--- | :---------- | :-------------
a | {a,b} | {a,b,c,d}
b | {a,b,c,d} | {a,b,c,d}
c | {b,c} | {a,b,c,d}
d | {b,d} | {a,b,c,d}
e | {e,f} | {e,f}
f | {e,f} | {e,f}
Approach 2:
But, as @ypercube pointed out, this solution doesn't work if there are 3 or more linear points in a row.
Ex: e -> f -> g -> h
As a reference to solve this question I used the answers in another related question:
- Grouping on any one of multiple columns in Postgres
It uses a method called transitive closure to solve the problem.
Transitive closure
In mathematics, the transitive closure of a binary relation R on a set X is the smallest relation on X that contains R and is transitive.
For example, if X is a set of airports and xRy means "there is a direct flight from airport x to airport y" (for x and y in X), then the transitive closure of R on X is the relation R+ such that x R+ y means "it is possible to fly from x to y in one or more flights". Informally, the transitive closure gives you the set of all places you can get to from any starting place.
First let me change your sample data by adding a linear connection of 4 nodes.
DELETE FROM example WHERE node = 'f';
INSERT INTO example VALUES
('f', ARRAY['e','f','g'], null),
('g', ARRAY['f','g','h'], null),
('h', ARRAY['g','h'], null);Now applying the math:
WITH RECURSIVE al (dst, src) AS --adjacent list or list of all related nodes
(
SELECT e1.node, e2.node
FROM example e1
JOIN example e2
ON e1.node = any(e2.connections)
), tc (dst, src) AS
(
SELECT dst, src FROM al -- transitive closure
UNION
SELECT a1.dst, a2.src
FROM al as a1
JOIN tc as a2
ON a1.src = a2.dst
)
SELECT src, array_agg(DISTINCT dst ORDER BY dst) AS nodes_in_chain
FROM tc
GROUP BY src;Give us this result:
src | nodes_in_chain
:-- | :-------------
a | {a,b,c,d}
b | {a,b,c,d}
c | {a,b,c,d}
d | {a,b,c,d}
e | {e,f,g,h}
f | {e,f,g,h}
g | {e,f,g,h}
h | {e,f,g,h}db<>fiddle here
NOTE: The original relation has only the immediate connections, which can be seen as paths of length 1 (2 nodes each). Approach 1 finds all paths of length 2 (3 nodes) as it applies a method of connecting once. To find paths of length N, you have to apply the methods N-1 times. To find all paths of arbitrary lengths (alas the transitive closure), you need a recursive solution, or a while loop.
It can't be done with simple SQL. (ie: one query without CTEs.)
@ypercube
Code Snippets
SELECT
e1.node,
e1.connections,
COALESCE(e2.connections, e1.connections) nodes_in_chain
FROM
example e1
LEFT JOIN
example e2
ON e2.node <> e1.node
AND e1.connections <@ e2.connections;DELETE FROM example WHERE node = 'f';
INSERT INTO example VALUES
('f', ARRAY['e','f','g'], null),
('g', ARRAY['f','g','h'], null),
('h', ARRAY['g','h'], null);WITH RECURSIVE al (dst, src) AS --adjacent list or list of all related nodes
(
SELECT e1.node, e2.node
FROM example e1
JOIN example e2
ON e1.node = any(e2.connections)
), tc (dst, src) AS
(
SELECT dst, src FROM al -- transitive closure
UNION
SELECT a1.dst, a2.src
FROM al as a1
JOIN tc as a2
ON a1.src = a2.dst
)
SELECT src, array_agg(DISTINCT dst ORDER BY dst) AS nodes_in_chain
FROM tc
GROUP BY src;src | nodes_in_chain
:-- | :-------------
a | {a,b,c,d}
b | {a,b,c,d}
c | {a,b,c,d}
d | {a,b,c,d}
e | {e,f,g,h}
f | {e,f,g,h}
g | {e,f,g,h}
h | {e,f,g,h}Context
StackExchange Database Administrators Q#278567, answer score: 6
Revisions (0)
No revisions yet.