HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Determine Nodes in Network with PostgreSQL

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlnodeswithdeterminenetwork

Problem

I have a table where every entry is a node and the table contains the direct connections of each node to other nodes. I am looking to create a view with a column for each node containing all the nodes in the chain, not just the nodes the node itself is connected to.

An example would be generating the Nodes in Chain column from the first two columns of the following table:

CREATE TABLE example
(
    node text,
    connections text[],
    nodes_in_chain text[]
)

INSERT INTO example VALUES
('a', ARRAY['a','b'],         null),
('b', ARRAY['a','b','c','d'], null),
('c', ARRAY['b','c'],         null),
('d', ARRAY['b','d'],         null),
('e', ARRAY['e','f'],         null),
('f', ARRAY['e','f'],         null);


Node Connections Nodes in Chain
"a" "{a,b}" "{a,b,c,d}"
"b" "{a,b,c,d}" "{a,b,c,d}"
"c" "{b,c}" "{a,b,c,d}"
"d" "{b,d}" "{a,b,c,d}"
"e" "{e,f}" "{e,f}"
"f" "{e,f}" "{e,f}"


That is a small simplified version of the real problem. If I can solve the example, the full table should be no problem.

The data of this table can be visualized in the following way:

I have looked into several different methods to solve this problem. I have looked into recursive CTEs but I have not managed to make them work.

Each node is connected to itself currently in the database. Not a problem to remove the connection to itself in the database if that is necessary.

Probably unnecessary background to the problem:

The origin of this problem comes from attempting to identify vehicles in traffic. The original database contains a vehicles location and speed at every time step t in a given area. The goal is to determine the time spent at a traffic light. To solve this problem a stopping area for the traffic light was identified. Each vehicle in this area with a speed below a certain threshold is considered to be waiting for the traffic light. Due to a long queue line, vehicles may however be queuing outside this area.

Solution

Approach 1:

At first sight, it seems that I could apply a basic solution because, according to your sample data, each single connection is included in another connection.

SELECT 
    e1.node, 
    e1.connections, 
    COALESCE(e2.connections, e1.connections) nodes_in_chain
FROM
    example e1
LEFT JOIN 
    example e2
    ON e2.node <> e1.node
    AND e1.connections <@ e2.connections;


node | connections | nodes_in_chain
:--- | :---------- | :-------------
a | {a,b} | {a,b,c,d}
b | {a,b,c,d} | {a,b,c,d}
c | {b,c} | {a,b,c,d}
d | {b,d} | {a,b,c,d}
e | {e,f} | {e,f}
f | {e,f} | {e,f}

Approach 2:

But, as @ypercube pointed out, this solution doesn't work if there are 3 or more linear points in a row.

Ex: e -> f -> g -> h

As a reference to solve this question I used the answers in another related question:

  • Grouping on any one of multiple columns in Postgres



It uses a method called transitive closure to solve the problem.

Transitive closure

In mathematics, the transitive closure of a binary relation R on a set X is the smallest relation on X that contains R and is transitive.

For example, if X is a set of airports and xRy means "there is a direct flight from airport x to airport y" (for x and y in X), then the transitive closure of R on X is the relation R+ such that x R+ y means "it is possible to fly from x to y in one or more flights". Informally, the transitive closure gives you the set of all places you can get to from any starting place.

First let me change your sample data by adding a linear connection of 4 nodes.

DELETE FROM example WHERE node = 'f';
INSERT INTO example VALUES
('f', ARRAY['e','f','g'],     null),
('g', ARRAY['f','g','h'],     null),
('h', ARRAY['g','h'],         null);


Now applying the math:

WITH RECURSIVE al (dst, src) AS --adjacent list or list of all related nodes
(
  SELECT e1.node, e2.node
  FROM   example e1
  JOIN   example e2
         ON e1.node = any(e2.connections)   
), tc (dst, src) AS
   (
     SELECT dst, src FROM al -- transitive closure
     UNION
     SELECT a1.dst, a2.src
     FROM   al as a1 
     JOIN   tc as a2 
            ON a1.src = a2.dst
   )
   SELECT   src, array_agg(DISTINCT dst ORDER BY dst) AS nodes_in_chain
   FROM     tc
   GROUP BY src;


Give us this result:

src | nodes_in_chain
:-- | :-------------
a   | {a,b,c,d}     
b   | {a,b,c,d}     
c   | {a,b,c,d}     
d   | {a,b,c,d}     
e   | {e,f,g,h}     
f   | {e,f,g,h}     
g   | {e,f,g,h}     
h   | {e,f,g,h}


db<>fiddle here

NOTE: The original relation has only the immediate connections, which can be seen as paths of length 1 (2 nodes each). Approach 1 finds all paths of length 2 (3 nodes) as it applies a method of connecting once. To find paths of length N, you have to apply the methods N-1 times. To find all paths of arbitrary lengths (alas the transitive closure), you need a recursive solution, or a while loop.
It can't be done with simple SQL. (ie: one query without CTEs.)

@ypercube

Code Snippets

SELECT 
    e1.node, 
    e1.connections, 
    COALESCE(e2.connections, e1.connections) nodes_in_chain
FROM
    example e1
LEFT JOIN 
    example e2
    ON e2.node <> e1.node
    AND e1.connections <@ e2.connections;
DELETE FROM example WHERE node = 'f';
INSERT INTO example VALUES
('f', ARRAY['e','f','g'],     null),
('g', ARRAY['f','g','h'],     null),
('h', ARRAY['g','h'],         null);
WITH RECURSIVE al (dst, src) AS --adjacent list or list of all related nodes
(
  SELECT e1.node, e2.node
  FROM   example e1
  JOIN   example e2
         ON e1.node = any(e2.connections)   
), tc (dst, src) AS
   (
     SELECT dst, src FROM al -- transitive closure
     UNION
     SELECT a1.dst, a2.src
     FROM   al as a1 
     JOIN   tc as a2 
            ON a1.src = a2.dst
   )
   SELECT   src, array_agg(DISTINCT dst ORDER BY dst) AS nodes_in_chain
   FROM     tc
   GROUP BY src;
src | nodes_in_chain
:-- | :-------------
a   | {a,b,c,d}     
b   | {a,b,c,d}     
c   | {a,b,c,d}     
d   | {a,b,c,d}     
e   | {e,f,g,h}     
f   | {e,f,g,h}     
g   | {e,f,g,h}     
h   | {e,f,g,h}

Context

StackExchange Database Administrators Q#278567, answer score: 6

Revisions (0)

No revisions yet.