HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

PL/pgSQL function or rCTE to detect depth of relations between two tables

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
depthtablesrelationsfunctionrctebetweentwopgsqldetect

Problem

I need to perform a PostgreSQL Function for checking the depth of relations. There is a circular binding from table a to b, back to another element in a and, in some cases back again to b. These relation is possible several times. I try to build a function that request these relations.

I tried to build a simply SQL query but it breaks with memory overflow error, building a result table with more then 100 million rows. When checking the full table with more then 10 million rows. So I try a FOR loop which refuse storage after each loop.

Problem is now the loop runs to the first element which counts to relation 1 but not further. All further relations got the relation count 0, and bstid remains at a single value.

I put some RETURN NEXT statements to get debugging controls. Is there something wrong with the nested IF-ELSE statements?

```
CREATE OR REPLACE FUNCTION bst_ebenen() RETURNS SETOF varchar(1000) AS
$BODY$
DECLARE
--an_array varchar[bstobjid][id];
bstobjid varchar;
loopid bigint;
bstid bigint;
relation varchar;
searchsql text := '';
ebenen integer := 0;
message varchar := '';

BEGIN
searchsql = 'SELECT objid AS bstobjid FROM buchstelle';

FOR bstobjid IN EXECUTE(searchsql) LOOP

ebenen = 0;

--1. Relationsebene abfragen
loopid = (SELECT bst.id FROM buchstelle bst WHERE bst.objid = bstobjid);

IF loopid NOT IN (SELECT rid FROM buchstelle__relation LIMIT 1)
THEN ebenen = 0;
ELSE
ebenen = 1;

relation = (SELECT rel.relation FROM buchstelle__relation rel
WHERE rel.rid = loopid LIMIT 1);

RETURN NEXT relation;

--2. Relationsebene abfragen

bstid = (SELECT bst.id FROM buchstelle bst WHERE bst.objid = relation LIMIT 1);
RETURN NEXT bstid;

IF bstid NOT IN (SELECT rid FROM buchstelle__relation LIMIT 1)
THEN ebenen = ebenen;

Solution

A recursive CTE seems the way to go.

Assuming your path has no cycles. Else it needs more work to detect cycles. The array solution below can readily be adapted.
Test setup

Building on this simplified layout:

CREATE TABLE t1 (t1_id int, objid text);
INSERT INTO t1 VALUES
 (1,'aaa')
,(2,'bbb')
,(3,'ccc')
,(4,'ddd')
,(5,'eee')
,(6,'fff')
,(7,'ggg')
,(8,'hhh');

CREATE TABLE t2 (t2_id int, t1_id int, objid text);
INSERT INTO t2 VALUES
 (1,3,'aaa')
,(2,4,'aaa')
,(3,7,'hhh')
,(4,8,'ccc');


Two different solutions:
Solution with string as path

WITH RECURSIVE cte AS (
   SELECT t.t1_id AS start_id
        , t2.t2_id::text || '(t2)' || COALESCE(' ->' || t1.t1_id || '(t1)', '') AS path
        , t1.t1_id
   FROM   t1 t
   LEFT   JOIN t2 USING (t1_id)
   LEFT   JOIN t1 ON t1.objid = t2.objid

   UNION ALL
   SELECT c.start_id
        , c.path || ' ->' || t2.t2_id || '(t2)' || COALESCE(' ->' || t1.t1_id || '(t1)', '')
        , t1.t1_id
   FROM   cte c
   JOIN   t2      USING (t1_id)
   LEFT   JOIN t1 USING (objid)
   )
SELECT DISTINCT ON (start_id)
       start_id, path
FROM   cte
ORDER  BY start_id, path DESC;


Result:

start_id   path
1          
2          
3          1(t2) ->1(t1)
4          2(t2) ->1(t1)
5          
6          
7          3(t2) ->8(t1) ->4(t2) ->3(t1) ->1(t2) ->1(t1)
8          4(t2) ->3(t1) ->1(t2) ->1(t1)


Table names are redundant, obviously. I added them for good looks.
Solution with inverted array

Rightmost element is first t2_id, keep alternating from right to left.

WITH RECURSIVE cte AS (
   SELECT t.t1_id AS start_id, ARRAY[t1.t1_id, t2.t2_id] AS path
   FROM   t1 t
   LEFT   JOIN t2 USING (t1_id)
   LEFT   JOIN t1 ON t1.objid = t2.objid

   UNION ALL
   SELECT c.start_id, t1.t1_id || (t2.t2_id || path)
   FROM   cte c
   JOIN   t2 ON t2.t1_id = path[1]
   LEFT   JOIN t1 USING (objid)
   )
SELECT DISTINCT ON (start_id)
       start_id, array_remove(path, NULL) AS path
FROM   cte
ORDER  BY start_id, array_length(path, 1) DESC;


Result:

start_id   path
1          {}
2          {}
3          {1,1}
4          {1,2}
5          {}
6          {}
7          {1,1,3,4,8,3}
8          {1,1,3,4}


array_remove() requires Postgres 9.3+.

Inverted the array to need one less columns. By putting the last element first, I can reference path[1] in the next step. Not sure if that's cheaper, would need a test ...

Shorter code, but probably slower. I suspect array handling is more expensive. Easier to adapt if we need to observe cycles.

SQL Fiddle.
Major points

-
We are alternating between two tables.

To make this recursive, one step needs to cover two hops (from t1 -> t2 and back t2 -> t1).

-
The initial SELECT uses 2x LEFT JOIN to include all rows like in your example result.

The recursive part JOIN to stop the loop when no match is found. The hop back uses LEFT JOIN again.

Code Snippets

CREATE TABLE t1 (t1_id int, objid text);
INSERT INTO t1 VALUES
 (1,'aaa')
,(2,'bbb')
,(3,'ccc')
,(4,'ddd')
,(5,'eee')
,(6,'fff')
,(7,'ggg')
,(8,'hhh');

CREATE TABLE t2 (t2_id int, t1_id int, objid text);
INSERT INTO t2 VALUES
 (1,3,'aaa')
,(2,4,'aaa')
,(3,7,'hhh')
,(4,8,'ccc');
WITH RECURSIVE cte AS (
   SELECT t.t1_id AS start_id
        , t2.t2_id::text || '(t2)' || COALESCE(' ->' || t1.t1_id || '(t1)', '') AS path
        , t1.t1_id
   FROM   t1 t
   LEFT   JOIN t2 USING (t1_id)
   LEFT   JOIN t1 ON t1.objid = t2.objid

   UNION ALL
   SELECT c.start_id
        , c.path || ' ->' || t2.t2_id || '(t2)' || COALESCE(' ->' || t1.t1_id || '(t1)', '')
        , t1.t1_id
   FROM   cte c
   JOIN   t2      USING (t1_id)
   LEFT   JOIN t1 USING (objid)
   )
SELECT DISTINCT ON (start_id)
       start_id, path
FROM   cte
ORDER  BY start_id, path DESC;
start_id   path
1          
2          
3          1(t2) ->1(t1)
4          2(t2) ->1(t1)
5          
6          
7          3(t2) ->8(t1) ->4(t2) ->3(t1) ->1(t2) ->1(t1)
8          4(t2) ->3(t1) ->1(t2) ->1(t1)
WITH RECURSIVE cte AS (
   SELECT t.t1_id AS start_id, ARRAY[t1.t1_id, t2.t2_id] AS path
   FROM   t1 t
   LEFT   JOIN t2 USING (t1_id)
   LEFT   JOIN t1 ON t1.objid = t2.objid

   UNION ALL
   SELECT c.start_id, t1.t1_id || (t2.t2_id || path)
   FROM   cte c
   JOIN   t2 ON t2.t1_id = path[1]
   LEFT   JOIN t1 USING (objid)
   )
SELECT DISTINCT ON (start_id)
       start_id, array_remove(path, NULL) AS path
FROM   cte
ORDER  BY start_id, array_length(path, 1) DESC;
start_id   path
1          {}
2          {}
3          {1,1}
4          {1,2}
5          {}
6          {}
7          {1,1,3,4,8,3}
8          {1,1,3,4}

Context

StackExchange Database Administrators Q#72644, answer score: 2

Revisions (0)

No revisions yet.