patternsqlMinor
PL/pgSQL function or rCTE to detect depth of relations between two tables
Viewed 0 times
depthtablesrelationsfunctionrctebetweentwopgsqldetect
Problem
I need to perform a PostgreSQL Function for checking the depth of relations. There is a circular binding from table a to b, back to another element in a and, in some cases back again to b. These relation is possible several times. I try to build a function that request these relations.
I tried to build a simply SQL query but it breaks with memory overflow error, building a result table with more then 100 million rows. When checking the full table with more then 10 million rows. So I try a FOR loop which refuse storage after each loop.
Problem is now the loop runs to the first element which counts to relation 1 but not further. All further relations got the relation count 0, and bstid remains at a single value.
I put some RETURN NEXT statements to get debugging controls. Is there something wrong with the nested IF-ELSE statements?
```
CREATE OR REPLACE FUNCTION bst_ebenen() RETURNS SETOF varchar(1000) AS
$BODY$
DECLARE
--an_array varchar[bstobjid][id];
bstobjid varchar;
loopid bigint;
bstid bigint;
relation varchar;
searchsql text := '';
ebenen integer := 0;
message varchar := '';
BEGIN
searchsql = 'SELECT objid AS bstobjid FROM buchstelle';
FOR bstobjid IN EXECUTE(searchsql) LOOP
ebenen = 0;
--1. Relationsebene abfragen
loopid = (SELECT bst.id FROM buchstelle bst WHERE bst.objid = bstobjid);
IF loopid NOT IN (SELECT rid FROM buchstelle__relation LIMIT 1)
THEN ebenen = 0;
ELSE
ebenen = 1;
relation = (SELECT rel.relation FROM buchstelle__relation rel
WHERE rel.rid = loopid LIMIT 1);
RETURN NEXT relation;
--2. Relationsebene abfragen
bstid = (SELECT bst.id FROM buchstelle bst WHERE bst.objid = relation LIMIT 1);
RETURN NEXT bstid;
IF bstid NOT IN (SELECT rid FROM buchstelle__relation LIMIT 1)
THEN ebenen = ebenen;
I tried to build a simply SQL query but it breaks with memory overflow error, building a result table with more then 100 million rows. When checking the full table with more then 10 million rows. So I try a FOR loop which refuse storage after each loop.
Problem is now the loop runs to the first element which counts to relation 1 but not further. All further relations got the relation count 0, and bstid remains at a single value.
I put some RETURN NEXT statements to get debugging controls. Is there something wrong with the nested IF-ELSE statements?
```
CREATE OR REPLACE FUNCTION bst_ebenen() RETURNS SETOF varchar(1000) AS
$BODY$
DECLARE
--an_array varchar[bstobjid][id];
bstobjid varchar;
loopid bigint;
bstid bigint;
relation varchar;
searchsql text := '';
ebenen integer := 0;
message varchar := '';
BEGIN
searchsql = 'SELECT objid AS bstobjid FROM buchstelle';
FOR bstobjid IN EXECUTE(searchsql) LOOP
ebenen = 0;
--1. Relationsebene abfragen
loopid = (SELECT bst.id FROM buchstelle bst WHERE bst.objid = bstobjid);
IF loopid NOT IN (SELECT rid FROM buchstelle__relation LIMIT 1)
THEN ebenen = 0;
ELSE
ebenen = 1;
relation = (SELECT rel.relation FROM buchstelle__relation rel
WHERE rel.rid = loopid LIMIT 1);
RETURN NEXT relation;
--2. Relationsebene abfragen
bstid = (SELECT bst.id FROM buchstelle bst WHERE bst.objid = relation LIMIT 1);
RETURN NEXT bstid;
IF bstid NOT IN (SELECT rid FROM buchstelle__relation LIMIT 1)
THEN ebenen = ebenen;
Solution
A recursive CTE seems the way to go.
Assuming your path has no cycles. Else it needs more work to detect cycles. The array solution below can readily be adapted.
Test setup
Building on this simplified layout:
Two different solutions:
Solution with string as path
Result:
Table names are redundant, obviously. I added them for good looks.
Solution with inverted array
Rightmost element is first
Result:
Inverted the array to need one less columns. By putting the last element first, I can reference
Shorter code, but probably slower. I suspect array handling is more expensive. Easier to adapt if we need to observe cycles.
SQL Fiddle.
Major points
-
We are alternating between two tables.
To make this recursive, one step needs to cover two hops (from
-
The initial
The recursive part
Assuming your path has no cycles. Else it needs more work to detect cycles. The array solution below can readily be adapted.
Test setup
Building on this simplified layout:
CREATE TABLE t1 (t1_id int, objid text);
INSERT INTO t1 VALUES
(1,'aaa')
,(2,'bbb')
,(3,'ccc')
,(4,'ddd')
,(5,'eee')
,(6,'fff')
,(7,'ggg')
,(8,'hhh');
CREATE TABLE t2 (t2_id int, t1_id int, objid text);
INSERT INTO t2 VALUES
(1,3,'aaa')
,(2,4,'aaa')
,(3,7,'hhh')
,(4,8,'ccc');Two different solutions:
Solution with string as path
WITH RECURSIVE cte AS (
SELECT t.t1_id AS start_id
, t2.t2_id::text || '(t2)' || COALESCE(' ->' || t1.t1_id || '(t1)', '') AS path
, t1.t1_id
FROM t1 t
LEFT JOIN t2 USING (t1_id)
LEFT JOIN t1 ON t1.objid = t2.objid
UNION ALL
SELECT c.start_id
, c.path || ' ->' || t2.t2_id || '(t2)' || COALESCE(' ->' || t1.t1_id || '(t1)', '')
, t1.t1_id
FROM cte c
JOIN t2 USING (t1_id)
LEFT JOIN t1 USING (objid)
)
SELECT DISTINCT ON (start_id)
start_id, path
FROM cte
ORDER BY start_id, path DESC;Result:
start_id path
1
2
3 1(t2) ->1(t1)
4 2(t2) ->1(t1)
5
6
7 3(t2) ->8(t1) ->4(t2) ->3(t1) ->1(t2) ->1(t1)
8 4(t2) ->3(t1) ->1(t2) ->1(t1)Table names are redundant, obviously. I added them for good looks.
Solution with inverted array
Rightmost element is first
t2_id, keep alternating from right to left.WITH RECURSIVE cte AS (
SELECT t.t1_id AS start_id, ARRAY[t1.t1_id, t2.t2_id] AS path
FROM t1 t
LEFT JOIN t2 USING (t1_id)
LEFT JOIN t1 ON t1.objid = t2.objid
UNION ALL
SELECT c.start_id, t1.t1_id || (t2.t2_id || path)
FROM cte c
JOIN t2 ON t2.t1_id = path[1]
LEFT JOIN t1 USING (objid)
)
SELECT DISTINCT ON (start_id)
start_id, array_remove(path, NULL) AS path
FROM cte
ORDER BY start_id, array_length(path, 1) DESC;Result:
start_id path
1 {}
2 {}
3 {1,1}
4 {1,2}
5 {}
6 {}
7 {1,1,3,4,8,3}
8 {1,1,3,4}array_remove() requires Postgres 9.3+.Inverted the array to need one less columns. By putting the last element first, I can reference
path[1] in the next step. Not sure if that's cheaper, would need a test ...Shorter code, but probably slower. I suspect array handling is more expensive. Easier to adapt if we need to observe cycles.
SQL Fiddle.
Major points
-
We are alternating between two tables.
To make this recursive, one step needs to cover two hops (from
t1 -> t2 and back t2 -> t1).-
The initial
SELECT uses 2x LEFT JOIN to include all rows like in your example result.The recursive part
JOIN to stop the loop when no match is found. The hop back uses LEFT JOIN again.Code Snippets
CREATE TABLE t1 (t1_id int, objid text);
INSERT INTO t1 VALUES
(1,'aaa')
,(2,'bbb')
,(3,'ccc')
,(4,'ddd')
,(5,'eee')
,(6,'fff')
,(7,'ggg')
,(8,'hhh');
CREATE TABLE t2 (t2_id int, t1_id int, objid text);
INSERT INTO t2 VALUES
(1,3,'aaa')
,(2,4,'aaa')
,(3,7,'hhh')
,(4,8,'ccc');WITH RECURSIVE cte AS (
SELECT t.t1_id AS start_id
, t2.t2_id::text || '(t2)' || COALESCE(' ->' || t1.t1_id || '(t1)', '') AS path
, t1.t1_id
FROM t1 t
LEFT JOIN t2 USING (t1_id)
LEFT JOIN t1 ON t1.objid = t2.objid
UNION ALL
SELECT c.start_id
, c.path || ' ->' || t2.t2_id || '(t2)' || COALESCE(' ->' || t1.t1_id || '(t1)', '')
, t1.t1_id
FROM cte c
JOIN t2 USING (t1_id)
LEFT JOIN t1 USING (objid)
)
SELECT DISTINCT ON (start_id)
start_id, path
FROM cte
ORDER BY start_id, path DESC;start_id path
1
2
3 1(t2) ->1(t1)
4 2(t2) ->1(t1)
5
6
7 3(t2) ->8(t1) ->4(t2) ->3(t1) ->1(t2) ->1(t1)
8 4(t2) ->3(t1) ->1(t2) ->1(t1)WITH RECURSIVE cte AS (
SELECT t.t1_id AS start_id, ARRAY[t1.t1_id, t2.t2_id] AS path
FROM t1 t
LEFT JOIN t2 USING (t1_id)
LEFT JOIN t1 ON t1.objid = t2.objid
UNION ALL
SELECT c.start_id, t1.t1_id || (t2.t2_id || path)
FROM cte c
JOIN t2 ON t2.t1_id = path[1]
LEFT JOIN t1 USING (objid)
)
SELECT DISTINCT ON (start_id)
start_id, array_remove(path, NULL) AS path
FROM cte
ORDER BY start_id, array_length(path, 1) DESC;start_id path
1 {}
2 {}
3 {1,1}
4 {1,2}
5 {}
6 {}
7 {1,1,3,4,8,3}
8 {1,1,3,4}Context
StackExchange Database Administrators Q#72644, answer score: 2
Revisions (0)
No revisions yet.