HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Recursively find the path to all leaves descending from a given parent in a tree

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
paththealldescendingparentleavesrecursivelyfindfromgiven

Problem

I'm trying to write a query to identify the path to each furthest descendant of a particular parent node in a table of tree structures like:

0     1
    |     |
    2     3
    |     |
    4     5
   / \    |
 *6*  8  *7*
      |
     *9*


There are many parents, all children have one parent, parents have 0-5 children (but the graph is very "long" – most parents only have one child). There are no cycles.

I'm trying to efficiently identify the path to the furthest descendants of a specific node (and not to any intermediate nodes). E.g. in the above:

  • get_leaf_paths(1) would return 1 row: {1, 3, 5, 7}



  • get_leaf_paths(2) would return 2 rows: {2, 4, 6} and {2, 4, 8, 9}



Sample table:

CREATE TABLE graph (
    id bigint PRIMARY KEY,
    parent_id bigint,
    FOREIGN KEY (parent_id) REFERENCES graph(id)
);
INSERT INTO graph (id, parent_id)
    VALUES (0, NULL),
           (1, NULL),
           (2, 0),
           (3, 1),
           (4, 2),
           (5, 3),
           (6, 4),
           (7, 5),
           (8, 4),
           (9, 8);


I'm hoping for a result that looks something like:

SELECT get_leaf_paths.* FROM get_leaf_paths(0);
path
-----
{0, 2, 4, 6}
{0, 2, 4, 8, 9}
(2 rows)


In my initial attempt at a function with a recursive query, I've had trouble selecting only the furthest leaves, especially since some branches are shorter than others (e.g. 6 and 9 above are at different depths). The paths can be very deep (thousands or millions of elements), so I would also like to avoid the excessive memory usage of generating paths for every intermediate node.

Any ideas would be greatly appreciated. Thanks!

Solution

WITH RECURSIVE
cte AS ( SELECT id,
parent_id,
id::TEXT path,
NOT EXISTS ( SELECT NULL
FROM graph gr
WHERE graph.id = gr.parent_id ) is_leaf
FROM graph
WHERE id = 2 / initital node id /
UNION ALL
SELECT graph.id,
graph.parent_id,
cte.path || ',' || graph.id,
NOT EXISTS ( SELECT NULL
FROM graph gr
WHERE graph.id = gr.parent_id )
FROM cte JOIN graph ON cte.id = graph.parent_id)
SELECT path
FROM cte
WHERE is_leaf


https://dbfiddle.uk/?rdbms=postgres_12&fiddle=2e40ff454f302033bf5e8cba8b0b0d85

Multiple initial nodes may be applied too (WHERE id IN (0, 1)).

Context

StackExchange Database Administrators Q#289515, answer score: 2

Revisions (0)

No revisions yet.