HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

PostgreSQL tree structure and recursive CTE optimization

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlcteoptimizationrecursivestructureandtree

Problem

I'm trying to represent a tree structure in PostgreSQL (8.4) to be able to query the path from the root to a given node or to find all the nodes within a sub-branch.

Here is a test table:

CREATE TABLE tree_data_1 (
    forest_id TEXT NOT NULL,
    node_id TEXT NOT NULL,
    parent_id TEXT,
    node_type TEXT,
    description TEXT,
    PRIMARY KEY (forest_id, node_id),
    FOREIGN KEY (forest_id, parent_id) REFERENCES tree_data_1 (forest_id, node_id)
);
CREATE INDEX tree_data_1_forestid_parent_idx ON tree_data_1(forest_id, parent_id);
CREATE INDEX tree_data_1_forestid_idx ON tree_data_1(forest_id);
CREATE INDEX tree_data_1_nodeid_idx ON tree_data_1(node_id);
CREATE INDEX tree_data_1_parent_idx ON tree_data_1(parent_id);


Each node is identified by (forest_id, node_id) (there can be another node with the same name in another forest). Each tree starts at a root node (where parent_id is null), although I'm only expecting one per forest.

Here is the view that uses a recursive CTE:

CREATE OR REPLACE VIEW tree_view_1 AS
    WITH RECURSIVE rec_sub_tree(forest_id, node_id, parent_id, depth, path, cycle) AS (
        SELECT td.forest_id, td.node_id, td.parent_id, 0, ARRAY[td.node_id], FALSE FROM tree_data_1 td
        UNION ALL
        SELECT td.forest_id, rec.node_id, td.parent_id, rec.depth+1, td.node_id || rec.path, td.node_id = ANY(rec.path)
            FROM tree_data_1 td, rec_sub_tree rec
            WHERE td.forest_id = rec.forest_id AND rec.parent_id = td.node_id AND NOT cycle
     )
     SELECT forest_id, node_id, parent_id, depth, path
         FROM rec_sub_tree;


This is a slightly modified version of the example in the documentation, to take into account the forest_id, and that returns rec.node_id in the recursive SELECT instead of what would be td.node_id.

The get the path from the root to a given node, this query can be used:

SELECT * FROM tree_view_1 WHERE forest_id='Forest A' AND node_id='...' AND parent_id IS NULL


Th

Solution

If you really have to modify these data rarely, then you can simply store the result of the CTE in a table, and run queries against this table. You can define indexes based on your typical queries.

Then TRUNCATE and repopulate (and ANALYZE) as necessary.

On the other hand, if you can put the CTE in separate stored procedures rather than a view, you can easily put your conditions in the CTE part rather then the final SELECT (which is basically what you do querying against tree_view_1), so that much less rows will be involved in the recursion. From the query plan it looks like that PostgreSQL estimates row numbers based on some far-from-true assumptions, probably producing suboptimal plans - this effect can be reduced somewhat with the SP solution.

EDIT I may miss something, but just noticed that in the non-recursive term you don't filter the rows. Possibly you want to include only root nodes there (WHERE parent_id IS NULL) - I'd expect much less rows and recursions this way.

EDIT 2 AS it slowly became clear for me from the comments, I misthought the recursion in the original question going the other way. Here I mean starting from the root nodes and going deeper in the recursion.

Context

StackExchange Database Administrators Q#20978, answer score: 3

Revisions (0)

No revisions yet.