HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

How to optimize a recursive query in Postgresql?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlqueryrecursiveoptimizehow

Problem

I have a recursive query that takes too long - 30+ ms where doing the individual queries to extract the same data manually takes create table subjects
(
subject_id bigint not null
constraint pk_subjects
primary key
);

create table subject_group_members
(
subject_group_id bigint not null
constraint fk_subject_group_members_subject_group_id_subjects_subject_id
references subjects(subject_id)
on delete cascade,
subject_id bigint not null
constraint fk_subject_group_members_subject_id_subjects_subject_id
references subjects(subject_id)
on delete cascade,
constraint pk_subject_group_members
primary key (subject_group_id, subject_id)
);

create index idx_subject_group_members_subject_id
on subject_group_members (subject_id);

create index idx_subject_group_members_subject_group_id
on subject_group_members (subject_group_id);


Data might look like this:

subject_group_id
subject_id

1
2

1
3

1
4

2
5

3
5

I want to know all the groups that 5 is a member of (1 by inheritance, 2 & 3 directly, not 4 or any other subject ids).

This query works as expected:
with recursive flat_members(subject_group_id, subject_id) as (
select subject_group_id, subject_id
from subject_group_members gm
union
select
flat_members.subject_group_id as subject_group_id,
subject_group_members.subject_id as subject_id
from subject_group_members
join flat_members on flat_members.subject_id = subject_group_members.subject_group_id
)
select * from flat_members where subject_id = 5


But run with real data I get this query plan:

```
CTE Scan on flat_members (cost=36759729.47..59962757.76 rows=5156229 width=16) (actual time=26.526..55.166 rows=3 loops=1)
Filter: (subject_id = 30459)
Rows Removed by Filter: 48984
CTE flat_members
-> Recursive Union (cost=0.00..36759729.47 rows=1031245702 width=16) (actual ti

Solution

Looks like you inverted the join condition by accident.

WITH RECURSIVE flat_members AS (
   SELECT subject_group_id
   FROM   subject_group_members gm
   WHERE  subject_id = 5

   UNION
   SELECT gm.subject_group_id
   FROM   flat_members          fm
   JOIN   subject_group_members gm ON gm.subject_id = fm.subject_group_id
   )
TABLE flat_members;


fiddle

Plus, move the filter WHERE subject_id = 5 up to the initial SELECT to filter irrelevant rows early - and allow for an optimized query plan, typically using an index. Speaking of which, this multicolumn index would serve much better, allowing index-only scans:

CREATE INDEX subject_group_members_subject_id_subject_group_id
    ON subject_group_members (subject_id, subject_group_id);


(Might as well be UNIQUE.) In addition to your PK on (subject_group_id, subject_id). Or invert the columns in the PK definition, either might be useful.

About index-only scans:

  • Can Postgres use an index-only scan for this query with joined tables?



It's typically best to just have a PK on (subject_id, subject_group_id), another multicolumn index on (subject_group_id, subject_id), and drop the two indexes on only (subject_id) and (subject_group_id). See:

  • Is a composite index also good for queries on the first field?

Code Snippets

WITH RECURSIVE flat_members AS (
   SELECT subject_group_id
   FROM   subject_group_members gm
   WHERE  subject_id = 5

   UNION
   SELECT gm.subject_group_id
   FROM   flat_members          fm
   JOIN   subject_group_members gm ON gm.subject_id = fm.subject_group_id
   )
TABLE flat_members;
CREATE INDEX subject_group_members_subject_id_subject_group_id
    ON subject_group_members (subject_id, subject_group_id);

Context

StackExchange Database Administrators Q#317027, answer score: 4

Revisions (0)

No revisions yet.