Aurora PostgreSQL database using a slower query plan than a normal PostgreSQL database for an identical query?
Problem
Following the migration of an application and its database from a classical PostgreSQL database to an Amazon Aurora RDS PostgreSQL database (both on version 9.6), we found that a specific query runs much slower -- around 10 times slower -- on Aurora than on PostgreSQL.
Both databases have the same configuration, for both the hardware and the pg_conf.
The query itself is fairly simple. It is generated by our backend, which is written in Java and uses jOOQ for writing the queries:
with "all_acp_ids"("acp_id") as (
select acp_id from temp_table_de3398bacb6c4e8ca8b37be227eac089
)
select distinct "public"."f1_folio_milestones"."acp_id",
coalesce("public"."sa_milestone_overrides"."team",
"public"."f1_folio_milestones"."team_responsible")
from "public"."f1_folio_milestones"
left outer join
"public"."sa_milestone_overrides" on (
"public"."f1_folio_milestones"."milestone" = "public"."sa_milestone_overrides"."milestone"
and "public"."f1_folio_milestones"."view" = "public"."sa_milestone_overrides"."view"
and "public"."f1_folio_milestones"."acp_id" = "public"."sa_milestone_overrides"."acp_id"
)
where "public"."f1_folio_milestones"."acp_id" in (
select "all_acp_ids"."acp_id" from "all_acp_ids"
)With
temp_table_de3398bacb6c4e8ca8b37be227eac089 being a single-column table, f1_folio_milestones (17 million entries) and sa_milestone_overrides (Around 1 million entries) being similarly designed tables having indexes on all the columns used for the LEFT OUTER JOIN.temp_table_de3398bacb6c4e8ca8b37be227eac089 can contain up to 5000 entries, all of them being distinct.When we run it on the normal PostgreSQL database, it generates the following query plan:
```
Unique  (cost=4802622.20..4868822.51 rows=8826708 width=43) (actual time=483.928..483.930 rows=1 loops=1)
  CTE all_acp_ids
    ->  Seq Scan on temp_table_de3398bacb6c4e8ca8b37be227eac089  (cost=0.00..23.60 rows=1360 width=32) (actual time=0.004..
```
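Plans like this are produced with EXPLAIN (ANALYZE). A minimal sketch, shown against a stand-in query rather than the full statement above:
```
-- Sketch: EXPLAIN (ANALYZE) runs the statement and reports the planner's
-- cost estimates alongside the measured execution times.
EXPLAIN (ANALYZE, BUFFERS)
SELECT acp_id
FROM   temp_table_de3398bacb6c4e8ca8b37be227eac089;
```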
Solution
One problem stands out (in all query plans) and is easy to fix:
```
Seq Scan on temp_table_de3398bacb6c4e8ca8b37be227eac089  (cost=0.00..23.60 rows=1360 width=32) (actual time=0.004..0.005 rows=1 loops=1)
```
This says Postgres expects 1360 rows in that table, but finds only 1.
You commented:

> it's a normal table which gets dropped after everything is done. [...] the query plan was done with a single value in the table, but it can have up to 5000 entries in total, all distinct.
... which can explain the totally misleading expected row count. Statistics are kept up to date by autovacuum, but it needs some time to kick in. If you run a complex query right after populating (or heavily modifying) such a table, it would be wise to run at least ANALYZE manually in between:
```
ANALYZE temp_table_de3398bacb6c4e8ca8b37be227eac089;
```
Maybe even VACUUM ANALYZE, but that cannot be run inside a transaction.

An actual temporary table (CREATE TEMP TABLE ...) would seem like the better choice for your use case, as long as other sessions don't need to see the same state of the table: better performance overall. But it's worth mentioning that temporary tables are exempt from being analyzed by autovacuum at all. See:
- Are regular VACUUM ANALYZE still recommended under 9.1?
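A sketch of that workflow; the column name comes from the query, but its type and the sample values are assumptions:
```
-- Assumed workflow: a real temporary table, populated and then ANALYZEd
-- manually, since autovacuum never processes temporary tables.
CREATE TEMP TABLE tmp_acp_ids (acp_id text PRIMARY KEY);  -- type assumed

INSERT INTO tmp_acp_ids (acp_id)
VALUES ('ACP-0001'), ('ACP-0002');  -- hypothetical ids; up to 5000 distinct

ANALYZE tmp_acp_ids;  -- refresh planner statistics before the big query
```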
Failing that, Postgres tries all sorts of inapt query plans, based on a completely misleading row estimation in the core table. There are many possible reasons why Postgres picks a different query plan - anything that changes cost estimations. But fix that and the problem at hand may very well go away.
"cost" in Postgres query plans is the estimated time (in an arbitrary unit).
"actual time" is time measured after the fact.
Postgres always favors lower cost. That's how it decides which plan to pick. Not related to Aurora in any way. Your main problem are misleading statistics --> misleading row estimates --> misleading cost estimates. Details in the manual. Start here and here
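One way to watch cost drive plan choice is a test-session experiment (not for production):
```
-- Experiment: make sequential scans look prohibitively expensive and
-- check whether the planner switches to a different plan.
BEGIN;
SET LOCAL enable_seqscan = off;  -- discourages seq scans for this transaction
EXPLAIN
SELECT acp_id
FROM   temp_table_de3398bacb6c4e8ca8b37be227eac089;
ROLLBACK;
```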
Alternative query
That said, I would simplify the query. The CTE adds nothing useful to the query - materialized or not. Remove it.
```
SELECT DISTINCT m.acp_id,
       COALESCE(o.team, m.team_responsible)
FROM   temp_table_de3398bacb6c4e8ca8b37be227eac089 t
JOIN   public.f1_folio_milestones m USING (acp_id)
LEFT   JOIN public.sa_milestone_overrides o USING (milestone, view, acp_id);
```
USING is just convenience to save typing; it has no performance impact. Only use it where applicable.

Given your cardinalities (max. 5000 rows in t, but 17 million in m), I would try this alternative query with a lateral subquery:
```
SELECT t.acp_id, om.team
FROM   temp_table_de3398bacb6c4e8ca8b37be227eac089 t
CROSS  JOIN LATERAL (
   SELECT COALESCE(o.team, m.team_responsible) AS team
   FROM   public.f1_folio_milestones m
   LEFT   JOIN public.sa_milestone_overrides o USING (milestone, view, acp_id)
   WHERE  m.acp_id = t.acp_id
   ) om;
```
With favorable data distribution and fitting indexes (see the sketch after the reading list below), it might be much faster. Further reading (recommended if you want to optimize!):
- Optimize GROUP BY query to retrieve latest row per user
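A hypothetical sketch of "fitting indexes" for the lateral query; the question doesn't show the actual column types or existing indexes, so adjust accordingly:
```
-- Hypothetical covering indexes for the lateral subquery; both are
-- assumptions, not taken from the original schema.
CREATE INDEX ON public.f1_folio_milestones (acp_id, milestone, view, team_responsible);
CREATE INDEX ON public.sa_milestone_overrides (acp_id, milestone, view, team);
```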
Context
StackExchange Database Administrators Q#260429, answer score: 6