PostgreSQL filter/aggregate performance fluctuates depending on condition value
Problem
I'm experiencing a strange variation of filter performance depending on the value in the WHERE condition with this query:

SELECT MIN(revenue) AS min_revenue,
       MAX(revenue) AS max_revenue
FROM v_premises_filter
WHERE area IS NOT NULL
AND v_premises_filter.scat_group_code = 1

The query with the = 1 condition executes much faster than with = 23. I have noticed that the drastic slowdown happens when the aggregated column (revenue in the example below) has only NULL values for rows matching the WHERE condition. I would understand if this was happening when using an index lookup, but it happens when using a filter, which to my knowledge should scan all rows.

With scat_group_code = 1, the query executes quickly:

Result (cost=2.92..2.93 rows=1 width=0) (actual time=0.239..0.241 rows=1 loops=1)
  InitPlan 1 (returns $0)
    -> Limit (cost=0.43..1.46 rows=1 width=8) (actual time=0.134..0.135 rows=1 loops=1)
       -> Index Scan using v_premises_filter_revenue_idx on v_premises_filter (cost=0.43..310356.33 rows=301212 width=8) (actual time=0.130..0.130 rows=1 loops=1)
          Index Cond: (revenue IS NOT NULL)
          Filter: ((area IS NOT NULL) AND (scat_group_code = 1))
          Rows Removed by Filter: 20
  InitPlan 2 (returns $1)
    -> Limit (cost=0.43..1.46 rows=1 width=8) (actual time=0.094..0.096 rows=1 loops=1)
       -> Index Scan Backward using v_premises_filter_revenue_idx on v_premises_filter v_premises_filter_1 (cost=0.43..310356.33 rows=301212 width=8) (actual time=0.091..0.091 rows=1 loops=1)
          Index Cond: (revenue IS NOT NULL)
          Filter: ((area IS NOT NULL) AND (scat_group_code = 1))
          Rows Removed by Filter: 17
Planning time: 0.359 ms
Execution time: 0.276 ms

However, with scat_group_code = 23, the same query with the same execution plan takes much more time:

Result (cost=129.15..129.16 rows=1 width=0) (actual time=8138.644..8138.669 rows=1 loops=1)
  InitPlan 1
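One way to confirm the suspected NULL-only pattern is a quick count against the table and columns from the question (a diagnostic sketch, not part of the original question):

```sql
-- If non_null_revenue comes back 0, every revenue value for this group is NULL,
-- so the MIN/MAX index scans on revenue never find a qualifying row early.
SELECT count(*)        AS matching_rows,
       count(revenue)  AS non_null_revenue   -- count(col) ignores NULLs
FROM v_premises_filter
WHERE area IS NOT NULL
AND scat_group_code = 23;
```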
Solution
You have a lot of indexes, and my first impulse would be to check whether all of them are actually used and remove the unused ones.
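One way to spot candidates for removal is the cumulative statistics view pg_stat_user_indexes (a sketch; a zero idx_scan since the last statistics reset suggests an index is unused):

```sql
-- Indexes on the table, least-used first.
SELECT indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE relname = 'v_premises_filter'
ORDER BY idx_scan;
```

Keep in mind the counters only reflect activity since the statistics were last reset, so check over a representative period before dropping anything.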
The optimal index for your query is not there yet, though: a multicolumn index on (scat_group_code, revenue). Since this is a materialized view, it's reasonable to assume a read-only situation, which is perfect for index-only scans.

Also, since the table (many columns, some potentially big text columns) is much wider than the index (two columns, fewer rows for the partial variant), an index-only scan is particularly efficient here. Size matters.
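A minimal sketch of that plain multicolumn index (the index name is an arbitrary example; Postgres would generate one if omitted):

```sql
-- Leading column matches the equality predicate; second column feeds MIN/MAX.
CREATE INDEX v_premises_filter_scat_revenue_idx
ON v_premises_filter (scat_group_code, revenue);
```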
If this particular query is very common / important, I would go one step further and make it a partial, multicolumn index:
CREATE INDEX ON v_premises_filter (scat_group_code, revenue)
WHERE area IS NOT NULL
AND revenue IS NOT NULL;

I added the 2nd condition revenue IS NOT NULL because min() and max() ignore NULL values. I tested with Postgres 9.6. You may have to add the logically redundant predicate AND revenue IS NOT NULL to your query in older versions to make the query planner understand that the index is applicable.

The index your query used in your tests (v_premises_filter_revenue_idx) is on (revenue), which is pretty inefficient for a predicate on scat_group_code. Adding that column as the first index item makes a big difference, because Postgres can now cheaply grab only the share with a matching scat_group_code. Where your 2nd query had to filter 1631527 rows (Rows Removed by Filter: 1631527), it only has to filter far fewer rows (those with area IS NULL or revenue IS NULL) with the simple index, and none at all with the partial index. Also, since irrelevant rows are excluded from the index, it is substantially smaller, which contributes to performance as well.
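For those older versions, the query with the logically redundant predicate added would look like this (a sketch based on the query from the question):

```sql
SELECT MIN(revenue) AS min_revenue,
       MAX(revenue) AS max_revenue
FROM v_premises_filter
WHERE area IS NOT NULL
AND revenue IS NOT NULL  -- redundant for MIN/MAX, but lets the planner match the partial index
AND scat_group_code = 23;
```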
You should see two index-only scans and a much faster query in either case now.
Related:
- Is a composite index also good for queries on the first field?
- How to properly implement compound greatest-n filtering
Why the fluctuation?
Your table is big (1909175 rows) and data distribution is obviously irregular. The default statistics_target setting of 100 is probably too small to gather enough details. Postgres decides to use a bad query plan for your 2nd query, based on insufficient information. It might help to increase the STATISTICS setting for scat_group_code, revenue and area.

But you won't need this with the suggested new index any more, because an index-only scan on the new index is always the best plan.
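If you do want to try raising the per-column statistics target, a sketch (the target of 1000 is an arbitrary example, and this assumes your Postgres version supports SET STATISTICS via ALTER MATERIALIZED VIEW):

```sql
-- Raise per-column statistics targets, then re-collect statistics.
ALTER MATERIALIZED VIEW v_premises_filter ALTER COLUMN scat_group_code SET STATISTICS 1000;
ALTER MATERIALIZED VIEW v_premises_filter ALTER COLUMN revenue         SET STATISTICS 1000;
ALTER MATERIALIZED VIEW v_premises_filter ALTER COLUMN area            SET STATISTICS 1000;
ANALYZE v_premises_filter;
```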
Consider this related answer from just yesterday:
- Inconsistent statistics on jsonb column with btree index
Code Snippets
CREATE INDEX ON v_premises_filter (scat_group_code, revenue)
WHERE area IS NOT NULL
AND revenue IS NOT NULL;

Context
StackExchange Database Administrators Q#167625, answer score: 4
Revisions (0)
No revisions yet.