patternsqlMinor
Can Postgres scan indexes backwards?
Viewed 0 times
canscanpostgresbackwardsindexes
Problem
We use an Amazon RDS instance with
PostgreSQL 11.13 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12), 64-bit
I have a simple classic top-1-per-group query. I need to get the latest item in the history for each
Here is a table and index definitions:
When a query orders by
I expected to see exactly the same plan for a query when I order by
```
EXPLAIN (ANALYZE)
SELECT history.id, history."creativeScheduleId"
FROM (
SELECT cssh.id, cssh."creativeScheduleId"
, ROW_NUMBER() OV
PostgreSQL 11.13 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12), 64-bit
I have a simple classic top-1-per-group query. I need to get the latest item in the history for each
creativeScheduleId.Here is a table and index definitions:
CREATE TABLE IF NOT EXISTS public.creative_schedule_status_histories (
id serial PRIMARY KEY,
"creativeScheduleId" text NOT NULL,
-- other columns
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_creativescheduleid_id
ON public.creative_schedule_status_histories ("creativeScheduleId" ASC, id ASC);When a query orders by
id ASC the engine reads only the index and does not do any extra sorts:EXPLAIN (ANALYZE)
SELECT history.id, history."creativeScheduleId"
FROM (
SELECT cssh.id, cssh."creativeScheduleId"
, ROW_NUMBER() OVER (PARTITION BY cssh."creativeScheduleId"
ORDER BY cssh.id ASC) AS rn -- !
FROM creative_schedule_status_histories as cssh
) AS history
WHERE history.rn = 1;"Subquery Scan on history (cost=0.56..511808.63 rows=26377 width=41) (actual time=0.047..4539.058 rows=709030 loops=1)"
" Filter: (history.rn = 1)"
" Rows Removed by Filter: 4579766"
" -> WindowAgg (cost=0.56..445866.24 rows=5275391 width=49) (actual time=0.046..4165.835 rows=5288796 loops=1)"
" -> Index Only Scan using idx_creativescheduleid_id on creative_schedule_status_histories cssh (cost=0.56..353546.90 rows=5275391 width=41) (actual time=0.037..1447.490 rows=5288796 loops=1)"
" Heap Fetches: 2372"
"Planning Time: 0.072 ms"
"Execution Time: 4568.235 ms"
I expected to see exactly the same plan for a query when I order by
id DESC, but there is an explicit sort in the plan which spills to disk and obviously everything is just slower.```
EXPLAIN (ANALYZE)
SELECT history.id, history."creativeScheduleId"
FROM (
SELECT cssh.id, cssh."creativeScheduleId"
, ROW_NUMBER() OV
Solution
Because of the discussed limitation, we can't make Postgres scan the index backwards for the particular use case. However ...
Clean test case
I removed the noise from the test case:
Get rid of the expensive sort step
In Postgres 14 or later I see an
In Postgres 11 or later the additional sort goes away with this workaround:
That's based on Paul's query It's better than than my first idea to compare row number and count per partition. I adapted to get the greatest
It scans the index forwards. To see an actual
The first variant is just slightly shorter and faster.
While you are stuck with your original query, you can at least make the sort happen in RAM. Your query plan says
What you really want
Your current query never wins any competition.
To make your query fast (even without additional index):
For few rows per group (and sufficient
For many rows per group (like in this test),
For more than a few rows per group, we'd want an index skip scan. Considerable effort has been made but, unfortunately, that didn't make it into Postgres 15. We can still emulate the technique to great effect with a recursive CTE:
`'CTE Scan on cte (cost=40.74..42.76 rows=101 width=8) (actual time=0.013..0.304 rows=34 loops=1)'
' CTE cte'
' -> Recursive Union (cost=0.29..40.74 rows=101 width=8) (actual time=0.012..0.297 rows=34 loops=1)'
' -> Limit (cost=0.29..0.33 rows=1 width=8) (actual time=0.011..0.012 rows=1 loops=1)'
' -> Index Only Scan Backward using tbl_part_id_idx on tbl (cost=0.29..3410.28 rows=100000 width=8) (actual time=0.010..0.011 rows=1 loops=1)'
' Heap Fetches: 1'
' -> Nested Loop (cost=0.29..3.84 rows=10 width=8) (actual time=0.008..0.008 rows=1 loops=34)'
' -> WorkTable Scan on cte c (cost=0.00..0.20 rows=10 width=4) (actual time=0.000..0.000 rows=1 loops=34)'
' -> Limit (cost=0.29..0.34 rows=1 width=8) (actual time=0.008..0.008 rows=1 loops=34)'
' -> Index Only Scan Backward using tbl_part_id_idx on tbl t (cost=0.29..1714.08 rows=33333 width=8) (actual time=0.007..0.007 rows=1 loops=34)'
' Index Cond: (part
One or the other is typically (much) faster than your original query.
fiddle for Postgres 15 with 3000 rows per group
fiddle for Postgres 11
fiddle for Postgres 11 with 8 rows per group (~ y
Clean test case
I removed the noise from the test case:
CREATE TABLE tbl (
id int PRIMARY KEY
, part int NOT NULL
, ballast text -- possibly big column(s)?
);
CREATE UNIQUE INDEX tbl_part_id_idx ON tbl (part, id);Get rid of the expensive sort step
In Postgres 14 or later I see an
Incremental Sort on top of an Index Only Scan.In Postgres 11 or later the additional sort goes away with this workaround:
SELECT id, part
FROM (
SELECT *
, CASE WHEN part = lead(part) OVER (ORDER BY part, id ROWS UNBOUNDED PRECEDING)
THEN false
ELSE true END AS qualified
FROM tbl
) sub
WHERE qualified;Subquery Scan on sub (cost=0.42..10293.58 rows=89880 width=8) (actual time=1.759..115.799 rows=67 loops=1)
Filter: sub.qualified
Rows Removed by Filter: 179692
-> WindowAgg (cost=0.42..8495.99 rows=179759 width=41) (actual time=0.022..106.609 rows=179759 loops=1)
-> Index Only Scan using tbl_part_id_idx on tbl (cost=0.42..4900.81 rows=179759 width=8) (actual time=0.017..29.208 rows=179759 loops=1)
Heap Fetches: 0
Planning Time: 0.114 ms
Execution Time: 115.850 ms
That's based on Paul's query It's better than than my first idea to compare row number and count per partition. I adapted to get the greatest
id per group, simplified, and switched to ROWS mode for better performance. See:- Find first 3 orders for each customer
It scans the index forwards. To see an actual
Index Scan backwards:...
CASE WHEN part = lag(part) OVER (ORDER BY part DESC, id DESC ROWS UNBOUNDED PRECEDING)
THEN false
ELSE true END AS qualified
...The first variant is just slightly shorter and faster.
While you are stuck with your original query, you can at least make the sort happen in RAM. Your query plan says
Disk: 263992kB. Increase work_mem by 300 MB (if you can afford that) to achieve that. Possibly just in your session for the big query. See:- Slow query performance due to temporary file?
What you really want
Your current query never wins any competition.
To make your query fast (even without additional index):
For few rows per group (and sufficient
work_mem), use DISTINCT ON. It's fastest with matching index, and maybe even without:SELECT DISTINCT ON (part) id, part
FROM tbl
ORDER BY part, id DESC;Unique (cost=33480.91..34380.55 rows=67 width=8) (actual time=96.726..131.735 rows=67 loops=1)
-> Sort (cost=33480.91..33930.73 rows=179929 width=8) (actual time=96.724..118.292 rows=179929 loops=1)
Sort Key: part, id DESC
Sort Method: external merge Disk: 3184kB
-> Index Only Scan using tbl_part_id_idx on tbl (cost=0.42..4903.35 rows=179929 width=8) (actual time=0.019..25.871 rows=179929 loops=1)
Heap Fetches: 0
Planning Time: 0.102 ms
Execution Time: 132.208 ms
For many rows per group (like in this test),
DISTINCT ON is not ideal, but typically not that bad, either.For more than a few rows per group, we'd want an index skip scan. Considerable effort has been made but, unfortunately, that didn't make it into Postgres 15. We can still emulate the technique to great effect with a recursive CTE:
EXPLAIN ANALYZE
WITH RECURSIVE cte AS (
(
SELECT part, id
FROM tbl
ORDER BY part DESC, id DESC
LIMIT 1
)
UNION ALL
SELECT l.*
FROM cte c
CROSS JOIN LATERAL (
SELECT t.part, t.id
FROM tbl t
WHERE t.part < c.part
ORDER BY t.part DESC, t.id DESC
LIMIT 1
) l
)
TABLE cte;`'CTE Scan on cte (cost=40.74..42.76 rows=101 width=8) (actual time=0.013..0.304 rows=34 loops=1)'
' CTE cte'
' -> Recursive Union (cost=0.29..40.74 rows=101 width=8) (actual time=0.012..0.297 rows=34 loops=1)'
' -> Limit (cost=0.29..0.33 rows=1 width=8) (actual time=0.011..0.012 rows=1 loops=1)'
' -> Index Only Scan Backward using tbl_part_id_idx on tbl (cost=0.29..3410.28 rows=100000 width=8) (actual time=0.010..0.011 rows=1 loops=1)'
' Heap Fetches: 1'
' -> Nested Loop (cost=0.29..3.84 rows=10 width=8) (actual time=0.008..0.008 rows=1 loops=34)'
' -> WorkTable Scan on cte c (cost=0.00..0.20 rows=10 width=4) (actual time=0.000..0.000 rows=1 loops=34)'
' -> Limit (cost=0.29..0.34 rows=1 width=8) (actual time=0.008..0.008 rows=1 loops=34)'
' -> Index Only Scan Backward using tbl_part_id_idx on tbl t (cost=0.29..1714.08 rows=33333 width=8) (actual time=0.007..0.007 rows=1 loops=34)'
' Index Cond: (part
One or the other is typically (much) faster than your original query.
fiddle for Postgres 15 with 3000 rows per group
fiddle for Postgres 11
fiddle for Postgres 11 with 8 rows per group (~ y
Code Snippets
CREATE TABLE tbl (
id int PRIMARY KEY
, part int NOT NULL
, ballast text -- possibly big column(s)?
);
CREATE UNIQUE INDEX tbl_part_id_idx ON tbl (part, id);SELECT id, part
FROM (
SELECT *
, CASE WHEN part = lead(part) OVER (ORDER BY part, id ROWS UNBOUNDED PRECEDING)
THEN false
ELSE true END AS qualified
FROM tbl
) sub
WHERE qualified;...
CASE WHEN part = lag(part) OVER (ORDER BY part DESC, id DESC ROWS UNBOUNDED PRECEDING)
THEN false
ELSE true END AS qualified
...SELECT DISTINCT ON (part) id, part
FROM tbl
ORDER BY part, id DESC;EXPLAIN ANALYZE
WITH RECURSIVE cte AS (
(
SELECT part, id
FROM tbl
ORDER BY part DESC, id DESC
LIMIT 1
)
UNION ALL
SELECT l.*
FROM cte c
CROSS JOIN LATERAL (
SELECT t.part, t.id
FROM tbl t
WHERE t.part < c.part
ORDER BY t.part DESC, t.id DESC
LIMIT 1
) l
)
TABLE cte;Context
StackExchange Database Administrators Q#317886, answer score: 3
Revisions (0)
No revisions yet.