HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Can Postgres scan indexes backwards?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
canscanpostgresbackwardsindexes

Problem

We use an Amazon RDS instance with

PostgreSQL 11.13 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12), 64-bit

I have a simple classic top-1-per-group query. I need to get the latest item in the history for each creativeScheduleId.

Here is a table and index definitions:

CREATE TABLE IF NOT EXISTS public.creative_schedule_status_histories (
  id serial PRIMARY KEY,
  "creativeScheduleId" text NOT NULL,
  -- other columns
);

CREATE UNIQUE INDEX IF NOT EXISTS idx_creativescheduleid_id
  ON public.creative_schedule_status_histories ("creativeScheduleId" ASC, id ASC);


When a query orders by id ASC the engine reads only the index and does not do any extra sorts:

EXPLAIN (ANALYZE) 
SELECT history.id, history."creativeScheduleId"
FROM  (
    SELECT cssh.id, cssh."creativeScheduleId"
         , ROW_NUMBER() OVER (PARTITION BY cssh."creativeScheduleId"
                              ORDER BY cssh.id ASC) AS rn  -- !
    FROM creative_schedule_status_histories as cssh
    ) AS history
WHERE history.rn = 1;


"Subquery Scan on history (cost=0.56..511808.63 rows=26377 width=41) (actual time=0.047..4539.058 rows=709030 loops=1)"
" Filter: (history.rn = 1)"
" Rows Removed by Filter: 4579766"
" -> WindowAgg (cost=0.56..445866.24 rows=5275391 width=49) (actual time=0.046..4165.835 rows=5288796 loops=1)"
" -> Index Only Scan using idx_creativescheduleid_id on creative_schedule_status_histories cssh (cost=0.56..353546.90 rows=5275391 width=41) (actual time=0.037..1447.490 rows=5288796 loops=1)"
" Heap Fetches: 2372"
"Planning Time: 0.072 ms"
"Execution Time: 4568.235 ms"


I expected to see exactly the same plan for a query when I order by id DESC, but there is an explicit sort in the plan which spills to disk and obviously everything is just slower.

```
EXPLAIN (ANALYZE)
SELECT history.id, history."creativeScheduleId"
FROM (
SELECT cssh.id, cssh."creativeScheduleId"
, ROW_NUMBER() OV

Solution

Because of the discussed limitation, we can't make Postgres scan the index backwards for the particular use case. However ...
Clean test case

I removed the noise from the test case:

CREATE TABLE tbl (
  id   int PRIMARY KEY
, part int NOT NULL
, ballast text  -- possibly big column(s)?
);

CREATE UNIQUE INDEX tbl_part_id_idx ON tbl (part, id);


Get rid of the expensive sort step

In Postgres 14 or later I see an Incremental Sort on top of an Index Only Scan.

In Postgres 11 or later the additional sort goes away with this workaround:

SELECT id, part
FROM  (
   SELECT *
        , CASE WHEN part = lead(part) OVER (ORDER BY part, id ROWS UNBOUNDED PRECEDING)
               THEN false 
               ELSE true END AS qualified
   FROM   tbl
   ) sub
WHERE  qualified;


Subquery Scan on sub (cost=0.42..10293.58 rows=89880 width=8) (actual time=1.759..115.799 rows=67 loops=1)
  Filter: sub.qualified
  Rows Removed by Filter: 179692
  -> WindowAgg (cost=0.42..8495.99 rows=179759 width=41) (actual time=0.022..106.609 rows=179759 loops=1)
        -> Index Only Scan using tbl_part_id_idx on tbl (cost=0.42..4900.81 rows=179759 width=8) (actual time=0.017..29.208 rows=179759 loops=1)
              Heap Fetches: 0
Planning Time: 0.114 ms
Execution Time: 115.850 ms


That's based on Paul's query It's better than than my first idea to compare row number and count per partition. I adapted to get the greatest id per group, simplified, and switched to ROWS mode for better performance. See:

  • Find first 3 orders for each customer



It scans the index forwards. To see an actual Index Scan backwards:

...
          CASE WHEN part = lag(part) OVER (ORDER BY part DESC, id DESC ROWS UNBOUNDED PRECEDING)
               THEN false 
               ELSE true END AS qualified
...


The first variant is just slightly shorter and faster.

While you are stuck with your original query, you can at least make the sort happen in RAM. Your query plan says Disk: 263992kB. Increase work_mem by 300 MB (if you can afford that) to achieve that. Possibly just in your session for the big query. See:

  • Slow query performance due to temporary file?



What you really want

Your current query never wins any competition.

To make your query fast (even without additional index):

For few rows per group (and sufficient work_mem), use DISTINCT ON. It's fastest with matching index, and maybe even without:

SELECT DISTINCT ON (part) id, part
FROM   tbl
ORDER  BY part, id DESC;


Unique (cost=33480.91..34380.55 rows=67 width=8) (actual time=96.726..131.735 rows=67 loops=1)
  -> Sort (cost=33480.91..33930.73 rows=179929 width=8) (actual time=96.724..118.292 rows=179929 loops=1)
        Sort Key: part, id DESC
        Sort Method: external merge Disk: 3184kB
        -> Index Only Scan using tbl_part_id_idx on tbl (cost=0.42..4903.35 rows=179929 width=8) (actual time=0.019..25.871 rows=179929 loops=1)
              Heap Fetches: 0
Planning Time: 0.102 ms
Execution Time: 132.208 ms


For many rows per group (like in this test), DISTINCT ON is not ideal, but typically not that bad, either.

For more than a few rows per group, we'd want an index skip scan. Considerable effort has been made but, unfortunately, that didn't make it into Postgres 15. We can still emulate the technique to great effect with a recursive CTE:

EXPLAIN ANALYZE
WITH RECURSIVE cte AS (
   (
   SELECT part, id
   FROM   tbl
   ORDER  BY part DESC, id DESC
   LIMIT  1
   )
   
   UNION ALL
   SELECT l.*
   FROM   cte c
   CROSS  JOIN LATERAL (
      SELECT t.part, t.id
      FROM   tbl t
      WHERE  t.part < c.part
      ORDER  BY t.part DESC, t.id DESC
      LIMIT  1
      ) l
   )
TABLE  cte;


`'CTE Scan on cte (cost=40.74..42.76 rows=101 width=8) (actual time=0.013..0.304 rows=34 loops=1)'
' CTE cte'
' -> Recursive Union (cost=0.29..40.74 rows=101 width=8) (actual time=0.012..0.297 rows=34 loops=1)'
' -> Limit (cost=0.29..0.33 rows=1 width=8) (actual time=0.011..0.012 rows=1 loops=1)'
' -> Index Only Scan Backward using tbl_part_id_idx on tbl (cost=0.29..3410.28 rows=100000 width=8) (actual time=0.010..0.011 rows=1 loops=1)'
' Heap Fetches: 1'
' -> Nested Loop (cost=0.29..3.84 rows=10 width=8) (actual time=0.008..0.008 rows=1 loops=34)'
' -> WorkTable Scan on cte c (cost=0.00..0.20 rows=10 width=4) (actual time=0.000..0.000 rows=1 loops=34)'
' -> Limit (cost=0.29..0.34 rows=1 width=8) (actual time=0.008..0.008 rows=1 loops=34)'
' -> Index Only Scan Backward using tbl_part_id_idx on tbl t (cost=0.29..1714.08 rows=33333 width=8) (actual time=0.007..0.007 rows=1 loops=34)'
' Index Cond: (part

One or the other is typically (much) faster than your original query.

fiddle for Postgres 15 with 3000 rows per group

fiddle for Postgres 11

fiddle for Postgres 11 with 8 rows per group (~ y

Code Snippets

CREATE TABLE tbl (
  id   int PRIMARY KEY
, part int NOT NULL
, ballast text  -- possibly big column(s)?
);

CREATE UNIQUE INDEX tbl_part_id_idx ON tbl (part, id);
SELECT id, part
FROM  (
   SELECT *
        , CASE WHEN part = lead(part) OVER (ORDER BY part, id ROWS UNBOUNDED PRECEDING)
               THEN false 
               ELSE true END AS qualified
   FROM   tbl
   ) sub
WHERE  qualified;
...
          CASE WHEN part = lag(part) OVER (ORDER BY part DESC, id DESC ROWS UNBOUNDED PRECEDING)
               THEN false 
               ELSE true END AS qualified
...
SELECT DISTINCT ON (part) id, part
FROM   tbl
ORDER  BY part, id DESC;
EXPLAIN ANALYZE
WITH RECURSIVE cte AS (
   (
   SELECT part, id
   FROM   tbl
   ORDER  BY part DESC, id DESC
   LIMIT  1
   )
   
   UNION ALL
   SELECT l.*
   FROM   cte c
   CROSS  JOIN LATERAL (
      SELECT t.part, t.id
      FROM   tbl t
      WHERE  t.part < c.part
      ORDER  BY t.part DESC, t.id DESC
      LIMIT  1
      ) l
   )
TABLE  cte;

Context

StackExchange Database Administrators Q#317886, answer score: 3

Revisions (0)

No revisions yet.