patternsqlMinor

PostgreSQL window functions very slow on big table

Submitted by: @import:stackexchange-dba·Mar 10, 2026·

Viewed 0 times

postgresqlfunctionsslowbigverywindowtable

Problem

I have a PostgreSQL table constructed as

device_id | point     | dt_edit
----------+-----------+----------
100       | geometry  | timestamp
101       | geometry  | timestamp
100       | geometry  | timestamp
102       | geometry  | timestamp
102       | geometry  | timestamp
101       | geometry  | timestamp

I need to select the last 2 records ordered by dt_edit from unique device_id. This query works very slow, on 1 billion records - 500 seconds:

SELECT dt.device_id,
    dt.point,
    dt.dt_edit
   FROM ( SELECT gps_data.device_id,
            gps_data.point,
            gps_data.dt_edit,
            rank() OVER (PARTITION BY gps_data.device_id
                         ORDER BY gps_data.dt_edit DESC) AS rank
           FROM gps_data) dt
  WHERE dt.rank <= 2

Any ideas?

Solution

If we can assume a device table holding all devices of interest. Example:

CREATE TABLE device (device_id int, device text);
INSERT INTO device (device_id, device) VALUES
  (100, 'a')
, (101, 'b')
, (102, 'c');

The query can be very simple:

SELECT d.device_id, g.point, g.dt_edit
FROM   device d
,      LATERAL (
   SELECT point, dt_edit
   FROM   gps_data
   WHERE  device_id = d.device_id
   ORDER  BY dt_edit DESC NULLS LAST -- more items?
   LIMIT  2
   ) g;

Assuming a much smaller cardinality in table device than in table gps_data, i.e. many rows per device in the latter.

If dt_edit is not UNIQUE, add more items to ORDER BY to disambiguate the sort order.

All you need for this to be fast is a multicolumn index;

CREATE INDEX gps_data_foo_idx
ON gps_data (device_id, dt_edit DESC NULLS LAST); -- more items?

The NULLS LAST modifier is useful if the column can be NULL, but does not hurt in any case. Either way, the query must match the index.

PostgreSQL sort by datetime asc, null first?

If you don't have a device table, I would suggest to create one (and keep it current).

But there are ways to make this fast even without device table ...

Details:

Optimize GROUP BY query to retrieve latest record per user

How do I efficiently get "the most recent corresponding row"?

SELECT DISTINCT on multiple columns

Code Snippets

CREATE TABLE device (device_id int, device text);
INSERT INTO device (device_id, device) VALUES
  (100, 'a')
, (101, 'b')
, (102, 'c');

SELECT d.device_id, g.point, g.dt_edit
FROM   device d
,      LATERAL (
   SELECT point, dt_edit
   FROM   gps_data
   WHERE  device_id = d.device_id
   ORDER  BY dt_edit DESC NULLS LAST -- more items?
   LIMIT  2
   ) g;

CREATE INDEX gps_data_foo_idx
ON gps_data (device_id, dt_edit DESC NULLS LAST); -- more items?

Context

StackExchange Database Administrators Q#106987, answer score: 4

Revisions (0)

No revisions yet.