HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

Postgres choosing a filter instead of index cond when OR is involved

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
condpostgresinsteadfilterwheninvolvedindexchoosing

Problem

I have a table that daily gets about 20 million records added to it, and I'm trying to paginate through it to let people get access to all the data in it, but the query time has to be "decent" (in my case defined as less than 30s/query).

To do this I've been using keyset pagination in the past but for this particular query and table I've gotten some really slow query times, and it seems to be because the query planner decides to filter out a day's worth of data and then run a filter on it instead of an index condition scan.

The table looks like this:

create table mmsi_positions_archive
(
    id bigserial not null
        constraint mmsi_positions_archive_pkey
            primary key,
    position_id uuid,
    previous_id uuid,
    mmsi bigint not null,
    collection_type varchar not null,
    accuracy numeric,
    maneuver numeric,
    rate_of_turn numeric,
    status integer,
    speed numeric,
    course numeric,
    heading numeric,
    position geometry(Point,4326),
    timestamp timestamp with time zone not null,
    updated_at timestamp with time zone default now(),
    created_at timestamp with time zone default now()
);

create index ix_mmsi_positions_archive_mmsi
    on mmsi_positions_archive (mmsi);

create index ix_mmsi_positions_archive_position_id
    on mmsi_positions_archive (position_id);

create index ix_mmsi_positions_archive_timestamp_mmsi_id_asc
    on mmsi_positions_archive (timestamp, id);


The column I'm trying to paginate by is the timestamp and id, to help out I've also updated the table statistics target for timestamp and set it to the max value of 10 000 and analyzed the table.

The table is also partitioned by quarter, but at the moment I'm only operating on the data for a single partition.

Fast query

SELECT id
FROM mmsi_positions_archive
WHERE timestamp > '2019-03-10 00:00:00.000000+00:00'
  AND timestamp <= '2019-03-11 00:00:00+00:00'
ORDER BY timestamp, id
LIMIT 100


Which gives the following query plan (not

Solution

Fortunately this is very simple in PostgreSQL as it supports comparisons between “row values” (or compound values) that can make use of an index.

So you can write:

WHERE (timestamp, id) > ('2019-03-10 00:00:00+00:00', 1032749689)
  AND timestamp <= '2019-03-11 00:00:00+00:00'
ORDER BY timestamp, id
LIMIT 100


The comparison of such row values is lexicographical, that is exactly like you want it.

Here is a documentation link for that feature.

Code Snippets

WHERE (timestamp, id) > ('2019-03-10 00:00:00+00:00', 1032749689)
  AND timestamp <= '2019-03-11 00:00:00+00:00'
ORDER BY timestamp, id
LIMIT 100

Context

StackExchange Database Administrators Q#241591, answer score: 6

Revisions (0)

No revisions yet.