snippetsqlMinor
How to get index scan for OR'ed time-range predicates?
Viewed 0 times
predicatesscanrangetimegetforhowindex
Problem
I have table
And have b-tree index on
Postgres uses
But when I leave one condition for
events with fields:id
user_id
time_start
time_end
...And have b-tree index on
(time_start, time_end).SELECT user_id
FROM events
WHERE ((time_start = '2021-08-24T15:30:00+00:00') OR
(time_start = '2021-08-24T15:59:00+00:00'))
GROUP BY user_id);Group (cost=243735.42..243998.32 rows=1103 width=4) (actual time=186.533..188.244 rows=166 loops=1)
Group Key: user_id
Buffers: shared hit=224848
-> Gather Merge (cost=243735.42..243992.80 rows=2206 width=4) (actual time=186.532..188.199 rows=176 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=224848
-> Sort (cost=242735.39..242738.15 rows=1103 width=4) (actual time=184.121..184.126 rows=59 loops=3)
Sort Key: user_id
Sort Method: quicksort Memory: 27kB
Worker 0: Sort Method: quicksort Memory: 27kB
Worker 1: Sort Method: quicksort Memory: 28kB
Buffers: shared hit=224848
-> Partial HashAggregate (cost=242668.62..242679.65 rows=1103 width=4) (actual time=184.065..184.085 rows=59 loops=3)
Group Key: user_id
Buffers: shared hit=224834
-> Parallel Seq Scan on events (cost=0.00..242553.74 rows=45952 width=4) (actual time=104.085..183.994 rows=64 loops=3)
Filter: (((time_start = '2021-08-24 15:30:00+00'::timestamp with time zone)) OR ((time_start = '2021-08-24 15:59:00+00'::timestamp with time zone)))
Rows Removed by Filter: 708728
Buffers: shared hit=224834
Planning Time: 0.169 ms
Execution Time: 188.294 msPostgres uses
Seq Scan with filter:Filter: (((time_start = '2021-08-24 15:30:00+00'::timestamp with time zone)) OR ((time_start = '2021-08-24 15:59:00+00'::timestamp with time zone)))But when I leave one condition for
time_start and time_end it starts to use Index Scan.Solution
Expression index
A GiST or (even better) SP-GiST expression index on an inclusive timestamp range should work wonders.
Rewrite your query with the "range contains" operator
You'll get a query plan like this:
Should be much faster.
Range types assume inclusive lower and exclusive upper bound unless instructed otherwise.
Since you operate with
A GiST or (even better) SP-GiST expression index on an inclusive timestamp range should work wonders.
CREATE INDEX events_right_idx ON events USING spgist (tsrange(time_start, time_end, '[]'));Rewrite your query with the "range contains" operator
@> and match the indexed expression (exactly equivalent to your original):SELECT user_id
FROM events
WHERE tsrange(time_start, time_end, '[]') @> timestamp '2021-08-24 15:30:00'
OR tsrange(time_start, time_end, '[]') @> timestamp '2021-08-24 15:59:00'
GROUP BY user_id;You'll get a query plan like this:
HashAggregate (cost=9.90..10.00 rows=10 width=4)
Group Key: user_id
-> Bitmap Heap Scan on events (cost=2.57..9.88 rows=10 width=4)
Recheck Cond: ((tsrange(time_start, time_end, ''[]''::text) @> ''2021-08-24 15:30:00''::timestamp without time zone) OR (tsrange(time_start, time_end, ''[]''::text) @> ''2021-08-24 15:59:00''::timestamp without time zone))
-> BitmapOr (cost=2.57..2.57 rows=10 width=0)
-> Bitmap Index Scan on events_right_expr_idx (cost=0.00..1.28 rows=5 width=0)
Index Cond: (tsrange(time_start, time_end, ''[]''::text) @> ''2021-08-24 15:30:00''::timestamp without time zone)
-> Bitmap Index Scan on events_right_expr_idx (cost=0.00..1.28 rows=5 width=0)
Index Cond: (tsrange(time_start, time_end, ''[]''::text) @> ''2021-08-24 15:59:00''::timestamp without time zone)
Should be much faster.
Range types assume inclusive lower and exclusive upper bound unless instructed otherwise.
tsrange(time_start, time_end) is the same as tsrange(time_start, time_end), '[)').Since you operate with
>= and
- Optimizing queries on a range of timestamps (two columns)
Alternatively, store range column in table
Should be a bit faster, yet, as plain (not expression) index.
You can add a timestamp range column to your table, like:
ALTER TABLE event ADD COLUMN ts_range tsrange GENERATED ALWAYE AS (tsrange(time_start, time_end, '[]')) STORED;
See:
- Computed / calculated / virtual / derived columns in PostgreSQL
Or, more radically, replace time_start and time_end with the range column. Then index and query are a bit simpler:
CREATE INDEX events_right_idx ON events USING spgist (ts_range);
SELECT user_id
FROM events
WHERE ts_range @> timestamp '2021-08-24T15:30:00'
OR ts_range @> timestamp '2021-08-24T15:59:00'
GROUP BY user_id;
But the tsrange column occupies more space than two timestamp columns. Weigh cost and benefits.
Asides
Postgres 14 (currently beta) even allows a covering SP-GiST index. The release notes:
Allow SP-GiST to use INCLUDE'd columns (Pavel Borisov)
But I don't think you can get index-only scans for your particular query.
If you have to make do with your B-tree index for some reason, this fixed UNION query shouldn't be too bad:
SELECT user_id
FROM events
WHERE '2021-08-24T15:30:00' BETWEEN time_start AND time_end
UNION
SELECT user_id
FROM events
WHERE '2021-08-24T15:59:00' BETWEEN time_start AND time_end
Notably, no GROUP BY. UNION already does all the work.
And simplify with BETWEEN (no effect on performance).
Also, you seem to have a wild mix of timestamp without time zone and timestamp with time zone. And name it "time" to add to the confusion. Typically timestamptz is the better choice. See:
- Ignoring time zones altogether in Rails and PostgreSQL
Last, but not least, this indicates inaccurate column statistics, leading to a sub-optimal query plan:
-> Parallel Seq Scan on events (cost=0.00..242553.74 rows=45952 width=4)
(actual time=104.085..183.994 rows=64 loops=3)
Run
ANALYZE events;
And retry. Your original query can use a plain B-tree index. It's just not as efficient as the suggested SP-GiST index.
And then maybe tune your autovacuum` and statistics settings to avoid bad statistics in the future. See:- Aurora PostgreSQL database using a slower query plan than a normal PostgreSQL for an identical query?
- Keep PostgreSQL from sometimes choosing a bad query plan
Code Snippets
CREATE INDEX events_right_idx ON events USING spgist (tsrange(time_start, time_end, '[]'));SELECT user_id
FROM events
WHERE tsrange(time_start, time_end, '[]') @> timestamp '2021-08-24 15:30:00'
OR tsrange(time_start, time_end, '[]') @> timestamp '2021-08-24 15:59:00'
GROUP BY user_id;ALTER TABLE event ADD COLUMN ts_range tsrange GENERATED ALWAYE AS (tsrange(time_start, time_end, '[]')) STORED;CREATE INDEX events_right_idx ON events USING spgist (ts_range);
SELECT user_id
FROM events
WHERE ts_range @> timestamp '2021-08-24T15:30:00'
OR ts_range @> timestamp '2021-08-24T15:59:00'
GROUP BY user_id;SELECT user_id
FROM events
WHERE '2021-08-24T15:30:00' BETWEEN time_start AND time_end
UNION
SELECT user_id
FROM events
WHERE '2021-08-24T15:59:00' BETWEEN time_start AND time_endContext
StackExchange Database Administrators Q#298625, answer score: 6
Revisions (0)
No revisions yet.