patternsqlModerate
Unused index in range of dates query
Viewed 0 times
rangeunusedquerydatesindex
Problem
I have a query that is not using existing indices and I do not understand why.
The table:
The query/plan:
```
mustang=# explain select max(lddate) from bss.amplifier_saturation
where start >= '1987-12-31 00:00:00'
and start Limit (cost=0.00..189.41 rows=1 width=8)
-> Index Scan Backward using amplifier_saturation_lddate on amplifier_saturation (cost=0.00..2475815.50 rows=13071 width=8)
Index Cond: (lddate IS NOT NULL)
Filter: ((start >= '1987-12-31 00:00:00-08'::
The table:
mustang=# \d+ bss.amplifier_saturation
Table "bss.amplifier_saturation"
Column | Type | Modifiers | Storage | Description
--------+--------------------------+-------------------------------------------------------------------+---------+-------------
value | integer | not null | plain |
target | integer | not null | plain |
start | timestamp with time zone | not null | plain |
end | timestamp with time zone | not null | plain |
id | integer | not null default nextval('amplifier_saturation_id_seq'::regclass) | plain |
lddate | timestamp with time zone | not null default now() | plain |
Indexes:
"amplifier_saturation_pkey" PRIMARY KEY, btree (id)
"amplifier_saturation_target_start_end_key" UNIQUE CONSTRAINT, btree (target, start, "end")
"amplifier_saturation_end" btree ("end")
"amplifier_saturation_lddate" btree (lddate)
"amplifier_saturation_start" btree (start)
"amplifier_saturation_target" btree (target)
"amplifier_saturation_value" btree (value)The query/plan:
```
mustang=# explain select max(lddate) from bss.amplifier_saturation
where start >= '1987-12-31 00:00:00'
and start Limit (cost=0.00..189.41 rows=1 width=8)
-> Index Scan Backward using amplifier_saturation_lddate on amplifier_saturation (cost=0.00..2475815.50 rows=13071 width=8)
Index Cond: (lddate IS NOT NULL)
Filter: ((start >= '1987-12-31 00:00:00-08'::
Solution
Explanation
My question is: why does this not use the index
Even with
Using
Better index
This can be substantially faster with a multicolumn index on
But there is another minor thing you can improve. Did you wonder about of this detail in your
Why does Postgres have to exclude NULL values?
Because NULL sorts after the greatest value in
Now, your table column
The other index option would be on
Two notes concerning
PST` (pacific standard time) being a random example time zone.
My question is: why does this not use the index
amplifier_saturation_start?Even with
30,000,000 rows, only 3,500 in the date range it can be faster to read tuples from the top of the index amplifier_saturation_lddate on lddate. The first row that passes the filter on start can be returned as is. No sort step needed. With a perfectly random distribution a little below 9000 index tuples would have to be checked on average.Using
amplifier_saturation_start, Postgres would still have to determinemax(lddate) after fetching all 3500 qualifying rows. Close call. The decision depends on gathered statistics and your cost settings. Depending on data distribution and other details one or the other will be faster, and one or the other will be expected to be faster.Better index
This can be substantially faster with a multicolumn index on
(lddate, start) like you found yourself already. This way Postgres can use an index-only scan and not touch the heap (the table) at all.But there is another minor thing you can improve. Did you wonder about of this detail in your
EXPLAIN output?Index Cond: ((lddate IS NOT NULL) AND ...Why does Postgres have to exclude NULL values?
Because NULL sorts after the greatest value in
ASCENDING, or before in DESCENDING order. The maximum non-null value which is returned by the aggregate function max() is not at the beginning / end of the index if there are NULL values. Adding NULLS LAST | FIRST adjusts the sort order to the characteristic of max() (and makes the opposite min() more expensive). Since we are mostly interested in the latest timestamp, DESC NULLS LAST is the better choice.CREATE INDEX tmp_as ON bss.amplifier_saturation (lddate DESC NULLS LAST, start);Now, your table column
lddate obviously doesn't have NULL values, being defined NOT NULL. The effect on performance will be negligible in this particular case. Still worth mentioning for cases that can have NULL.The other index option would be on
(start, lddate), basically a pimped amplifier_saturation_start index, which also would allow index-only scans. Depending on data distribution and actual parameter values in your query one or the other will be faster.Two notes concerning
timestamp- Your table columns are
timestamptz, but your query predicates usetimestampliterals. Postgres derives the time zone from your currenttimezonesetting and adjusts accordingly. This may or may not be as intended. It certainly makes the query volatile - depending on a setting of your session. It would be problematic for a call that could be made from different time zones (with differing sessions settings). Then you'd rather use an explicit offset or theAT TIME ZONEconstruct to make it stable. Details:
- Ignoring timezones altogether in Rails and PostgreSQL
- You typically would want to exclude the upper bound for correctness.
select max(lddate)
from bss.amplifier_saturation
where start >= '1987-12-31 00:00:00'::timestamp AT TIME ZONE 'PST'
and start '1988-04-09 00:00:00 PST'::timestamptz; -- shorterPST` (pacific standard time) being a random example time zone.
Code Snippets
Index Cond: ((lddate IS NOT NULL) AND ...CREATE INDEX tmp_as ON bss.amplifier_saturation (lddate DESC NULLS LAST, start);select max(lddate)
from bss.amplifier_saturation
where start >= '1987-12-31 00:00:00'::timestamp AT TIME ZONE 'PST'
and start < '1988-04-09 00:00:00 PST'::timestamptz; -- shorterContext
StackExchange Database Administrators Q#90128, answer score: 15
Revisions (0)
No revisions yet.