debugsqlModerate
Postgres partial index on IS NULL not working
Viewed 0 times
postgresnullworkingpartialindexnot
Problem
Postgres version
Using PostgreSQL 10.3.
Table definition
Cardinality
The table contains 1027616 rows, with 51533 of the rows having
A query with a condition on
I am trying to achieve the same or better performance for the query with the condition
However, none of the indexes I've tried result in a single index scan like the first query. Below are the indexes I've tried along with the
A partial index:
An expression index:
```
CREATE INDEX "closed_index" ON "tickets" ((closed IS NULL))
explain analyze select * from tickets where closed IS NULL;
Seq Scan on tickets (cost=0.00..45228.26 rows=36559 width=212) (actual time=0.025.
Using PostgreSQL 10.3.
Table definition
CREATE TABLE tickets (
id bigserial primary key,
state character varying,
closed timestamp
);
CREATE INDEX "state_index" ON "tickets" ("state")
WHERE ((state)::text = 'open'::text));Cardinality
The table contains 1027616 rows, with 51533 of the rows having
state = 'open' and closed IS NULL, or 5%.A query with a condition on
state performs well using an index scan as expected:explain analyze select * from tickets where state = 'open';
Index Scan using state_index on tickets (cost=0.29..16093.57 rows=36599 width=212) (actual time=0.025..22.875 rows=37346 loops=1)
Planning time: 0.212 ms
Execution time: 25.697 msI am trying to achieve the same or better performance for the query with the condition
closed IS NULL so that I can drop the state column and rely on the closed column for fetching the same rows. closed is null for the same rows where state = 'open', hence the state column is redundant.select * from tickets where closed IS NULL;However, none of the indexes I've tried result in a single index scan like the first query. Below are the indexes I've tried along with the
EXPLAIN ANALYZE results.A partial index:
CREATE INDEX "closed_index" ON "tickets" ("closed") WHERE (closed IS NULL)
explain analyze select * from tickets where closed IS NULL;
Bitmap Heap Scan on tickets (cost=604.22..38697.91 rows=36559 width=212) (actual time=12.879..48.780 rows=37348 loops=1)
Recheck Cond: (closed IS NULL)
Heap Blocks: exact=14757
-> Bitmap Index Scan on closed_index (cost=0.00..595.09 rows=36559 width=0) (actual time=7.585..7.585 rows=37348 loops=1)
Planning time: 4.831 ms
Execution time: 52.068 msAn expression index:
```
CREATE INDEX "closed_index" ON "tickets" ((closed IS NULL))
explain analyze select * from tickets where closed IS NULL;
Seq Scan on tickets (cost=0.00..45228.26 rows=36559 width=212) (actual time=0.025.
Solution
Assuming the central piece of information:
with ~15% of the rows having
is supposed to mean the same 15 % of all 1031584 rows meet both these conditions (all details matter!). Both conditions should perform equally - returning around 155k rows (!)
Your query plans show 37346 qualifying rows, ~ 3.6 % not 15 %. Something's still not right in your question.
With 3.6 %, indexes only start to make sense. Your tiny row size effectively occupies ~ 52 bytes per row, around 155 rows per page. That would be 5-6 hits per page for a completely random distribution. Postgres has to read all pages anyway and a sequential scan should be the fastest plan. Filtering the misses should be faster than involving indexes in any way.
Typically, qualifying rows are more or less clustered and the fewer data pages are involved the more sense it would make to involve an index. All bitmap index scans, though, I see hardly any case for an index scan. Even far less (as far as there can be much "less" than "hardly any") for the 15 % you claim.
For your updated numbers (~ 5% of all rows match) I would still rather expect bitmap index scans than index scans. A possible explanation: table bloat with lots of dead tuples. You mentioned a high write load. That would result in fewer and fewer live tuples per data page and favor index scans (as compared to bitmap index scans). You might re-test your initial query after a
You may need more aggressive
Partial index
Your "expression index" and "partial expression index" are not useful. We don't need
The first, plain partial index is the more useful variant. But don't use
Or, if
This makes the actual index column useless like in other dismissed variants - but it avoids all additional costs and dependencies. Related:
What I might try:
Only makes sense if you don't have many other writes on the the rows in question (The added columns
Create a partial index with
We don't need
There seems to be redundancy in your design, simplifications should be possible.
This works since Postgres 9.6, quoting the release notes:
Vondra, Kyotaro Horiguchi)
For example, an index defined by
specifies
this was disallowed because a is not listed as an index column.
Or the information in your question is misleading.
Related:
If you don't see index-only scans with this, even immediately after running
with ~15% of the rows having
state = 'open' and closed IS NULLis supposed to mean the same 15 % of all 1031584 rows meet both these conditions (all details matter!). Both conditions should perform equally - returning around 155k rows (!)
Your query plans show 37346 qualifying rows, ~ 3.6 % not 15 %. Something's still not right in your question.
With 3.6 %, indexes only start to make sense. Your tiny row size effectively occupies ~ 52 bytes per row, around 155 rows per page. That would be 5-6 hits per page for a completely random distribution. Postgres has to read all pages anyway and a sequential scan should be the fastest plan. Filtering the misses should be faster than involving indexes in any way.
Typically, qualifying rows are more or less clustered and the fewer data pages are involved the more sense it would make to involve an index. All bitmap index scans, though, I see hardly any case for an index scan. Even far less (as far as there can be much "less" than "hardly any") for the 15 % you claim.
For your updated numbers (~ 5% of all rows match) I would still rather expect bitmap index scans than index scans. A possible explanation: table bloat with lots of dead tuples. You mentioned a high write load. That would result in fewer and fewer live tuples per data page and favor index scans (as compared to bitmap index scans). You might re-test your initial query after a
VACUUM FULL ANALYZE (if you can afford an exclusive lock on the table!). If my hypothesis holds, the physical table size would shrink substantially and you would then see a bitmap index scan instead of the index scan (and faster, too).- For alignment optimized table is bigger than original table - why?
- Measure the size of a PostgreSQL table row
You may need more aggressive
autovacuum settings. See:- Aggressive Autovacuum on PostgreSQL
Partial index
Your "expression index" and "partial expression index" are not useful. We don't need
closed IS NULL as actual index expression (which is always true here). The expression only adds cost and no gain.The first, plain partial index is the more useful variant. But don't use
closed as index expression (again, always NULL here). Instead, use any column possibly useful to other queries, and ideally never updated to avoid additional cost and index bloat. The primary key column id is the natural candidate in absence of other useful applications:CREATE INDEX closed_index ON tickets (id) WHERE closed IS NULL;Or, if
id is not useful, consider a constant instead:CREATE INDEX closed_index ON tickets ((1)) WHERE closed IS NULL;This makes the actual index column useless like in other dismissed variants - but it avoids all additional costs and dependencies. Related:
- Get count estimates from pg_class.reltuples for given conditions
What I might try:
Only makes sense if you don't have many other writes on the the rows in question (The added columns
updated and comments_count make me doubt it.)Create a partial index with
id and the other relevant columns (few & small) as index expressions, and capitalize on it with a suitable query to get index-only scans:CREATE INDEX closed_index ON tickets (id, title, created, updated)
WHERE closed IS NULL;
VACUUM ANALYZE tickets; -- just to prove idx-only is possible
SELECT id, title, created, updated
, NULL::timestamp AS closed -- redundant, rather drop it
FROM tickets
WHERE closed IS NULL;We don't need
SELECT *, closed IS NULL is given by the WHERE clause. So we can use the tiny partial index in a fast index-only scan - assuming you meet the preconditions (that's why I threw in VACUUM to update the visibility map up there). This is the rare case where queries reading more than ~ 5 % of all rows still happily use the index (even up to including the whole table).There seems to be redundancy in your design, simplifications should be possible.
This works since Postgres 9.6, quoting the release notes:
- Allow use of an index-only scan on a partial index when the index's
WHEREclause references columns that are not indexed (Tomas
Vondra, Kyotaro Horiguchi)
For example, an index defined by
CREATE INDEX tidx_partial ON t(b) WHERE a > 0 can now be used for an index-only scan by a query thatspecifies
WHERE a > 0 and does not otherwise use a. Previouslythis was disallowed because a is not listed as an index column.
Or the information in your question is misleading.
Related:
- PostgreSQL partial index unused when created on a table with existing data
- Unexpected Seq Scan when doing query against boolean with value NULL
- Postgres not using index when index scan is much better option
If you don't see index-only scans with this, even immediately after running
VACUUM, then high write load may be in the way and the visibility map just never reaches a state that would allow index-only scans. The manual. Or you have anothCode Snippets
CREATE INDEX closed_index ON tickets (id) WHERE closed IS NULL;CREATE INDEX closed_index ON tickets ((1)) WHERE closed IS NULL;CREATE INDEX closed_index ON tickets (id, title, created, updated)
WHERE closed IS NULL;
VACUUM ANALYZE tickets; -- just to prove idx-only is possible
SELECT id, title, created, updated
, NULL::timestamp AS closed -- redundant, rather drop it
FROM tickets
WHERE closed IS NULL;Context
StackExchange Database Administrators Q#229924, answer score: 11
Revisions (0)
No revisions yet.