snippetsqlModerate
How to index a query with `WHERE field IS NULL`?
Viewed 0 times
fieldwithnullquerywherehowindex
Problem
I have a table with lots of inserts, setting one of the fields (
How should I index the table?
I understand that I should use a partial index like:
Or smth like that. I'm a bit confused though if it is correct to index on a field that is always
I've experimented a bit with the following indices:
and the query planer seems to always choose the
vs
uploaded_at) to NULL. Then a periodic task selects all the tuples WHERE uploaded_at IS NULL, processes them and updates, setting uploaded_at to current date.How should I index the table?
I understand that I should use a partial index like:
CREATE INDEX foo ON table (uploaded_at) WHERE uploaded_at IS NULLOr smth like that. I'm a bit confused though if it is correct to index on a field that is always
NULL. Or if it is correct to use a b-tree index. Hash looks like a better idea, but it is obsolete and is not replicated via streaming hot-standby replication. Any advice would be greatly appreciated.I've experimented a bit with the following indices:
"foo_part" btree (uploaded_at) WHERE uploaded_at IS NULL
"foo_part_id" btree (id) WHERE uploaded_at IS NULLand the query planer seems to always choose the
foo_part index. explain analyse also yields slightly better result for the foo_part index:Index Scan using foo_part on t1 (cost=0.28..297.25 rows=4433 width=16) (actual time=0.025..3.649 rows=4351 loops=1)
Index Cond: (uploaded_at IS NULL)
Total runtime: 4.060 msvs
Bitmap Heap Scan on t1 (cost=79.15..6722.83 rows=4433 width=16) (actual time=1.032..4.717 rows=4351 loops=1)
Recheck Cond: (uploaded_at IS NULL)
-> Bitmap Index Scan on foo_part_id (cost=0.00..78.04 rows=4433 width=0) (actual time=0.649..0.649 rows=4351 loops=1)
Total runtime: 5.131 msSolution
In this special case the column actually indexed is irrelevant for the query at hand. You can pick any column. I would pick something else than
If you have no use case for any other column, it's still best to stick with the useless
Or use a constant as index expression if you have no use for any other index column. Like:
CREATE INDEX baz ON table bar ((TRUE)) WHERE uploaded_at IS NULL;
Parentheses required. This also keeps the index at minimum size. But while the index column is never bigger than 8 bytes (which is the case for
Update: Storage characteristics change in Postgres 13 with index deduplication (while the rest still applies). See:
Related:
uploaded_at, which is useless. Some column that may be useful for other queries and is not bigger than 8 bytes, ideally.CREATE INDEX foo ON table bar (some_col) WHERE uploaded_at IS NULL;If you have no use case for any other column, it's still best to stick with the useless
uploaded_at, so not to introduce additional maintenance cost for the index and restrictions for H.O.T. updates. More:- Is a composite index also good for queries on the first field?
Or use a constant as index expression if you have no use for any other index column. Like:
CREATE INDEX baz ON table bar ((TRUE)) WHERE uploaded_at IS NULL;
Parentheses required. This also keeps the index at minimum size. But while the index column is never bigger than 8 bytes (which is the case for
timestamp) it's still at minimum size anyway.Update: Storage characteristics change in Postgres 13 with index deduplication (while the rest still applies). See:
- Is a composite index also good for queries on the first field?
Related:
- Get count estimates from pg_class.reltuples for given conditions
Code Snippets
CREATE INDEX foo ON table bar (some_col) WHERE uploaded_at IS NULL;Context
StackExchange Database Administrators Q#81620, answer score: 13
Revisions (0)
No revisions yet.