HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlModerate

How to index a query with `WHERE field IS NULL`?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
fieldwithnullquerywherehowindex

Problem

I have a table with lots of inserts, setting one of the fields (uploaded_at) to NULL. Then a periodic task selects all the tuples WHERE uploaded_at IS NULL, processes them and updates, setting uploaded_at to current date.

How should I index the table?

I understand that I should use a partial index like:

CREATE INDEX foo ON table (uploaded_at) WHERE uploaded_at IS NULL


Or smth like that. I'm a bit confused though if it is correct to index on a field that is always NULL. Or if it is correct to use a b-tree index. Hash looks like a better idea, but it is obsolete and is not replicated via streaming hot-standby replication. Any advice would be greatly appreciated.

I've experimented a bit with the following indices:

"foo_part" btree (uploaded_at) WHERE uploaded_at IS NULL
"foo_part_id" btree (id) WHERE uploaded_at IS NULL


and the query planer seems to always choose the foo_part index. explain analyse also yields slightly better result for the foo_part index:

Index Scan using foo_part on t1  (cost=0.28..297.25 rows=4433 width=16) (actual time=0.025..3.649 rows=4351 loops=1)
   Index Cond: (uploaded_at IS NULL)
 Total runtime: 4.060 ms


vs

Bitmap Heap Scan on t1  (cost=79.15..6722.83 rows=4433 width=16) (actual time=1.032..4.717 rows=4351 loops=1)
   Recheck Cond: (uploaded_at IS NULL)
   ->  Bitmap Index Scan on foo_part_id  (cost=0.00..78.04 rows=4433 width=0) (actual time=0.649..0.649 rows=4351 loops=1)
 Total runtime: 5.131 ms

Solution

In this special case the column actually indexed is irrelevant for the query at hand. You can pick any column. I would pick something else than uploaded_at, which is useless. Some column that may be useful for other queries and is not bigger than 8 bytes, ideally.

CREATE INDEX foo ON table bar (some_col) WHERE uploaded_at IS NULL;


If you have no use case for any other column, it's still best to stick with the useless uploaded_at, so not to introduce additional maintenance cost for the index and restrictions for H.O.T. updates. More:

  • Is a composite index also good for queries on the first field?



Or use a constant as index expression if you have no use for any other index column. Like:
CREATE INDEX baz ON table bar ((TRUE)) WHERE uploaded_at IS NULL;

Parentheses required. This also keeps the index at minimum size. But while the index column is never bigger than 8 bytes (which is the case for timestamp) it's still at minimum size anyway.

Update: Storage characteristics change in Postgres 13 with index deduplication (while the rest still applies). See:

  • Is a composite index also good for queries on the first field?



Related:

  • Get count estimates from pg_class.reltuples for given conditions

Code Snippets

CREATE INDEX foo ON table bar (some_col) WHERE uploaded_at IS NULL;

Context

StackExchange Database Administrators Q#81620, answer score: 13

Revisions (0)

No revisions yet.