patternsqlMinor
PostgreSQL performance with (col = value or col is NULL)
Viewed 0 times
postgresqlwithnullcolvalueperformance
Problem
This question deals with PostgreSQL 9.5 query performance.
The table is:
Consider the query:
The goal is to make this query performance as good as possible while not using a union. I achieved the best performance using the index:
which makes the query use the index twice and use bitmap or between them. This is still much worse than if I could do
This makes me consider to put some special value such as -1 instead of NULL which will allow me to do
So is using some special value instead of NULL better in such situations? Or does PostgreSQL has some optimization for situation where you want to do an "OR" between a non-NULL value and a NULL?
I read this great article but didn't see there a solution for what I just asked without using a union: https://www.cybertec-postgresql.com/en/avoid-or-for-better-performance/
I also searched for such question on Stack Overflow and here but the related questions either don't talk about performance or don't talk about non-NULL together with NULL. This similar question (and the answer it received) spoke about SQL Server and not PostgreSQL: Best way to write SQL Query that checks a column for non-NULL value or NULL
I also considered doing coalesce to get rid of null but have read that it makes the performance much worse.
The table has more than 4 million records. From them only 56 records have NULL in val column.
UPDATE:
Original explain:
```
"Limit (cost=1112.32..1112.32 rows=2 width=519)"
The table is:
CREATE TABLE big_table
(
id integer NOT NULL,
flag bigint NOT NULL,
time timestamp with timezone NOT NULL,
val int,
primary key (id)
)Consider the query:
SELECT *
FROM big_table
WHERE time > '0666-06-06 00:00:00+00' AND
flag & 2 = 2 AND
(val = 5 or val IS NULL)
ORDER by time descThe goal is to make this query performance as good as possible while not using a union. I achieved the best performance using the index:
CREATE INDEX ind
ON big_table USING btree (time DESC, val)
WHERE (flag & 2::smallint = 2::smallint);which makes the query use the index twice and use bitmap or between them. This is still much worse than if I could do
val = ANY('{NULL,5}') which is not possible. This makes me consider to put some special value such as -1 instead of NULL which will allow me to do
val = ANY('{-1,5}') to run a full index scan with better performance.So is using some special value instead of NULL better in such situations? Or does PostgreSQL has some optimization for situation where you want to do an "OR" between a non-NULL value and a NULL?
I read this great article but didn't see there a solution for what I just asked without using a union: https://www.cybertec-postgresql.com/en/avoid-or-for-better-performance/
I also searched for such question on Stack Overflow and here but the related questions either don't talk about performance or don't talk about non-NULL together with NULL. This similar question (and the answer it received) spoke about SQL Server and not PostgreSQL: Best way to write SQL Query that checks a column for non-NULL value or NULL
I also considered doing coalesce to get rid of null but have read that it makes the performance much worse.
The table has more than 4 million records. From them only 56 records have NULL in val column.
UPDATE:
Original explain:
```
"Limit (cost=1112.32..1112.32 rows=2 width=519)"
Solution
Replacing
I expect a much greater impact from this: flip the order of index expressions:
Rule of thumb: index for equality first — then for ranges. See:
To your consolation:
Also consider upgrading to a current version of Postgres. 9.5 is getting old, there have been various performance improvements for big tables.
To return only few columns, an index-only scan would be a possible optimization, but you need to return most of your 21 columns according to comments.
To save 8 bytes per row needlessly lost to alignment padding, reorder the columns of your demo table like this:
Smaller is faster overall. Now, that's obviously just a demo table, but the same principles apply to your actual table. See:
Sort
For a single
Special case: very few
Since you mentioned only a handful of NULL values for millions of rows I would add a tailored index for that special case:
Maybe even append all other columns of interest to this very tiny special index to get an index-only scan. (Depends on preconditions.) Even more so in Postgres 11 or later with a true covering index using the
NULL with a surrogate value may or may not help a little. NULL handling is a bit more expensive in indexes but, on the other hand, it's typically smaller in storage, which also impacts performance.I expect a much greater impact from this: flip the order of index expressions:
CREATE INDEX ON big_table (val, time DESC)
WHERE flag & 2::smallint = 2::smallint;Rule of thumb: index for equality first — then for ranges. See:
- Multicolumn index and performance
To your consolation:
val = ANY('{-1,5}') burns down to being syntax shorthand for (val = -1 OR val = 5), which is hardly better than (val IS NULL OR val = 5). (The more important factor is number of rows for NULL vs. -1 - the same in your case if you just replace NULL with -1.).Also consider upgrading to a current version of Postgres. 9.5 is getting old, there have been various performance improvements for big tables.
To return only few columns, an index-only scan would be a possible optimization, but you need to return most of your 21 columns according to comments.
To save 8 bytes per row needlessly lost to alignment padding, reorder the columns of your demo table like this:
CREATE TABLE big_table (
flag bigint NOT NULL,
time timestamptz NOT NULL,
id int PRIMARY KEY,
val int
);Smaller is faster overall. Now, that's obviously just a demo table, but the same principles apply to your actual table. See:
- Configuring PostgreSQL for read performance
- Calculating and saving space in PostgreSQL
Sort
For a single
val, Postgres can return pre-sorted data from the index directly. But for more than one value it has to merge equally many sorted sub-sets (one sorted set for val IS NULL, another one for val = 5 in your example), so another sort step on top of index access is inevitable. Pre-sorted sets from the index still can make it cheaper - and you need sorted index tuples in any case. The actual query plan also depends on the chosen index access method. It's trivial to return pre-sorted data from an index scan (or index-only scan). Not so much for a bitmap index scan.Special case: very few
NULL values, used all the timeSince you mentioned only a handful of NULL values for millions of rows I would add a tailored index for that special case:
CREATE INDEX ON big_table (time DESC)
WHERE flag & 2::smallint = 2::smallint
AND val IS NULL;Maybe even append all other columns of interest to this very tiny special index to get an index-only scan. (Depends on preconditions.) Even more so in Postgres 11 or later with a true covering index using the
INCLUDE clause. Results from this and the other index are merged in a BitmapOr node, just like you see for multiple subset from the same index now. Postgres has precise estimates for the special case and it becomes completely irrelevant whether the special case is NULL or -1 or whatever. (Not that it mattered all that much to begin with.) See:- Can Postgres use an index-only scan for this query with joined tables?
- Index that is not used, yet influences query
Code Snippets
CREATE INDEX ON big_table (val, time DESC)
WHERE flag & 2::smallint = 2::smallint;CREATE TABLE big_table (
flag bigint NOT NULL,
time timestamptz NOT NULL,
id int PRIMARY KEY,
val int
);CREATE INDEX ON big_table (time DESC)
WHERE flag & 2::smallint = 2::smallint
AND val IS NULL;Context
StackExchange Database Administrators Q#226416, answer score: 4
Revisions (0)
No revisions yet.