HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMajor

PostgreSQL UPSERT issue with NULL values

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlwithnullissueupsertvalues

Problem

I'm having an issue with using the new UPSERT feature in Postgres 9.5

I have a table that is used for aggregating data from another table. The composite key is made up of 20 columns, 10 of which can be nullable.
Below I have created a smaller version of the issue i'm having, specifically with NULL values.

CREATE TABLE public.test_upsert (
upsert_id serial,
name character varying(32) NOT NULL,
status integer NOT NULL,
test_field text,
identifier character varying(255),
count integer,
CONSTRAINT upsert_id_pkey PRIMARY KEY (upsert_id),
CONSTRAINT test_upsert_name_status_test_field_key UNIQUE (name, status, test_field)
);


Running this query works as needed (First insert, then subsequent inserts simply increment the count):

INSERT INTO test_upsert as tu(name,status,test_field,identifier, count) 
VALUES ('shaun',1,'test value','ident', 1)
ON CONFLICT (name,status,test_field) DO UPDATE set count = tu.count + 1 
where tu.name = 'shaun' AND tu.status = 1 AND tu.test_field = 'test value';


However if I run this query, 1 row is inserted each time rather than incrementing the count for the initial row:

INSERT INTO test_upsert as tu(name,status,test_field,identifier, count) 
VALUES ('shaun',1,null,'ident', 1)
ON CONFLICT (name,status,test_field) DO UPDATE set count = tu.count + 1  
where tu.name = 'shaun' AND tu.status = 1 AND tu.test_field = null;


This is my issue. I need to simply increment the count value and not create multiple identical rows with null values.

Attempting to add a partial unique index:

CREATE UNIQUE INDEX test_upsert_upsert_id_idx
ON public.test_upsert
USING btree
(name COLLATE pg_catalog."default", status, test_field, identifier);


However, this yields the same results, either multiple null rows being inserted or this error message when trying to insert:


ERROR: there is no unique or exclusion constraint matching the ON
CONFLICT specification

I already attempted to add extra details on the partial index such as

Solution

Clarify ON CONFLICT DO UPDATE behavior

Consider the manual here:

For each individual row proposed for insertion, either the insertion
proceeds, or, if an arbiter constraint or index specified by
conflict_target is violated, the alternative conflict_action is taken.

Bold emphasis mine. So you do not have to repeat predicates for columns included in the unique index in the WHERE clause to the UPDATE (the conflict_action):

INSERT INTO test_upsert AS tu
       (name   , status, test_field  , identifier, count) 
VALUES ('shaun', 1     , 'test value', 'ident'   , 1)
ON CONFLICT (name, status, test_field) DO UPDATE
SET count = tu.count + 1;
WHERE tu.name = 'shaun' AND tu.status = 1 AND tu.test_field = 'test value'


The unique violation already establishes what your added WHERE clause would enforce redundantly.
Clarify partial index

Add a WHERE clause to make it an actual partial index like you mentioned yourself (but with inverted logic):

CREATE UNIQUE INDEX test_upsert_partial_idx
ON public.test_upsert (name, status)
WHERE test_field IS NULL;  -- not: "is not null"


To use this partial index in your UPSERT you need a matching conflict_target like @ypercube demonstrates:

ON CONFLICT (name, status) WHERE test_field IS NULL


Now the above partial index is inferred. However, as the manual also notes:

[...] a non-partial unique index (a unique index without a predicate) will
be inferred (and thus used by ON CONFLICT) if such an index satisfying
every other criteria is available.

If you have an additional (or only) index on just (name, status) it will (also) be used. An index on (name, status, test_field) would explicitly not be inferred. This doesn't explain your problem, but may have added to the confusion while testing.
Solutions

AIUI, none of the above solves your problem, yet. With the partial index, only special cases with matching NULL values would be caught. And other duplicate rows would either be inserted if you have no other matching unique indexes / constraints, or raise an exception if you do. I suppose that's not what you want. You write:

The composite key is made up of 20 columns, 10 of which can be nullable.

What exactly do you consider a duplicate? Postgres (according to the SQL standard) does not consider two NULL values to be equal. The manual:

In general, a unique constraint is violated if there is more than one
row in the table where the values of all of the columns included in
the constraint are equal. However, two null values are never
considered equal in this comparison. That means even in the presence
of a unique constraint it is possible to store duplicate rows that
contain a null value in at least one of the constrained columns. This
behavior conforms to the SQL standard, but we have heard that other
SQL databases might not follow this rule. So be careful when
developing applications that are intended to be portable.

Related:

  • Allow null in unique column



I assume you want NULL values in all 10 nullable columns to be considered equal.
Update for Postgres 15 or newer

This is simple now. See:

  • Create unique constraint with null columns



So:

CREATE UNIQUE INDEX test_upsert_solution_idx
    ON test_upsert (name, status, test_field) NULLS NOT DISTINCT;


For many columns, the index grows big. To keep the size down and performance up, consider an index on the hash value of the combined columns, see:

  • Why doesn't my UNIQUE constraint trigger?



Postgres 14 or older

It is elegant & practical to cover a single nullable column with an additional partial index like demonstrated here:

  • PostgreSQL multi-column unique constraint and NULL values



But this gets out of hand quickly for more nullable columns. You'd need a partial index for every distinct combination of nullable columns. For just 2 of those that's 3 partial indexes for (a), (b) and (a,b). The number is growing exponentially with 2^n - 1. For your 10 nullable columns, to cover all possible combinations of NULL values, you'd already need 1023 partial indexes. No go.

The simple solution: replace NULL values and define involved columns NOT NULL, and everything would work just fine with a simple UNIQUE constraint.

If that's not an option I suggest an expression index with COALESCE to replace NULL in the index:

CREATE UNIQUE INDEX test_upsert_solution_idx
    ON test_upsert (name, status, COALESCE(test_field, ''));


The empty string ('') is an obvious candidate for character types, but you can use any legal value that either never appears or can be folded with NULL according to your definition of "unique".

Then use this statement:

```
INSERT INTO test_upsert as tu(name,status,test_field,identifier, count)
VALUES ('shaun', 1, null , 'ident', 11) -- works with
, ('bob' , 2, 'test value', 'ident', 22) -- and without NULL
ON CONFLICT (name, status, COALESCE(test_field, '')) DO UPDATE -- match expr. index
SET count = COALE

Code Snippets

INSERT INTO test_upsert AS tu
       (name   , status, test_field  , identifier, count) 
VALUES ('shaun', 1     , 'test value', 'ident'   , 1)
ON CONFLICT (name, status, test_field) DO UPDATE
SET count = tu.count + 1;
WHERE tu.name = 'shaun' AND tu.status = 1 AND tu.test_field = 'test value'
CREATE UNIQUE INDEX test_upsert_partial_idx
ON public.test_upsert (name, status)
WHERE test_field IS NULL;  -- not: "is not null"
ON CONFLICT (name, status) WHERE test_field IS NULL
CREATE UNIQUE INDEX test_upsert_solution_idx
    ON test_upsert (name, status, test_field) NULLS NOT DISTINCT;
CREATE UNIQUE INDEX test_upsert_solution_idx
    ON test_upsert (name, status, COALESCE(test_field, ''));

Context

StackExchange Database Administrators Q#151431, answer score: 34

Revisions (0)

No revisions yet.