patternsqlMajor
Why doesn't my UNIQUE constraint trigger?
Viewed 0 times
uniquewhytriggerdoesnconstraint
Problem
I have this
Then I do:
Both work. Two rows are added to the table. The second is logically supposed to fail. But it doesn't.
What am I doing wrong? This is driving me insane.
Note: If this were my own data, I would have a truly unique column and not this "crazy"
UNIQUE constraint:ALTER TABLE table ADD CONSTRAINT "abc123" UNIQUE
("col1", "col2", "col3", "col4", "col5", "col6", "col7", "col8");Then I do:
INSERT INTO table ("col1", "col2", "col3", "col4", "col5", "col6", "col7", "col8")
VALUES ('a', 'b', 'c', 'd', 'e', 'f', null, true);
INSERT INTO table ("col1", "col2", "col3", "col4", "col5", "col6", "col7", "col8")
VALUES ('a', 'b', 'c', 'd', 'e', 'f', null, true);Both work. Two rows are added to the table. The second is logically supposed to fail. But it doesn't.
What am I doing wrong? This is driving me insane.
Note: If this were my own data, I would have a truly unique column and not this "crazy"
UNIQUE constraint. The issue is that this table holds the records from my bank account, and they stupidly don't have an actually "unique" column in their CSV dump which I could use to actually make sure that duplicate rows aren't inserted, so I have to come up with one which combines all the columns in the entire table to determine uniqueness.Solution
NULL is the culprit, because two NULL values are considered distinct in a UNIQUE constraint - in accordance with the SQL standard.Postgres 15 or newer
Postgres 15 adds an option to change this behavior, allowing for a simple solution:
ALTER TABLE table ADD CONSTRAINT "abc123" UNIQUE NULLS NOT DISTINCT
(col1, col2, col3, col4, col5, col6, col7, col8);See:
- Create unique constraint with null columns
This works out of the box now.
However, the underlying unique index is big and inefficient for many and/or wide columns. I would still consider an index on a hash value like outlined below.
Alternative solutions (original answer)
Do you need all columns to make rows unique? Typically, combining just a few should suffice. Bank data should have plenty of notnull columns ...
To make it work including a single nullable column, you could use a partial index as outlined here:
- Create unique constraint with null columns
But that gets impractical quickly with more than one nullable column.
With multiple nullable columns, a simple solution would be a unique expression index with
COALESCE like:CREATE UNIQUE INDEX bank_uni_idx ON bank
(col1, col2, COALESCE(col3, ''), col4, col5, col6, COALESCE(col7, ''), col8);That's assuming
col3 & col7 are nullable string type columns, where the empty string ('') and NULL are semantically equivalent.The same can be used for a single nullable column as well, obviously.
You need a safe replacement for
NULL that won't conflict with other legal values (empty string in my example).The downside of all solutions so far (including your original) is the large index on so many columns. Can make it rather expensive. This leads me to the answer I really want to give:
Efficient solution
Create a
UNIQUE index or constraint based on a cheap and sufficiently unique hash value of the row (reduced to defining columns).Postgres 14
comes with a built-in hash function for records (including anonymous records!), which is substantially cheaper than my custom function below.
hash_record_extended(record, bigint) --> bigintSee:
- Generate hash id for records in DB
It belongs to the same family of functions as
hashtextextended() ( details below). Now, an expression index seems more attractive than a generated column. So just:CREATE UNIQUE INDEX bank_hash_uni ON bank (hash_record_extended((col1, col2, col3, col4, col5, col6, col7, col8),0));That's all. Most of the below still applies.
Postgres 13 (original answer)
Store the hash value in a generated column and create a
UNIQUE constraint on that. See:- Computed / calculated / virtual / derived columns in PostgreSQL
Assuming all
text columns.CREATE OR REPLACE FUNCTION public.f_bank_bighash(col1 text, col2 text, col3 text, col4 text
, col5 text, col6 text, col7 text, col8 text)
RETURNS bigint
LANGUAGE sql IMMUTABLE COST 25 PARALLEL SAFE AS
'SELECT hashtextextended(textin(record_out(($1,$2,$3,$4,$5,$6,$7,$8))), 0)';
COMMENT ON FUNCTION public.f_bank_bighash(text, text, text, text, text, text, text, text)
IS 'Fast, practically unique signature for the set of defining columns in table bank.
IMMUTABLE for use in index. "record_out"() is only stable, but with only text input it is effectively immutable.';
ALTER TABLE bank
ADD COLUMN bank_bighash bigint NOT NULL GENERATED ALWAYS AS (public.f_bank_bighash(col1, col2, col3, col4, col5, col6, col7, col8)) STORED -- appends column in last position
, ADD CONSTRAINT bank_bighash_uni UNIQUE (bank_bighash);db<>fiddle here
Works with
NULL values.Requires Postgres 12 or later, where extended hash functions and generated columns were added.
hashtextextended() as well as hastext() are internal functions used for fast and reliable hashing for hash partition or hash indexes. They are undocumented. But they are not going away.They may not be stable across different hardware platforms, as Tom Lane points out. Recreate hashes after moving your DB cluster from a little-endian to a big-endian system (if something like that should ever happen.)
The second argument for
hashtextextended() is a salt for the hash. Use any bigint constant, just make sure to use the same everywhere. Stick with 0, unless you know better.Also, while hash collisions are extremely unlikely with the vast bigint key space, the theoretical possibility is always there. If that happens, you get a unique violation for two distinct rows. If uncomfortable with that, use
md5() instead, and store uuid values. See:- What is the optimal data type for an MD5 field?
16 bytes for
uuid instead of 8 bytes for bigint. A bit more expensive to compute, store, and compare. Collisions are still theoretically possible, but you'd have to be paranoid.Older (or any) versions could do the same with
hashtext() returning integer. Makes collisions much more likely. Still very unlikely up to a couple oCode Snippets
ALTER TABLE table ADD CONSTRAINT "abc123" UNIQUE NULLS NOT DISTINCT
(col1, col2, col3, col4, col5, col6, col7, col8);CREATE UNIQUE INDEX bank_uni_idx ON bank
(col1, col2, COALESCE(col3, ''), col4, col5, col6, COALESCE(col7, ''), col8);hash_record_extended(record, bigint) --> bigintCREATE UNIQUE INDEX bank_hash_uni ON bank (hash_record_extended((col1, col2, col3, col4, col5, col6, col7, col8),0));CREATE OR REPLACE FUNCTION public.f_bank_bighash(col1 text, col2 text, col3 text, col4 text
, col5 text, col6 text, col7 text, col8 text)
RETURNS bigint
LANGUAGE sql IMMUTABLE COST 25 PARALLEL SAFE AS
'SELECT hashtextextended(textin(record_out(($1,$2,$3,$4,$5,$6,$7,$8))), 0)';
COMMENT ON FUNCTION public.f_bank_bighash(text, text, text, text, text, text, text, text)
IS 'Fast, practically unique signature for the set of defining columns in table bank.
IMMUTABLE for use in index. "record_out"() is only stable, but with only text input it is effectively immutable.';
ALTER TABLE bank
ADD COLUMN bank_bighash bigint NOT NULL GENERATED ALWAYS AS (public.f_bank_bighash(col1, col2, col3, col4, col5, col6, col7, col8)) STORED -- appends column in last position
, ADD CONSTRAINT bank_bighash_uni UNIQUE (bank_bighash);Context
StackExchange Database Administrators Q#299098, answer score: 24
Revisions (0)
No revisions yet.