HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

PostgreSQL How to DEFAULT Partitioned Identity Column?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlidentitycolumndefaulthowpartitioned

Problem

PostgreSQL 11

What is the best way to generate default values for identity columns on partition tables.

E.g

CREATE TABLE data.log
(
  id              BIGINT GENERATED ALWAYS AS IDENTITY
                  (
                    INCREMENT BY 1
                    MINVALUE -9223372036854775808
                    MAXVALUE 9223372036854775807
                    START WITH -9223372036854775808
                    RESTART WITH -9223372036854775808
                    CYCLE
                  ),
  epoch_millis    BIGINT NOT NULL,
  message         TEXT NOT NULL

) PARTITION BY RANGE (epoch_millis);

CREATE TABLE data.foo_log
PARTITION OF data.log
(
  PRIMARY KEY (id)
)
FOR VALUES FROM (0) TO (9999999999);


If I do:

INSERT INTO data.foo_log (epoch_millis, message)
VALUES (1000000, 'hello');


I get:


ERROR: null value in column "id" violates not-null constraint

DETAIL: Failing row contains (null, 1000000, hello).

SQL state: 23502

because the default generated value is not applied to the partition UNLESS I insert it into the root table like this:

INSERT INTO data.log (epoch_millis, message)
VALUES (1000000, 'hello');


There are times though that I want to insert directly into a specific partition for performance reasons (like doing bulk COPY).

The only way I can get this to work is to create the partition while knowing about the sequence that was implicitly created for the identity column like this:

CREATE TABLE data.foo_log
PARTITION OF data.log
(
  id DEFAULT nextval('data.log_id_seq'),
  PRIMARY KEY (id)
)
FOR VALUES FROM (0) TO (9999999999);


Is there a better way to do this and if so how?

Solution

I don't know of a better solution in general. A few minor things, though:
pg_get_serial_sequence()

If you don't know the name of the parent's implicit sequence, use pg_get_serial_sequence().

SELECT pg_get_serial_sequence('data.log', 'id');


You might even use the expression in the CREATE TABLE script directly, but that would impose a very minor additional cost to compute the actual name for the default (once per transaction, I think), and since this is about performance optimization ...
COPY overrides GENERATED ALWAYS, but triggers do not

Defining your id column as GENERATED ALWAYS AS IDENTITY has the effect that you are not allowed to provide user values for the column id in INSERT statements, unless adding an "override" clause like:
INSERT INTO data.log (epoch_millis, message) OVERRIDING USER VALUE
VALUES (1000000, 'hello');

The manual:

OVERRIDING USER VALUE

If this clause is specified, then any values supplied for identity
columns are ignored and the default sequence-generated values are
applied.

This clause is useful for example when copying values between tables.
Writing INSERT INTO tbl2 OVERRIDING USER VALUE SELECT * FROM tbl1
will copy from tbl1 all columns that are not identity columns in
tbl2 while values for the identity columns in tbl2 will be
generated by the sequences associated with tbl2.

COPY overrides in any case. The manual:

For identity columns, the COPY FROM command will always write the
column values provided in the input data, like the INSERT option
OVERRIDING SYSTEM VALUE.

But while writing to a partition directly with your solution, INSERT also overrides, so it will be your responsibility to avoid providing user values for the id column directly. An alternative would be to use a trigger instead of the default value in the partition:

CREATE OR REPLACE FUNCTION trg_log_default_id()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   NEW.id := nextval('data.log_id_seq')
   RETURN NEW;
END
$func$;

CREATE TRIGGER insbef_default_id
  BEFORE INSERT ON data.foo_log  -- the partition
  FOR EACH ROW
  EXECUTE PROCEDURE trg_log_default_id();


This assigns a number from the sequence in any case, more closely emulating the GENERATED ALWAYS behavior of the parent - stricter, even, also preventing COPY from violating your rule. The manual:

COPY FROM will invoke any triggers and check constraints on the destination table.

A trigger is a bit more expensive than a plain default value. And it burns an extra serial number per row for regular inserts via the parent table. (It should be possible to distinguish cases in the trigger, didn't try now.)

Code Snippets

SELECT pg_get_serial_sequence('data.log', 'id');
CREATE OR REPLACE FUNCTION trg_log_default_id()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   NEW.id := nextval('data.log_id_seq')
   RETURN NEW;
END
$func$;

CREATE TRIGGER insbef_default_id
  BEFORE INSERT ON data.foo_log  -- the partition
  FOR EACH ROW
  EXECUTE PROCEDURE trg_log_default_id();

Context

StackExchange Database Administrators Q#225392, answer score: 5

Revisions (0)

No revisions yet.