HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Adding a Primary Key to a Large PostgreSQL Table with High Traffic

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlprimarywithtraffichighaddinglargetablekey

Problem

I need to add a primary key to a large PostgreSQL table (approximately 2TB) with high traffic. This is a critical operation, and I'm looking for guidance on how to do it efficiently.

I have already tried below steps:
-- Step 1: Add id identity column
ALTER TABLE users
ADD COLUMN id BIGINT GENERATED ALWAYS as IDENTITY;

-- Step 2: Add unique index on (id, user_id) concurrently
CREATE UNIQUE INDEX CONCURRENTLY id_user_id_idx
ON users (id, user_id);

-- verify that step 2 is completed
-- Step 3: Add primary key
ALTER TABLE users
ADD CONSTRAINT users_pkey PRIMARY KEY USING INDEX id_user_id_idx;


I have facing two problems:

-
Table is locked completely on "Step 1" itself.

I know this is expected, but if there is any option to avoid that please suggest.

-
I get below error,

ERROR: could not extend file "base/16401/90996": No space left on device
HINT: Check free disk space.

But I have around 600GB of storage left on my server.

As the table will be locked on "Step 1", and if there is no option to avoid that, I could take the downtime and add the id column first and then run the other two scripts.

I don't know whether that would solve the storage error.

Please provide any suggestions so that I can add the PK with the least downtime possible.

PostgreSQL v14.6

Solution

Why?

Your step 1 would need a lot more than 600 GB (temporarily). The table has around 2 TB. About as much (minus possible bloat, plus 8 bytes per row for the new bigint column) has to be available at least, because that change forces Postgres to rewrite the whole table.
Minimize blocking AND minimize total storage requirement

Do instead, in this order:

fiddle

Add a nullable column id with no default value, so it will be null initially.

ALTER TABLE users ADD COLUMN id bigint;


This way, Postgres can make do with tiny metadata changes. No table rewrite, no blocking.

I would name the PK column "user_id", not a fan of the wide-spread, non-descriptive, and highly duplicative name "id". But keeping "id" to stay in line with the question.

Create a SEQUENCE manually:

CREATE SEQUENCE users_id_seq;


Make the column "own" the sequence:

ALTER SEQUENCE users_id_seq OWNED BY users.id;


Add the column default, which only kicks in for new rows.

ALTER TABLE users ALTER COLUMN id SET DEFAULT nextval('users_id_seq');


See:

  • Safely rename tables using serial primary key columns



  • Creating a PostgreSQL sequence to a field (which is not the ID of the record)



Update pre-existing rows (still with null values) in batches of like 1 % of the total size (or whatever). In separate transactions, to allow autovacuum to kick in and mark dead rows for reuse. This way, the table won't grow much, and 600 GB are easily enough wiggle room.

Since the addition of SQL procedures in Postgres 11, we can COMMIT in an anonymous code block. Assuming a timestamptz column users.inserted_at (ideally with an index on it!) something like this would work:

DO
$do$
DECLARE
   _ts timestamptz := (SELECT COALESCE(min(inserted_at), now()) FROM users);  -- must not be NULL
   _step interval  := '7 days';  -- adjust to your data !!!
BEGIN
   LOOP
      RAISE NOTICE 'Updating rows starting from %', _ts;  -- optional
      
      UPDATE users
      SET    id = nextval('users_id_seq')
      WHERE  inserted_at >= _ts
      AND    inserted_at = now();  -- adjust to your case !!!

      COMMIT;  -- Requires Postgres 11+ !!!
      PERFORM pg_sleep(10);  -- adapt to your setup: long enough so let autovacuum kick in
      _ts := _ts + _step;
   END LOOP;
END
$do$;


Alternatively, loop in your client, and run VACUUM users; between iterations to make sure space is reused. (VACUUM cannot run inside a transaction.)

See:

  • COMMIT works in one plpgsql code block, but not in another?



  • Do PL/pgSQL functions autocommit on their own?



Eventually, all old rows are updated.

Now create the unique index CONCURRENTLY, to avoid blocking inserts. Like your step 2, but only on (id):

CREATE UNIQUE INDEX CONCURRENTLY users_id_idx ON users (id);


I don't see a good reason to add user_id to the PK. If you need it for index-only scans consider a covering index with INCLUDE (user_id). But that's not always beneficial overall. See:

  • Does a query with a primary key and foreign keys run faster than a query with just primary keys?



  • Do covering indexes in PostgreSQL help JOIN columns?



Now use the unique index to add the new PK without blocking inserts (your step 3):

ALTER TABLE users ADD CONSTRAINT users_pkey PRIMARY KEY USING INDEX users_id_idx;


This will also implicitly set the column NOT NULL.

Finally, use Peter Eisentraut's function upgrade_serial_to_identity(tbl regclass, col name) to convert the serial to an IDENTITY column. As superuser:

SELECT upgrade_serial_to_identity('users'::regclass, 'id')


Or stick with the serial PK, might be good enough.

Related:

  • How to change a table ID from serial to identity?

Code Snippets

ALTER TABLE users ADD COLUMN id bigint;
CREATE SEQUENCE users_id_seq;
ALTER SEQUENCE users_id_seq OWNED BY users.id;
ALTER TABLE users ALTER COLUMN id SET DEFAULT nextval('users_id_seq');
DO
$do$
DECLARE
   _ts timestamptz := (SELECT COALESCE(min(inserted_at), now()) FROM users);  -- must not be NULL
   _step interval  := '7 days';  -- adjust to your data !!!
BEGIN
   LOOP
      RAISE NOTICE 'Updating rows starting from %', _ts;  -- optional
      
      UPDATE users
      SET    id = nextval('users_id_seq')
      WHERE  inserted_at >= _ts
      AND    inserted_at <  _ts + _step
      AND    id IS NULL;  -- optional

      EXIT WHEN NOT FOUND AND _ts >= now();  -- adjust to your case !!!

      COMMIT;  -- Requires Postgres 11+ !!!
      PERFORM pg_sleep(10);  -- adapt to your setup: long enough so let autovacuum kick in
      _ts := _ts + _step;
   END LOOP;
END
$do$;

Context

StackExchange Database Administrators Q#331424, answer score: 12

Revisions (0)

No revisions yet.