HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Compacting a sequence in PostgreSQL

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlcompactingsequence

Problem

I have a id serial PRIMARY KEY column in a PostgreSQL table. Many ids are missing because I have deleted the corresponding row.

Now I want to "compact" the table by restarting the sequence and reassigning the ids in such a way that the original id order is preserved. Is it possible?

Example:

  • Now:



id | data  
----+-------
  1 | hello
  2 | world
  4 | foo
  5 | bar


  • After:



id | data  
----+-------
  1 | hello
  2 | world
  3 | foo
  4 | bar


I tried what was suggested in a StackOverflow answer, but it didn't work:

# alter sequence t_id_seq restart;
ALTER SEQUENCE
# update t set id=default;
ERROR:  duplicate key value violates unique constraint t_pkey
DETAIL:  Key (id)=(1) already exists.

Solution

First off, gaps in a sequence are to be expected. Ask yourself if you really need to remove them. Your life gets simpler if you just live with it. To get gap-less numbers, the (often better) alternative is to use a VIEW with row_number(). Example in this related answer:

  • Gap-less sequence where multiple transactions with multiple tables are involved



Here are some recipes to remove gaps.
  1. New, pristine table



Avoids complications with unique violations and table bloat and is fast. Only for simple cases where you are not bound by FK references, views on the table or other depending objects, or by concurrent access. Do it in one transaction to avoid accidents:

BEGIN;
LOCK tbl; -- optionally: IN SHARE MODE to allow concurrent reads

CREATE TABLE tbl_new (LIKE tbl INCLUDING ALL);

INSERT INTO tbl_new -- no target list in this case
SELECT row_number() OVER (ORDER BY id), data  -- all columns in default order
FROM   tbl;

ALTER SEQUENCE tbl_id_seq OWNED BY tbl_new.id;  -- make new table own sequence

DROP TABLE tbl;
ALTER TABLE tbl_new RENAME TO tbl;

SELECT setval('tbl_id_seq', max(id)) FROM tbl;  -- reset sequence

COMMIT;


CREATE TABLE tbl_new (LIKE tbl INCLUDING ALL) copies the structure incl. constraints and defaults from the original table. Then make the new table column own the sequence:

  • Activerecord-import & serial column in PostgreSQL



And reset it to the new maximum:

  • How to reset Postgres' primary key sequence when it falls out of sync?



This carries the advantage that the new table is bloat-free and clustered on id.
  1. UPDATE in place



This produces a lot of dead rows and requires (auto-) VACUUM later.

If the serial column is also the PRIMARY KEY (like in your case) or has a UNIQUE constraint, you must avoid unique violations in the process. The (cheaper) default for PK / UNIQUE constraints is to be NOT DEFERRABLE, which forces a check after every single single row. All details under this related question on SO:

  • Constraint defined DEFERRABLE INITIALLY IMMEDIATE is still DEFERRED?



You could define your constraint as DEFERRABLE (which makes it more expensive).

Or you can drop the constraint and add it back when you are done:

BEGIN;

LOCK tbl;

ALTER TABLE tbl DROP CONSTRAINT tbl_pkey;  -- remove PK

UPDATE tbl t  -- intermediate unique violations are ignored now
SET    id = t1.new_id
FROM  (SELECT id, row_number() OVER (ORDER BY id) AS new_id FROM tbl) t1
WHERE  t.id = t1.id;

SELECT setval('tbl_id_seq', max(id)) FROM tbl;  -- reset sequence

ALTER TABLE tbl ADD CONSTRAINT tbl_pkey PRIMARY KEY(id); -- add PK back

COMMIT;


Neither is possible while you have FOREIGN KEY constraints referencing the column(s) because (per documentation):

The referenced columns must be the columns of a non-deferrable unique
or primary key constraint in the referenced table.

You would need to (lock all involved tables and) drop / recreate FK constraints and update all FK values manually (see option 3.). Or you have to move values out of the way with a second UPDATE to avoid conflicts. For instance, assuming you have no negative numbers:

BEGIN;
LOCK tbl;

UPDATE tbl SET id = id * -1;  -- avoid conflicts

UPDATE tbl t
SET    id = t1.new_id
FROM  (SELECT id, row_number() OVER (ORDER BY id DESC) AS new_id FROM tbl) t1
WHERE  t.id = t1.id;

SELECT setval('tbl_id_seq', max(id)) FROM tbl;  -- reset sequence

COMMIT;


Drawbacks as mentioned above.
  1. Temp table, TRUNCATE, INSERT



One more option if you have plenty of RAM. This combines some of the advantages of the first two ways. Almost as fast as option 1. and you get a pristine, new table without bloat but keep all constraints and dependencies in place like in option 2.

However, per documentation:

TRUNCATE cannot be used on a table that has foreign-key references
from other tables, unless all such tables are also truncated in the
same command. Checking validity in such cases would require table
scans, and the whole point is not to do one.

Bold emphasis mine.

You could drop FK constraints temporarily and use data-modifying CTEs to update all FK columns:

SET temp_buffers = 500MB;   -- example value, see 1st link below

BEGIN;

CREATE TEMP TABLE tbl_tmp AS
SELECT row_number() OVER (ORDER BY id) AS new_id, *
FROM   tbl
ORDER  BY id;  -- order here to use index (if one exists)

-- drop FK constraints in other tables referencing this one
-- which takes out an exclusive lock on those tables

TRUNCATE tbl;

INSERT INTO tbl
SELECT new_id, data  -- list all columns in order
FROM tbl_tmp;        -- rely on established order in tbl_tmp
-- ORDER BY id;      -- only to be absolutely sure (not necessary)

--  example for table "fk_tbl" with FK column "fk_id"
UPDATE fk_tbl f
SET    fk_id = t.new_id  -- set to new ID
FROM   tbl_tmp t
WHERE  f.fk_id = t.id;   -- match on old ID

-- add FK constraints in other tables back

COMMIT;


Related, with more details:

  • How to delete duplicate entries?



  • How to “merge

Code Snippets

BEGIN;
LOCK tbl; -- optionally: IN SHARE MODE to allow concurrent reads

CREATE TABLE tbl_new (LIKE tbl INCLUDING ALL);

INSERT INTO tbl_new -- no target list in this case
SELECT row_number() OVER (ORDER BY id), data  -- all columns in default order
FROM   tbl;

ALTER SEQUENCE tbl_id_seq OWNED BY tbl_new.id;  -- make new table own sequence

DROP TABLE tbl;
ALTER TABLE tbl_new RENAME TO tbl;

SELECT setval('tbl_id_seq', max(id)) FROM tbl;  -- reset sequence

COMMIT;
BEGIN;

LOCK tbl;

ALTER TABLE tbl DROP CONSTRAINT tbl_pkey;  -- remove PK

UPDATE tbl t  -- intermediate unique violations are ignored now
SET    id = t1.new_id
FROM  (SELECT id, row_number() OVER (ORDER BY id) AS new_id FROM tbl) t1
WHERE  t.id = t1.id;

SELECT setval('tbl_id_seq', max(id)) FROM tbl;  -- reset sequence

ALTER TABLE tbl ADD CONSTRAINT tbl_pkey PRIMARY KEY(id); -- add PK back

COMMIT;
BEGIN;
LOCK tbl;

UPDATE tbl SET id = id * -1;  -- avoid conflicts

UPDATE tbl t
SET    id = t1.new_id
FROM  (SELECT id, row_number() OVER (ORDER BY id DESC) AS new_id FROM tbl) t1
WHERE  t.id = t1.id;

SELECT setval('tbl_id_seq', max(id)) FROM tbl;  -- reset sequence

COMMIT;
SET temp_buffers = 500MB;   -- example value, see 1st link below

BEGIN;

CREATE TEMP TABLE tbl_tmp AS
SELECT row_number() OVER (ORDER BY id) AS new_id, *
FROM   tbl
ORDER  BY id;  -- order here to use index (if one exists)

-- drop FK constraints in other tables referencing this one
-- which takes out an exclusive lock on those tables

TRUNCATE tbl;

INSERT INTO tbl
SELECT new_id, data  -- list all columns in order
FROM tbl_tmp;        -- rely on established order in tbl_tmp
-- ORDER BY id;      -- only to be absolutely sure (not necessary)

--  example for table "fk_tbl" with FK column "fk_id"
UPDATE fk_tbl f
SET    fk_id = t.new_id  -- set to new ID
FROM   tbl_tmp t
WHERE  f.fk_id = t.id;   -- match on old ID

-- add FK constraints in other tables back

COMMIT;

Context

StackExchange Database Administrators Q#111823, answer score: 11

Revisions (0)

No revisions yet.