patternsqlModerate
Compacting a sequence in PostgreSQL
Viewed 0 times
postgresqlcompactingsequence
Problem
I have a
Now I want to "compact" the table by restarting the sequence and reassigning the
Example:
I tried what was suggested in a StackOverflow answer, but it didn't work:
id serial PRIMARY KEY column in a PostgreSQL table. Many ids are missing because I have deleted the corresponding row.Now I want to "compact" the table by restarting the sequence and reassigning the
ids in such a way that the original id order is preserved. Is it possible?Example:
- Now:
id | data
----+-------
1 | hello
2 | world
4 | foo
5 | bar- After:
id | data
----+-------
1 | hello
2 | world
3 | foo
4 | barI tried what was suggested in a StackOverflow answer, but it didn't work:
# alter sequence t_id_seq restart;
ALTER SEQUENCE
# update t set id=default;
ERROR: duplicate key value violates unique constraint t_pkey
DETAIL: Key (id)=(1) already exists.Solution
First off, gaps in a sequence are to be expected. Ask yourself if you really need to remove them. Your life gets simpler if you just live with it. To get gap-less numbers, the (often better) alternative is to use a
Here are some recipes to remove gaps.
Avoids complications with unique violations and table bloat and is fast. Only for simple cases where you are not bound by FK references, views on the table or other depending objects, or by concurrent access. Do it in one transaction to avoid accidents:
And reset it to the new maximum:
This carries the advantage that the new table is bloat-free and clustered on
This produces a lot of dead rows and requires (auto-)
If the
You could define your constraint as
Or you can drop the constraint and add it back when you are done:
Neither is possible while you have
The referenced columns must be the columns of a non-deferrable unique
or primary key constraint in the referenced table.
You would need to (lock all involved tables and) drop / recreate FK constraints and update all FK values manually (see option 3.). Or you have to move values out of the way with a second
Drawbacks as mentioned above.
One more option if you have plenty of RAM. This combines some of the advantages of the first two ways. Almost as fast as option 1. and you get a pristine, new table without bloat but keep all constraints and dependencies in place like in option 2.
However, per documentation:
from other tables, unless all such tables are also truncated in the
same command. Checking validity in such cases would require table
scans, and the whole point is not to do one.
Bold emphasis mine.
You could drop FK constraints temporarily and use data-modifying CTEs to update all FK columns:
Related, with more details:
VIEW with row_number(). Example in this related answer:- Gap-less sequence where multiple transactions with multiple tables are involved
Here are some recipes to remove gaps.
- New, pristine table
Avoids complications with unique violations and table bloat and is fast. Only for simple cases where you are not bound by FK references, views on the table or other depending objects, or by concurrent access. Do it in one transaction to avoid accidents:
BEGIN;
LOCK tbl; -- optionally: IN SHARE MODE to allow concurrent reads
CREATE TABLE tbl_new (LIKE tbl INCLUDING ALL);
INSERT INTO tbl_new -- no target list in this case
SELECT row_number() OVER (ORDER BY id), data -- all columns in default order
FROM tbl;
ALTER SEQUENCE tbl_id_seq OWNED BY tbl_new.id; -- make new table own sequence
DROP TABLE tbl;
ALTER TABLE tbl_new RENAME TO tbl;
SELECT setval('tbl_id_seq', max(id)) FROM tbl; -- reset sequence
COMMIT;CREATE TABLE tbl_new (LIKE tbl INCLUDING ALL) copies the structure incl. constraints and defaults from the original table. Then make the new table column own the sequence:- Activerecord-import & serial column in PostgreSQL
And reset it to the new maximum:
- How to reset Postgres' primary key sequence when it falls out of sync?
This carries the advantage that the new table is bloat-free and clustered on
id.UPDATEin place
This produces a lot of dead rows and requires (auto-)
VACUUM later.If the
serial column is also the PRIMARY KEY (like in your case) or has a UNIQUE constraint, you must avoid unique violations in the process. The (cheaper) default for PK / UNIQUE constraints is to be NOT DEFERRABLE, which forces a check after every single single row. All details under this related question on SO:- Constraint defined DEFERRABLE INITIALLY IMMEDIATE is still DEFERRED?
You could define your constraint as
DEFERRABLE (which makes it more expensive).Or you can drop the constraint and add it back when you are done:
BEGIN;
LOCK tbl;
ALTER TABLE tbl DROP CONSTRAINT tbl_pkey; -- remove PK
UPDATE tbl t -- intermediate unique violations are ignored now
SET id = t1.new_id
FROM (SELECT id, row_number() OVER (ORDER BY id) AS new_id FROM tbl) t1
WHERE t.id = t1.id;
SELECT setval('tbl_id_seq', max(id)) FROM tbl; -- reset sequence
ALTER TABLE tbl ADD CONSTRAINT tbl_pkey PRIMARY KEY(id); -- add PK back
COMMIT;Neither is possible while you have
FOREIGN KEY constraints referencing the column(s) because (per documentation):The referenced columns must be the columns of a non-deferrable unique
or primary key constraint in the referenced table.
You would need to (lock all involved tables and) drop / recreate FK constraints and update all FK values manually (see option 3.). Or you have to move values out of the way with a second
UPDATE to avoid conflicts. For instance, assuming you have no negative numbers:BEGIN;
LOCK tbl;
UPDATE tbl SET id = id * -1; -- avoid conflicts
UPDATE tbl t
SET id = t1.new_id
FROM (SELECT id, row_number() OVER (ORDER BY id DESC) AS new_id FROM tbl) t1
WHERE t.id = t1.id;
SELECT setval('tbl_id_seq', max(id)) FROM tbl; -- reset sequence
COMMIT;Drawbacks as mentioned above.
- Temp table,
TRUNCATE,INSERT
One more option if you have plenty of RAM. This combines some of the advantages of the first two ways. Almost as fast as option 1. and you get a pristine, new table without bloat but keep all constraints and dependencies in place like in option 2.
However, per documentation:
TRUNCATE cannot be used on a table that has foreign-key referencesfrom other tables, unless all such tables are also truncated in the
same command. Checking validity in such cases would require table
scans, and the whole point is not to do one.
Bold emphasis mine.
You could drop FK constraints temporarily and use data-modifying CTEs to update all FK columns:
SET temp_buffers = 500MB; -- example value, see 1st link below
BEGIN;
CREATE TEMP TABLE tbl_tmp AS
SELECT row_number() OVER (ORDER BY id) AS new_id, *
FROM tbl
ORDER BY id; -- order here to use index (if one exists)
-- drop FK constraints in other tables referencing this one
-- which takes out an exclusive lock on those tables
TRUNCATE tbl;
INSERT INTO tbl
SELECT new_id, data -- list all columns in order
FROM tbl_tmp; -- rely on established order in tbl_tmp
-- ORDER BY id; -- only to be absolutely sure (not necessary)
-- example for table "fk_tbl" with FK column "fk_id"
UPDATE fk_tbl f
SET fk_id = t.new_id -- set to new ID
FROM tbl_tmp t
WHERE f.fk_id = t.id; -- match on old ID
-- add FK constraints in other tables back
COMMIT;Related, with more details:
- How to delete duplicate entries?
- How to “merge
Code Snippets
BEGIN;
LOCK tbl; -- optionally: IN SHARE MODE to allow concurrent reads
CREATE TABLE tbl_new (LIKE tbl INCLUDING ALL);
INSERT INTO tbl_new -- no target list in this case
SELECT row_number() OVER (ORDER BY id), data -- all columns in default order
FROM tbl;
ALTER SEQUENCE tbl_id_seq OWNED BY tbl_new.id; -- make new table own sequence
DROP TABLE tbl;
ALTER TABLE tbl_new RENAME TO tbl;
SELECT setval('tbl_id_seq', max(id)) FROM tbl; -- reset sequence
COMMIT;BEGIN;
LOCK tbl;
ALTER TABLE tbl DROP CONSTRAINT tbl_pkey; -- remove PK
UPDATE tbl t -- intermediate unique violations are ignored now
SET id = t1.new_id
FROM (SELECT id, row_number() OVER (ORDER BY id) AS new_id FROM tbl) t1
WHERE t.id = t1.id;
SELECT setval('tbl_id_seq', max(id)) FROM tbl; -- reset sequence
ALTER TABLE tbl ADD CONSTRAINT tbl_pkey PRIMARY KEY(id); -- add PK back
COMMIT;BEGIN;
LOCK tbl;
UPDATE tbl SET id = id * -1; -- avoid conflicts
UPDATE tbl t
SET id = t1.new_id
FROM (SELECT id, row_number() OVER (ORDER BY id DESC) AS new_id FROM tbl) t1
WHERE t.id = t1.id;
SELECT setval('tbl_id_seq', max(id)) FROM tbl; -- reset sequence
COMMIT;SET temp_buffers = 500MB; -- example value, see 1st link below
BEGIN;
CREATE TEMP TABLE tbl_tmp AS
SELECT row_number() OVER (ORDER BY id) AS new_id, *
FROM tbl
ORDER BY id; -- order here to use index (if one exists)
-- drop FK constraints in other tables referencing this one
-- which takes out an exclusive lock on those tables
TRUNCATE tbl;
INSERT INTO tbl
SELECT new_id, data -- list all columns in order
FROM tbl_tmp; -- rely on established order in tbl_tmp
-- ORDER BY id; -- only to be absolutely sure (not necessary)
-- example for table "fk_tbl" with FK column "fk_id"
UPDATE fk_tbl f
SET fk_id = t.new_id -- set to new ID
FROM tbl_tmp t
WHERE f.fk_id = t.id; -- match on old ID
-- add FK constraints in other tables back
COMMIT;Context
StackExchange Database Administrators Q#111823, answer score: 11
Revisions (0)
No revisions yet.