patternsqlMinor

"Merge" two rows in a Postgres table, with foreign keys

Submitted by: @import:stackexchange-dba·Mar 10, 2026·

Viewed 0 times

rowspostgresmergewithforeignkeystwotable

Problem

I am keeping a database of books I've read, using the following two tables in PostgreSQL:

CREATE TABLE authors (
    id SERIAL PRIMARY KEY,
    name text
);

CREATE TABLE books (
    id SERIAL PRIMARY KEY,
    title text,
    author_id integer REFERENCES authors(id) ON UPDATE CASCADE ON DELETE CASCADE,
    UNIQUE(title, author_id)
);

Now when going through my list of authors, I found the following two entries:

id | name
----------
 1 | Mark Twain
 2 | Samuel Clemens

What I'd like to do is delete the "Mark Twain" entry, and effectively update all books referencing "Mark Twain" to reference "Samuel Clemens". I know I could do this manually, but I want a solution that works, regardless of which tables are referencing the authors(id)

I thought about doing it like this (within a transaction):

Change Mark Twain id to 2, letting UPDATE CASCADE take care of changing the references.

Delete Mark Twain entry

But this runs into a few problems, mainly:

The first step creates a duplicate primary key

I'm not sure how to reference the right row to delete, once they both have the same ID!

The DELETE CASCADE worries me for the second step

There's also a subtler problem, that can be illustrated with a portion of my (poorly curated) books table:

id | title              | author_id
------------------------------------
 1 | "Huckleberry Finn" | 1
 2 | "Huckleberry Finn" | 2

Here, even if my two-step process succeeded, I would be violating the UNIQUE contstraint on books.

Is there a way to do this, and work around most/all of these issues? Using Postgres 9.4.

Solution

Assuming you just want to delete duplicates in books after merging duplicate authors.

BEGIN;
LOCK books, authors;

CREATE TEMP TABLE dupes ON COMMIT DROP AS (SELECT 2 AS dupe, 1 AS org);

DELETE FROM books b  -- delete duplicate books
USING  dupes d
WHERE  b.author_id = d.dupe
AND    EXISTS (
   SELECT 1
   FROM   books
   WHERE  title = b.title
   AND    author_id = d.org
   );

UPDATE books b  -- now we relink all remaining books
SET    author_id = d.org
FROM   dupes d   
WHERE  b.author_id = d.dupe;

DELETE FROM authors a  -- now we can delete all dupes
USING  dupes d
WHERE  a.id = d.dupe;

COMMIT;

The temp table could hold many rows to remove many dupes at once.

Repeat the first two steps for every tables referencing authors.id. If there are many I would create and execute the statements dynamically ...

I lock the tables explicitly to avoid concurrent disturbances.
Automation

A basic function could look like this:

CREATE OR REPLACE FUNCTION f_remove_dupe(_tbl text, _col text, _dupe int, _org int)
  RETURNS void
  LANGUAGE plpgsql AS
$func$
DECLARE
   _ftbl text;
   _fcol text;
BEGIN
   FOR _ftbl, _fcol IN
      -- table and column name behind all referencing FKs
      SELECT c.conrelid::regclass::text, f.attname
      FROM   pg_attribute  a 
      JOIN   pg_constraint c ON a.attrelid = c.confrelid AND a.attnum = c.confkey[1]
      JOIN   pg_attribute  f ON f.attrelid = c.conrelid AND f.attnum = c.conkey[1]
      WHERE  a.attrelid = _tbl::regclass
      AND    a.attname  = _col
      AND    c.contype  = 'f'
   LOOP
      EXIT WHEN _ftbl IS NULL;  -- skip if not found
      EXECUTE format('
         UPDATE %1$s
         SET    %2$I = $2
         WHERE  %2$I = $1'
       , _ftbl, _fcol)
      USING _dupe, _org;
   END LOOP;

   EXECUTE format('
      DELETE FROM %I WHERE %I = $1'
    , _tbl, _col)
   USING _dupe;
END
$func$;

Call:

SELECT f_remove_dupe('authors', 'id', 2, 1);

This simple version ...

... only works for a single dupe.

... ignores UNIQUE constraints in referencing tables.

... assumes all FK constraints only use the one column, ignoring multi-column FKs

... ignores possible interference from concurrent transactions.

Adapt to your requirements.

Related:

Find referenced field(s) of foreign key constraint

Can't execute dynamic DDL, argument is NULL

SQL injection in Postgres functions vs prepared queries

Code Snippets

BEGIN;
LOCK books, authors;

CREATE TEMP TABLE dupes ON COMMIT DROP AS (SELECT 2 AS dupe, 1 AS org);

DELETE FROM books b  -- delete duplicate books
USING  dupes d
WHERE  b.author_id = d.dupe
AND    EXISTS (
   SELECT 1
   FROM   books
   WHERE  title = b.title
   AND    author_id = d.org
   );

UPDATE books b  -- now we relink all remaining books
SET    author_id = d.org
FROM   dupes d   
WHERE  b.author_id = d.dupe;

DELETE FROM authors a  -- now we can delete all dupes
USING  dupes d
WHERE  a.id = d.dupe;

COMMIT;

CREATE OR REPLACE FUNCTION f_remove_dupe(_tbl text, _col text, _dupe int, _org int)
  RETURNS void
  LANGUAGE plpgsql AS
$func$
DECLARE
   _ftbl text;
   _fcol text;
BEGIN
   FOR _ftbl, _fcol IN
      -- table and column name behind all referencing FKs
      SELECT c.conrelid::regclass::text, f.attname
      FROM   pg_attribute  a 
      JOIN   pg_constraint c ON a.attrelid = c.confrelid AND a.attnum = c.confkey[1]
      JOIN   pg_attribute  f ON f.attrelid = c.conrelid AND f.attnum = c.conkey[1]
      WHERE  a.attrelid = _tbl::regclass
      AND    a.attname  = _col
      AND    c.contype  = 'f'
   LOOP
      EXIT WHEN _ftbl IS NULL;  -- skip if not found
      EXECUTE format('
         UPDATE %1$s
         SET    %2$I = $2
         WHERE  %2$I = $1'
       , _ftbl, _fcol)
      USING _dupe, _org;
   END LOOP;

   EXECUTE format('
      DELETE FROM %I WHERE %I = $1'
    , _tbl, _col)
   USING _dupe;
END
$func$;

SELECT f_remove_dupe('authors', 'id', 2, 1);

Context

StackExchange Database Administrators Q#122306, answer score: 8

Revisions (0)

No revisions yet.