patternsqlMinor

Delete rows while updating references

Submitted by: @import:stackexchange-dba·Mar 10, 2026·

Viewed 0 times

rowswhiledeleteupdatingreferences

Problem

Suppose I have a person table with an id and some other columns. Other tables reference the person table using ON UPDATE CASCADE. for instance:

CREATE TABLE person (id int PRIMARY KEY);
CREATE TABLE sale (id int PRIMARY KEY,
    person_id int REFERENCES person(id) ON UPDATE CASCADE);

Now, what I am trying to do is: Remove duplicate entries from the person table, fixing the references sale references. For instance, using this data:

INSERT INTO person values (1), (2), (3);
INSERT INTO sale VALUES (11, 1), (12, 2), (13, 3);

Person 1 and 2 are actually the same, so they should be 'merged'. By merging I mean excluding person 2 while updating the references for person 1 in the sale table.

Since there is an ON UPDATE CASCADE on the table that references person, I wonder if there is some way to, without having to manually update the sale table, cause the cascade update while deleting the person 2

What I am trying to achieve here is avoiding a dedicated procedure to update the references since the number of referring tables will most likely increase, requiring maintenance in this procedure (and it is quite big already).

Solution

No, there is no built-in feature for merging one key into another, updating all referencing rows automatically. There could be all kinds of complications you need to resolve.

Simple case

You need to run UPDATE on all referencing tables and DELETE on the referenced table. The simple case would look like this:

BEGIN;

--  1. update references
UPDATE sale
SET    person_id = _org_id
WHERE  person_id = _dupe_id;

-- 2. kill dupe 
DELETE FROM person
WHERE  id = _dupe_id;

COMMIT;

But there may be all kinds of ...

Complications

Imagine a table person_tag implementing an n:m relationship between person and tag with a unique constraint on (person_id, tag_id). You can't UPDATE rows that would result in duplicate tags. You need to resolve conflicts: only update rows that do not conflict with the unique constraint and delete the rest.

-- example for conflict resolution
UPDATE person_tag pt
SET    person_id = _org_id
WHERE  person_id = _dupe_id
AND NOT EXISTS (
   SELECT 1 FROM person_tag
   WHERE  person_id = _org_id
   AND    tag_id = pt.tag_id
   );

DELETE FROM person_tag
WHERE  person_id = _dupe_id;
-- end person_tag

There may be more sophisticated constellations ...

Full automation

If you have lots of referencing tables that can be treated in the same fashion (or a changing number), you can automate the procedure with dynamic SQL.

For the simple case with a single-column fk constraint and no complications:

CREATE OR REPLACE FUNCTION f_merge_id(_tbl regclass, _col text, _org_id int, _dupe_id int)
  RETURNS void AS
$func$
DECLARE
   rec record;
BEGIN
   FOR rec IN  -- find all referencing columns
      SELECT c.conrelid::regclass AS tbl, quote_ident(a2.attname) AS col
      FROM   pg_catalog.pg_attribute  a1
      JOIN   pg_catalog.pg_constraint c  ON c.confrelid = a1.attrelid
                                        AND c.confkey   = ARRAY[a1.attnum]
      JOIN   pg_catalog.pg_attribute  a2 ON a2.attrelid = c.conrelid
                                        AND a2.attnum   = c.conkey[1]
      WHERE  a1.attrelid = _tbl
      AND    a1.attname  = _col
      AND    c.contype   = 'f'  -- fk constraint
   LOOP        -- Redirect to _org_id all references to dupe_id
      EXECUTE format('
         UPDATE %1$s
         SET    %2$s = $1
         WHERE  %2$s = $2'
        ,rec.tbl, rec.col)
      USING _org_id, _dupe_id;
   END LOOP;

   -- Finally kill (now orphaned) dupe 
   EXECUTE format('DELETE FROM %s WHERE %s = $1', _tbl, _col)
   USING _dupe_id;
END
$func$ LANGUAGE plpgsql;

Call:

SELECT f_merge_id('person', 'person_id', _org_id := 1, _dupe_id := 2);

This updates any number of tables that have a foreign key referencing the master table.

Using named parameters in the call, which is not required, but you'll want to make sure not to confuse original and duplicate here.

SQL Fiddle.

Major points

-
If you are unfamiliar with the concept of dynamic SQL in plpgsql functions, refer to some of the related questions:

https://dba.stackexchange.com/questions/tagged/postgresql+dynamic-sql+plpgsql

-
Quote column names correctly to avoid SQL injection. quote_ident() does that for you:

SQL injection in Postgres functions vs prepared queries

-
If you have to deal with heavy concurrent load, you must be prepared that the final DELETE can fail, which would roll back the whole transaction. Concurrent transactions could enter new rows with _dupe_id. There are various ways to deal with that, which goes beyond the scope of this question.

Code Snippets

BEGIN;

--  1. update references
UPDATE sale
SET    person_id = _org_id
WHERE  person_id = _dupe_id;

-- 2. kill dupe 
DELETE FROM person
WHERE  id = _dupe_id;

COMMIT;

-- example for conflict resolution
UPDATE person_tag pt
SET    person_id = _org_id
WHERE  person_id = _dupe_id
AND NOT EXISTS (
   SELECT 1 FROM person_tag
   WHERE  person_id = _org_id
   AND    tag_id = pt.tag_id
   );

DELETE FROM person_tag
WHERE  person_id = _dupe_id;
-- end person_tag

CREATE OR REPLACE FUNCTION f_merge_id(_tbl regclass, _col text, _org_id int, _dupe_id int)
  RETURNS void AS
$func$
DECLARE
   rec record;
BEGIN
   FOR rec IN  -- find all referencing columns
      SELECT c.conrelid::regclass AS tbl, quote_ident(a2.attname) AS col
      FROM   pg_catalog.pg_attribute  a1
      JOIN   pg_catalog.pg_constraint c  ON c.confrelid = a1.attrelid
                                        AND c.confkey   = ARRAY[a1.attnum]
      JOIN   pg_catalog.pg_attribute  a2 ON a2.attrelid = c.conrelid
                                        AND a2.attnum   = c.conkey[1]
      WHERE  a1.attrelid = _tbl
      AND    a1.attname  = _col
      AND    c.contype   = 'f'  -- fk constraint
   LOOP        -- Redirect to _org_id all references to dupe_id
      EXECUTE format('
         UPDATE %1$s
         SET    %2$s = $1
         WHERE  %2$s = $2'
        ,rec.tbl, rec.col)
      USING _org_id, _dupe_id;
   END LOOP;

   -- Finally kill (now orphaned) dupe 
   EXECUTE format('DELETE FROM %s WHERE %s = $1', _tbl, _col)
   USING _dupe_id;
END
$func$ LANGUAGE plpgsql;

SELECT f_merge_id('person', 'person_id', _org_id := 1, _dupe_id := 2);

Context

StackExchange Database Administrators Q#76035, answer score: 5

Revisions (0)

No revisions yet.