HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

"Merge" two rows in a Postgres table, with foreign keys

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
rowspostgresmergewithforeignkeystwotable

Problem

I am keeping a database of books I've read, using the following two tables in PostgreSQL:

CREATE TABLE authors (
    id SERIAL PRIMARY KEY,
    name text
);

CREATE TABLE books (
    id SERIAL PRIMARY KEY,
    title text,
    author_id integer REFERENCES authors(id) ON UPDATE CASCADE ON DELETE CASCADE,
    UNIQUE(title, author_id)
);


Now when going through my list of authors, I found the following two entries:

id | name
----------
 1 | Mark Twain
 2 | Samuel Clemens


What I'd like to do is delete the "Mark Twain" entry, and effectively update all books referencing "Mark Twain" to reference "Samuel Clemens". I know I could do this manually, but I want a solution that works, regardless of which tables are referencing the authors(id)

I thought about doing it like this (within a transaction):

  • Change Mark Twain id to 2, letting UPDATE CASCADE take care of changing the references.



  • Delete Mark Twain entry



But this runs into a few problems, mainly:

  • The first step creates a duplicate primary key



  • I'm not sure how to reference the right row to delete, once they both have the same ID!



  • The DELETE CASCADE worries me for the second step



There's also a subtler problem, that can be illustrated with a portion of my (poorly curated) books table:

id | title              | author_id
------------------------------------
 1 | "Huckleberry Finn" | 1
 2 | "Huckleberry Finn" | 2


Here, even if my two-step process succeeded, I would be violating the UNIQUE contstraint on books.

Is there a way to do this, and work around most/all of these issues? Using Postgres 9.4.

Solution

Assuming you just want to delete duplicates in books after merging duplicate authors.

BEGIN;
LOCK books, authors;

CREATE TEMP TABLE dupes ON COMMIT DROP AS (SELECT 2 AS dupe, 1 AS org);

DELETE FROM books b  -- delete duplicate books
USING  dupes d
WHERE  b.author_id = d.dupe
AND    EXISTS (
   SELECT 1
   FROM   books
   WHERE  title = b.title
   AND    author_id = d.org
   );

UPDATE books b  -- now we relink all remaining books
SET    author_id = d.org
FROM   dupes d   
WHERE  b.author_id = d.dupe;

DELETE FROM authors a  -- now we can delete all dupes
USING  dupes d
WHERE  a.id = d.dupe;

COMMIT;


The temp table could hold many rows to remove many dupes at once.

Repeat the first two steps for every tables referencing authors.id. If there are many I would create and execute the statements dynamically ...

I lock the tables explicitly to avoid concurrent disturbances.
Automation

A basic function could look like this:

CREATE OR REPLACE FUNCTION f_remove_dupe(_tbl text, _col text, _dupe int, _org int)
  RETURNS void
  LANGUAGE plpgsql AS
$func$
DECLARE
   _ftbl text;
   _fcol text;
BEGIN
   FOR _ftbl, _fcol IN
      -- table and column name behind all referencing FKs
      SELECT c.conrelid::regclass::text, f.attname
      FROM   pg_attribute  a 
      JOIN   pg_constraint c ON a.attrelid = c.confrelid AND a.attnum = c.confkey[1]
      JOIN   pg_attribute  f ON f.attrelid = c.conrelid AND f.attnum = c.conkey[1]
      WHERE  a.attrelid = _tbl::regclass
      AND    a.attname  = _col
      AND    c.contype  = 'f'
   LOOP
      EXIT WHEN _ftbl IS NULL;  -- skip if not found
      EXECUTE format('
         UPDATE %1$s
         SET    %2$I = $2
         WHERE  %2$I = $1'
       , _ftbl, _fcol)
      USING _dupe, _org;
   END LOOP;

   EXECUTE format('
      DELETE FROM %I WHERE %I = $1'
    , _tbl, _col)
   USING _dupe;
END
$func$;


Call:

SELECT f_remove_dupe('authors', 'id', 2, 1);


This simple version ...

  • ... only works for a single dupe.



  • ... ignores UNIQUE constraints in referencing tables.



  • ... assumes all FK constraints only use the one column, ignoring multi-column FKs



  • ... ignores possible interference from concurrent transactions.



Adapt to your requirements.

Related:

  • Find referenced field(s) of foreign key constraint



  • Can't execute dynamic DDL, argument is NULL



  • SQL injection in Postgres functions vs prepared queries

Code Snippets

BEGIN;
LOCK books, authors;

CREATE TEMP TABLE dupes ON COMMIT DROP AS (SELECT 2 AS dupe, 1 AS org);

DELETE FROM books b  -- delete duplicate books
USING  dupes d
WHERE  b.author_id = d.dupe
AND    EXISTS (
   SELECT 1
   FROM   books
   WHERE  title = b.title
   AND    author_id = d.org
   );

UPDATE books b  -- now we relink all remaining books
SET    author_id = d.org
FROM   dupes d   
WHERE  b.author_id = d.dupe;

DELETE FROM authors a  -- now we can delete all dupes
USING  dupes d
WHERE  a.id = d.dupe;

COMMIT;
CREATE OR REPLACE FUNCTION f_remove_dupe(_tbl text, _col text, _dupe int, _org int)
  RETURNS void
  LANGUAGE plpgsql AS
$func$
DECLARE
   _ftbl text;
   _fcol text;
BEGIN
   FOR _ftbl, _fcol IN
      -- table and column name behind all referencing FKs
      SELECT c.conrelid::regclass::text, f.attname
      FROM   pg_attribute  a 
      JOIN   pg_constraint c ON a.attrelid = c.confrelid AND a.attnum = c.confkey[1]
      JOIN   pg_attribute  f ON f.attrelid = c.conrelid AND f.attnum = c.conkey[1]
      WHERE  a.attrelid = _tbl::regclass
      AND    a.attname  = _col
      AND    c.contype  = 'f'
   LOOP
      EXIT WHEN _ftbl IS NULL;  -- skip if not found
      EXECUTE format('
         UPDATE %1$s
         SET    %2$I = $2
         WHERE  %2$I = $1'
       , _ftbl, _fcol)
      USING _dupe, _org;
   END LOOP;

   EXECUTE format('
      DELETE FROM %I WHERE %I = $1'
    , _tbl, _col)
   USING _dupe;
END
$func$;
SELECT f_remove_dupe('authors', 'id', 2, 1);

Context

StackExchange Database Administrators Q#122306, answer score: 8

Revisions (0)

No revisions yet.