patternsqlMinor
"Merge" two rows in a Postgres table, with foreign keys
Viewed 0 times
rowspostgresmergewithforeignkeystwotable
Problem
I am keeping a database of books I've read, using the following two tables in PostgreSQL:
Now when going through my list of authors, I found the following two entries:
What I'd like to do is delete the "Mark Twain" entry, and effectively update all books referencing "Mark Twain" to reference "Samuel Clemens". I know I could do this manually, but I want a solution that works, regardless of which tables are referencing the
I thought about doing it like this (within a transaction):
But this runs into a few problems, mainly:
There's also a subtler problem, that can be illustrated with a portion of my (poorly curated)
Here, even if my two-step process succeeded, I would be violating the
Is there a way to do this, and work around most/all of these issues? Using Postgres 9.4.
CREATE TABLE authors (
id SERIAL PRIMARY KEY,
name text
);
CREATE TABLE books (
id SERIAL PRIMARY KEY,
title text,
author_id integer REFERENCES authors(id) ON UPDATE CASCADE ON DELETE CASCADE,
UNIQUE(title, author_id)
);Now when going through my list of authors, I found the following two entries:
id | name
----------
1 | Mark Twain
2 | Samuel ClemensWhat I'd like to do is delete the "Mark Twain" entry, and effectively update all books referencing "Mark Twain" to reference "Samuel Clemens". I know I could do this manually, but I want a solution that works, regardless of which tables are referencing the
authors(id)I thought about doing it like this (within a transaction):
- Change Mark Twain
idto 2, lettingUPDATE CASCADEtake care of changing the references.
- Delete Mark Twain entry
But this runs into a few problems, mainly:
- The first step creates a duplicate primary key
- I'm not sure how to reference the right row to delete, once they both have the same ID!
- The
DELETE CASCADEworries me for the second step
There's also a subtler problem, that can be illustrated with a portion of my (poorly curated)
books table:id | title | author_id
------------------------------------
1 | "Huckleberry Finn" | 1
2 | "Huckleberry Finn" | 2Here, even if my two-step process succeeded, I would be violating the
UNIQUE contstraint on books.Is there a way to do this, and work around most/all of these issues? Using Postgres 9.4.
Solution
Assuming you just want to delete duplicates in
The temp table could hold many rows to remove many dupes at once.
Repeat the first two steps for every tables referencing
I lock the tables explicitly to avoid concurrent disturbances.
Automation
A basic function could look like this:
Call:
This simple version ...
Adapt to your requirements.
Related:
books after merging duplicate authors.BEGIN;
LOCK books, authors;
CREATE TEMP TABLE dupes ON COMMIT DROP AS (SELECT 2 AS dupe, 1 AS org);
DELETE FROM books b -- delete duplicate books
USING dupes d
WHERE b.author_id = d.dupe
AND EXISTS (
SELECT 1
FROM books
WHERE title = b.title
AND author_id = d.org
);
UPDATE books b -- now we relink all remaining books
SET author_id = d.org
FROM dupes d
WHERE b.author_id = d.dupe;
DELETE FROM authors a -- now we can delete all dupes
USING dupes d
WHERE a.id = d.dupe;
COMMIT;The temp table could hold many rows to remove many dupes at once.
Repeat the first two steps for every tables referencing
authors.id. If there are many I would create and execute the statements dynamically ...I lock the tables explicitly to avoid concurrent disturbances.
Automation
A basic function could look like this:
CREATE OR REPLACE FUNCTION f_remove_dupe(_tbl text, _col text, _dupe int, _org int)
RETURNS void
LANGUAGE plpgsql AS
$func$
DECLARE
_ftbl text;
_fcol text;
BEGIN
FOR _ftbl, _fcol IN
-- table and column name behind all referencing FKs
SELECT c.conrelid::regclass::text, f.attname
FROM pg_attribute a
JOIN pg_constraint c ON a.attrelid = c.confrelid AND a.attnum = c.confkey[1]
JOIN pg_attribute f ON f.attrelid = c.conrelid AND f.attnum = c.conkey[1]
WHERE a.attrelid = _tbl::regclass
AND a.attname = _col
AND c.contype = 'f'
LOOP
EXIT WHEN _ftbl IS NULL; -- skip if not found
EXECUTE format('
UPDATE %1$s
SET %2$I = $2
WHERE %2$I = $1'
, _ftbl, _fcol)
USING _dupe, _org;
END LOOP;
EXECUTE format('
DELETE FROM %I WHERE %I = $1'
, _tbl, _col)
USING _dupe;
END
$func$;Call:
SELECT f_remove_dupe('authors', 'id', 2, 1);This simple version ...
- ... only works for a single dupe.
- ... ignores
UNIQUEconstraints in referencing tables.
- ... assumes all FK constraints only use the one column, ignoring multi-column FKs
- ... ignores possible interference from concurrent transactions.
Adapt to your requirements.
Related:
- Find referenced field(s) of foreign key constraint
- Can't execute dynamic DDL, argument is NULL
- SQL injection in Postgres functions vs prepared queries
Code Snippets
BEGIN;
LOCK books, authors;
CREATE TEMP TABLE dupes ON COMMIT DROP AS (SELECT 2 AS dupe, 1 AS org);
DELETE FROM books b -- delete duplicate books
USING dupes d
WHERE b.author_id = d.dupe
AND EXISTS (
SELECT 1
FROM books
WHERE title = b.title
AND author_id = d.org
);
UPDATE books b -- now we relink all remaining books
SET author_id = d.org
FROM dupes d
WHERE b.author_id = d.dupe;
DELETE FROM authors a -- now we can delete all dupes
USING dupes d
WHERE a.id = d.dupe;
COMMIT;CREATE OR REPLACE FUNCTION f_remove_dupe(_tbl text, _col text, _dupe int, _org int)
RETURNS void
LANGUAGE plpgsql AS
$func$
DECLARE
_ftbl text;
_fcol text;
BEGIN
FOR _ftbl, _fcol IN
-- table and column name behind all referencing FKs
SELECT c.conrelid::regclass::text, f.attname
FROM pg_attribute a
JOIN pg_constraint c ON a.attrelid = c.confrelid AND a.attnum = c.confkey[1]
JOIN pg_attribute f ON f.attrelid = c.conrelid AND f.attnum = c.conkey[1]
WHERE a.attrelid = _tbl::regclass
AND a.attname = _col
AND c.contype = 'f'
LOOP
EXIT WHEN _ftbl IS NULL; -- skip if not found
EXECUTE format('
UPDATE %1$s
SET %2$I = $2
WHERE %2$I = $1'
, _ftbl, _fcol)
USING _dupe, _org;
END LOOP;
EXECUTE format('
DELETE FROM %I WHERE %I = $1'
, _tbl, _col)
USING _dupe;
END
$func$;SELECT f_remove_dupe('authors', 'id', 2, 1);Context
StackExchange Database Administrators Q#122306, answer score: 8
Revisions (0)
No revisions yet.