patternsqlMinor
Delete rows while updating references
Viewed 0 times
rowswhiledeleteupdatingreferences
Problem
Suppose I have a
Now, what I am trying to do is: Remove duplicate entries from the person table, fixing the references
Person
Since there is an
What I am trying to achieve here is avoiding a dedicated procedure to update the references since the number of referring tables will most likely increase, requiring maintenance in this procedure (and it is quite big already).
person table with an id and some other columns. Other tables reference the person table using ON UPDATE CASCADE. for instance:CREATE TABLE person (id int PRIMARY KEY);
CREATE TABLE sale (id int PRIMARY KEY,
person_id int REFERENCES person(id) ON UPDATE CASCADE);Now, what I am trying to do is: Remove duplicate entries from the person table, fixing the references
sale references. For instance, using this data:INSERT INTO person values (1), (2), (3);
INSERT INTO sale VALUES (11, 1), (12, 2), (13, 3);Person
1 and 2 are actually the same, so they should be 'merged'. By merging I mean excluding person 2 while updating the references for person 1 in the sale table.Since there is an
ON UPDATE CASCADE on the table that references person, I wonder if there is some way to, without having to manually update the sale table, cause the cascade update while deleting the person 2What I am trying to achieve here is avoiding a dedicated procedure to update the references since the number of referring tables will most likely increase, requiring maintenance in this procedure (and it is quite big already).
Solution
No, there is no built-in feature for merging one key into another, updating all referencing rows automatically. There could be all kinds of complications you need to resolve.
Simple case
You need to run
But there may be all kinds of ...
Complications
Imagine a table
There may be more sophisticated constellations ...
Full automation
If you have lots of referencing tables that can be treated in the same fashion (or a changing number), you can automate the procedure with dynamic SQL.
For the simple case with a single-column fk constraint and no complications:
Call:
This updates any number of tables that have a foreign key referencing the master table.
Using named parameters in the call, which is not required, but you'll want to make sure not to confuse original and duplicate here.
SQL Fiddle.
Major points
-
If you are unfamiliar with the concept of dynamic SQL in plpgsql functions, refer to some of the related questions:
-
Quote column names correctly to avoid SQL injection.
-
If you have to deal with heavy concurrent load, you must be prepared that the final
Simple case
You need to run
UPDATE on all referencing tables and DELETE on the referenced table. The simple case would look like this:BEGIN;
-- 1. update references
UPDATE sale
SET person_id = _org_id
WHERE person_id = _dupe_id;
-- 2. kill dupe
DELETE FROM person
WHERE id = _dupe_id;
COMMIT;But there may be all kinds of ...
Complications
Imagine a table
person_tag implementing an n:m relationship between person and tag with a unique constraint on (person_id, tag_id). You can't UPDATE rows that would result in duplicate tags. You need to resolve conflicts: only update rows that do not conflict with the unique constraint and delete the rest.-- example for conflict resolution
UPDATE person_tag pt
SET person_id = _org_id
WHERE person_id = _dupe_id
AND NOT EXISTS (
SELECT 1 FROM person_tag
WHERE person_id = _org_id
AND tag_id = pt.tag_id
);
DELETE FROM person_tag
WHERE person_id = _dupe_id;
-- end person_tagThere may be more sophisticated constellations ...
Full automation
If you have lots of referencing tables that can be treated in the same fashion (or a changing number), you can automate the procedure with dynamic SQL.
For the simple case with a single-column fk constraint and no complications:
CREATE OR REPLACE FUNCTION f_merge_id(_tbl regclass, _col text, _org_id int, _dupe_id int)
RETURNS void AS
$func$
DECLARE
rec record;
BEGIN
FOR rec IN -- find all referencing columns
SELECT c.conrelid::regclass AS tbl, quote_ident(a2.attname) AS col
FROM pg_catalog.pg_attribute a1
JOIN pg_catalog.pg_constraint c ON c.confrelid = a1.attrelid
AND c.confkey = ARRAY[a1.attnum]
JOIN pg_catalog.pg_attribute a2 ON a2.attrelid = c.conrelid
AND a2.attnum = c.conkey[1]
WHERE a1.attrelid = _tbl
AND a1.attname = _col
AND c.contype = 'f' -- fk constraint
LOOP -- Redirect to _org_id all references to dupe_id
EXECUTE format('
UPDATE %1$s
SET %2$s = $1
WHERE %2$s = $2'
,rec.tbl, rec.col)
USING _org_id, _dupe_id;
END LOOP;
-- Finally kill (now orphaned) dupe
EXECUTE format('DELETE FROM %s WHERE %s = $1', _tbl, _col)
USING _dupe_id;
END
$func$ LANGUAGE plpgsql;Call:
SELECT f_merge_id('person', 'person_id', _org_id := 1, _dupe_id := 2);This updates any number of tables that have a foreign key referencing the master table.
Using named parameters in the call, which is not required, but you'll want to make sure not to confuse original and duplicate here.
SQL Fiddle.
Major points
-
If you are unfamiliar with the concept of dynamic SQL in plpgsql functions, refer to some of the related questions:
- https://dba.stackexchange.com/questions/tagged/postgresql+dynamic-sql+plpgsql
-
Quote column names correctly to avoid SQL injection.
quote_ident() does that for you:- SQL injection in Postgres functions vs prepared queries
-
If you have to deal with heavy concurrent load, you must be prepared that the final
DELETE can fail, which would roll back the whole transaction. Concurrent transactions could enter new rows with _dupe_id. There are various ways to deal with that, which goes beyond the scope of this question.Code Snippets
BEGIN;
-- 1. update references
UPDATE sale
SET person_id = _org_id
WHERE person_id = _dupe_id;
-- 2. kill dupe
DELETE FROM person
WHERE id = _dupe_id;
COMMIT;-- example for conflict resolution
UPDATE person_tag pt
SET person_id = _org_id
WHERE person_id = _dupe_id
AND NOT EXISTS (
SELECT 1 FROM person_tag
WHERE person_id = _org_id
AND tag_id = pt.tag_id
);
DELETE FROM person_tag
WHERE person_id = _dupe_id;
-- end person_tagCREATE OR REPLACE FUNCTION f_merge_id(_tbl regclass, _col text, _org_id int, _dupe_id int)
RETURNS void AS
$func$
DECLARE
rec record;
BEGIN
FOR rec IN -- find all referencing columns
SELECT c.conrelid::regclass AS tbl, quote_ident(a2.attname) AS col
FROM pg_catalog.pg_attribute a1
JOIN pg_catalog.pg_constraint c ON c.confrelid = a1.attrelid
AND c.confkey = ARRAY[a1.attnum]
JOIN pg_catalog.pg_attribute a2 ON a2.attrelid = c.conrelid
AND a2.attnum = c.conkey[1]
WHERE a1.attrelid = _tbl
AND a1.attname = _col
AND c.contype = 'f' -- fk constraint
LOOP -- Redirect to _org_id all references to dupe_id
EXECUTE format('
UPDATE %1$s
SET %2$s = $1
WHERE %2$s = $2'
,rec.tbl, rec.col)
USING _org_id, _dupe_id;
END LOOP;
-- Finally kill (now orphaned) dupe
EXECUTE format('DELETE FROM %s WHERE %s = $1', _tbl, _col)
USING _dupe_id;
END
$func$ LANGUAGE plpgsql;SELECT f_merge_id('person', 'person_id', _org_id := 1, _dupe_id := 2);Context
StackExchange Database Administrators Q#76035, answer score: 5
Revisions (0)
No revisions yet.