snippetsqlMinor
How to atomically replace a subset of table data
Viewed 0 times
howreplaceatomicallysubsetdatatable
Problem
In PostgreSQL 9.6 I have a table
There is a view
So I will select from
Or
And replace the relevant rows in
`
More details after first answer
-
Transactions can run concurrently. There can definitely be a case when
T like thiscategory | id | data
---------+----+------
A | 1 | foo
A | 2 | bar
A | 3 | baz
B | 4 | eh
B | 5 | whatcomesafterfoobarbazThere is a view
V giving me the data for T, so it has columns category, id, data. T is essentially the materialized view for V, except that I need to refresh it with more granularity than "refresh everything".So I will select from
V for exampleSELECT * FROM V WHERE category = 'A';Or
SELECT * FROM V WHERE category = 'A' AND id = 2;And replace the relevant rows in
T with whatever data V gives me. Unfortunately I cannot do a simple UPDATE: asking V eg. for WHERE category = 'A' might give me a totally different set of rows than before. Therefore I need to do this sequence:DELETE FROM T WHERE ;
INSERT INTO T (SELECT FROM V WHERE );`
is either WHERE category = ? or WHERE category = ? AND id = ?.
How do I do this so that the following conditions hold?
- Reads from rows not satisfying
should be unaffected.
- The change should be atomic, meaning reads from rows satisfying
should either see the the old row set or the new row set, not a mix.
Note: unlike this question, I don't want to replace the whole table at once - only the rows affected.
Added details
-
There are more reads than writes, on the order of 10-100 times more. After each write there will be a read to the adjacent categories. The application is looking at a set of categories, ids and data and updates the data for one or more categories at a time. Right after it will re-fetch those categories and display them, and it must see the fresh data. All the ids are always fetched with "their" category.
-
Each category will have something like 1-10 ids, there will be tens of thousands of categories`.More details after first answer
-
Transactions can run concurrently. There can definitely be a case when
Solution
Concurrent reads are not a problem. Writers don't block readers and vice versa in the default
If there can be multiple transactions trying to write at the same time, that's a game changer. A single transaction protects you from inconsistent updates, but it cannot protect you from race conditions between concurrent transactions: deadlocks.
Say, we have two transactions T1 and T2, and category 'A' has 10 IDs:
Postgres detects the deadlock after some time and kills one of the two transactions. (A deadlock error is reported.)
You could switch to
Or you can avoid the problem by always deleting rows in identical, deterministic order. Like:
But typically, there is a more convenient option. If you have a separate table holding unique categories named, say,
and then (in the same transaction) write to category 'A' rows in
In Postgres 9.4 or later consider
Concerning:
After each write there will be a read to the adjacent categories.
You are aware of the
READ COMMITTED isolation level. Enclose DELETE and INSERT in a single transaction to make the operation atomic (all applied or nothing).If there can be multiple transactions trying to write at the same time, that's a game changer. A single transaction protects you from inconsistent updates, but it cannot protect you from race conditions between concurrent transactions: deadlocks.
Say, we have two transactions T1 and T2, and category 'A' has 10 IDs:
T1: DELETE FROM T WHERE category = 'A';
-- starts taking row locks in arbitrary order: id 1,2,3,4,5,6,7 ...
T2: DELETE FROM T WHERE category = 'A';
-- starts taking row locks in arbitrary order: id 10, 9, 8, ...
T1: wait for T2 to release lock on id 8
T2: wait for T1 to release lock on id 7
DEADLOCK.Postgres detects the deadlock after some time and kills one of the two transactions. (A deadlock error is reported.)
You could switch to
SERIALIZABLE transaction isolation. But that's much more expensive and you need to prepare for serialization failures and retry in this case.Or you can avoid the problem by always deleting rows in identical, deterministic order. Like:
WITH del AS (
SELECT category, id
FROM T
WHERE category = 'A'
ORDER BY category, id -- enforce this order in *all* writing queries
FOR UPDATE
)
DELETE FROM T
USING del
WHERE T.category = del.category
AND T.id = del.id;But typically, there is a more convenient option. If you have a separate table holding unique categories named, say,
cat, you can lock the single parent row in cat with:SELECT * FROM cat WHERE category = 'A' FOR UPDATE;and then (in the same transaction) write to category 'A' rows in
T at will (still encapsulated in a single transaction to avoid intermediary, inconsistent states being visible). Of course, all writing queries must follow the same protocol. Then, concurrent transactions will wait for the lock on cat before they write to T and everything is groovy ...In Postgres 9.4 or later consider
FOR NO KEY UPDATE instead:- How to perform conditional insert based on row count?
Concerning:
After each write there will be a read to the adjacent categories.
You are aware of the
RETURNING clause, right? No need for a separate read, if you just inserted all rows for a given category. Example:- PostgreSQL multi INSERT…RETURNING with multiple columns
Code Snippets
T1: DELETE FROM T WHERE category = 'A';
-- starts taking row locks in arbitrary order: id 1,2,3,4,5,6,7 ...
T2: DELETE FROM T WHERE category = 'A';
-- starts taking row locks in arbitrary order: id 10, 9, 8, ...
T1: wait for T2 to release lock on id 8
T2: wait for T1 to release lock on id 7
DEADLOCK.WITH del AS (
SELECT category, id
FROM T
WHERE category = 'A'
ORDER BY category, id -- enforce this order in *all* writing queries
FOR UPDATE
)
DELETE FROM T
USING del
WHERE T.category = del.category
AND T.id = del.id;SELECT * FROM cat WHERE category = 'A' FOR UPDATE;Context
StackExchange Database Administrators Q#166816, answer score: 7
Revisions (0)
No revisions yet.