patternsqlModerate
Optimizing concurrent updates in Postgres
Viewed 0 times
concurrentoptimizingupdatespostgres
Problem
I'm running concurrent Postgres queries like this:
Each query affects the fixed K number of rows, and I can't find a way to enforce the order in which the rows are updated, I end up with deadlocks. Currently I fix the problem by enforcing the order by hand, but this means I have to execute many more queries than I normally would while also raising the search complexity from O(log N + K) to O(K log N).
Is there a way to improve performance without ending up vulnerable to deadlocks? I suspect that replacing the
UPDATE foo SET bar = bar + 1 WHERE baz = 1234Each query affects the fixed K number of rows, and I can't find a way to enforce the order in which the rows are updated, I end up with deadlocks. Currently I fix the problem by enforcing the order by hand, but this means I have to execute many more queries than I normally would while also raising the search complexity from O(log N + K) to O(K log N).
Is there a way to improve performance without ending up vulnerable to deadlocks? I suspect that replacing the
(baz) index with the (baz, id) index might work provided that Postgres updates the rows in the same order that it have scanned them, is this an approach worth pursuing?Solution
There is no
To avoid deadlocks with absolute certainty, you could run your statements in serializable transaction isolation. But that's more expensive and you need to prepare to repeat commands on serialization failure.
Your best course of action is probably to lock explicitly with
Should be all right --- the FOR UPDATE locking is always the last step
in the SELECT pipeline.
This should do the job:
A multicolumn index on
If
It is possible for a
transaction isolation level and using
return rows out of order. ...
Also, if you should have a unique constraint involving
ORDER BY in an SQL UPDATE command. Postgres updates rows in arbitrary order:- UPDATE with ORDER BY
To avoid deadlocks with absolute certainty, you could run your statements in serializable transaction isolation. But that's more expensive and you need to prepare to repeat commands on serialization failure.
Your best course of action is probably to lock explicitly with
SELECT ... ORDER BY ... FOR UPDATE in a subquery or a standalone SELECT in a transaction - in default "read committed" isolation level. Quoting Tom Lane on pgsql-general:Should be all right --- the FOR UPDATE locking is always the last step
in the SELECT pipeline.
This should do the job:
BEGIN;
SELECT 1
FROM foo
WHERE baz = 1234
ORDER BY bar
FOR UPDATE;
UPDATE foo
SET bar = bar + 1
WHERE baz = 1234;
COMMIT;A multicolumn index on
(baz, bar) might be perfect for performance. But since bar is obviously updated a lot, a single-column index on just (baz) might be even better. Depends on a couple of factors. How many rows per baz? Are HOT updates possible without the multicolumn index? ...If
baz is updated concurrently, there is still an unlikely corner case chance for conflicts (per documentation):It is possible for a
SELECT command running at the READ COMMITTEDtransaction isolation level and using
ORDER BY and a locking clause toreturn rows out of order. ...
Also, if you should have a unique constraint involving
bar, consider a DEFERRABLE constraint to avoid unique violations within the same command. Related answer:- Constraint defined DEFERRABLE INITIALLY IMMEDIATE is still DEFERRED?
Code Snippets
BEGIN;
SELECT 1
FROM foo
WHERE baz = 1234
ORDER BY bar
FOR UPDATE;
UPDATE foo
SET bar = bar + 1
WHERE baz = 1234;
COMMIT;Context
StackExchange Database Administrators Q#68388, answer score: 16
Revisions (0)
No revisions yet.