Concurrent transactions result in race condition with unique constraint on insert
Problem
I have a web service (HTTP API) that allows a user to RESTfully create a resource. After authentication and validation, I pass the data to a Postgres function and let it check authorisation and create the records in the database.
I found a bug today when two HTTP requests were made within the same second, causing this function to be called twice with identical data. There is a clause inside the function that runs a select on a table to see if a value exists: if it does exist, I take the ID and use it in my next operation; if it doesn't, I insert the data, get back the ID, and use that in the next operation. Below is a simple example.
select id into articleId from articles where title = 'my new blog';

if articleId is null then
    insert into articles (title, content) values (_title, _content)
    returning id into articleId;
end if;
-- Continue, using articleId to represent the article for next operations...

As you can probably guess, I got a phantom read on the data where both transactions entered the if articleId is null then block and tried to insert into the table. One succeeded and the other blew up because of a unique constraint on a field.

I've had a look around at how to defend against this and found a few different options, but none seem to fit our needs for a few reasons, and I'm struggling to find any alternatives.
insert ... on conflict do nothing/update

I first looked at the on conflict option, which looked good; however, the only option is do nothing, which then doesn't return the ID of the record that caused the collision, and do update won't work as it will cause triggers to be fired when in reality the data hasn't changed. In some instances this isn't a problem, but in many cases it might invalidate user sessions, which isn't something we can do.
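To illustrate the first point, a minimal sketch (assuming a unique constraint on title as in the example above; the schema comment is illustrative): with do nothing, the returning clause produces no row on a conflict, so the existing record's ID is lost.

-- assumed schema for illustration:
-- create table articles (id serial primary key, title text unique, content text);

insert into articles (title, content)
values ('my new blog', '...')
on conflict (title) do nothing
returning id;
-- returns the generated id on a successful insert,
-- but produces NO row when the title already exists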
set transaction isolation level serializable;

This seems like the most attractive answer; however, even our test suite can c…
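For context on what this option entails, a sketch of the shape it implies (illustrative only): a serialization failure aborts the whole transaction, so the caller has to retry it from the top.

begin;
set transaction isolation level serializable;

-- ... the select/insert logic from above ...

commit;

-- any statement (including the commit) may fail with
-- SQLSTATE 40001 (serialization_failure); the client must
-- then roll back and retry the whole transaction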
Solution
The root of the problem is that, with default READ COMMITTED isolation level, each concurrent UPSERT (or any query, for that matter) can only see rows that were visible at the start of the query. The manual:

    When a transaction uses this isolation level, a SELECT query (without a FOR UPDATE/SHARE clause) sees only data committed before the query began; it never sees either uncommitted data or changes committed during query execution by concurrent transactions.
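To make the quoted rule concrete, a sketch of two concurrent sessions (session labels and values are illustrative; same assumed schema as the question's example):

-- session A:
begin;
insert into articles (title, content) values ('my new blog', '...');
-- not yet committed

-- session B, meanwhile:
begin;
select id from articles where title = 'my new blog';
-- returns no row: A's uncommitted insert is invisible to B

-- session A:
commit;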
But a UNIQUE index is absolute and still has to consider concurrently entered rows - even yet invisible rows. So you can get an exception for a unique violation, but you still cannot see the conflicting row within the same query. The manual:

    INSERT with an ON CONFLICT DO NOTHING clause may have insertion not proceed for a row due to the outcome of another transaction whose effects are not visible to the INSERT snapshot. Again, this is only the case in Read Committed mode.
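Continuing the two-session sketch, this is what the second quote means for session B's UPSERT:

-- session B, while session A's insert is still uncommitted:
insert into articles (title, content)
values ('my new blog', '...')
on conflict (title) do nothing
returning id;
-- blocks until A's transaction ends; if A commits, the insert is
-- skipped and NO row is returned - the conflicting row exists,
-- but is invisible to this INSERT's snapshot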
The brute-force "solution" to this problem is to overwrite conflicting rows with ON CONFLICT ... DO UPDATE. The new row version is then visible within the same query. But there are several side effects, and I would advise against it. One of them is that UPDATE triggers get fired - the thing you want to avoid expressly. Closely related answer on SO:

- How to use RETURNING with ON CONFLICT in PostgreSQL?
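For completeness, a sketch of that brute-force variant (same assumed schema): the no-op update writes a new row version that RETURNING can see, but it also fires UPDATE triggers.

insert into articles (title, content)
values ('my new blog', '...')
on conflict (title) do update
set    title = excluded.title  -- no-op assignment, but writes a new row version
returning id;                  -- now returns an id in both cases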
The remaining option is to start a new command (in the same transaction), which then can see these conflicting rows from the previous query. Both existing answers suggest as much. The manual again:
    However, SELECT does see the effects of previous updates executed within its own transaction, even though they are not yet committed. Also note that two successive SELECT commands can see different data, even though they are within a single transaction, if other transactions commit changes after the first SELECT starts and before the second SELECT starts.

But you want more:
-- Continue, using articleId to represent the article for next operations...
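Concretely, the two-command shape being described, as a sketch (same assumed schema), before adding the locking and retry loop discussed next:

insert into articles (title, content)
values ('my new blog', '...')
on conflict (title) do nothing
returning id;

-- if the insert returned no row, a NEW command in the same
-- transaction can see the committed conflicting row:
select id from articles where title = 'my new blog';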
If concurrent write operations might be able to change or delete the row, to be absolutely sure, you also have to lock the selected row. (The inserted row is locked anyway.)
And since you seem to have very competitive transactions, to make sure you succeed, loop until success. Wrapped into a plpgsql function:
CREATE OR REPLACE FUNCTION f_articleid(_title text, _content text, OUT _articleid int)
  LANGUAGE plpgsql AS
$func$
BEGIN
LOOP
   SELECT articleid
   FROM   articles
   WHERE  title = _title
   FOR    UPDATE      -- or maybe a weaker lock
   INTO   _articleid;

   EXIT WHEN FOUND;

   INSERT INTO articles AS a (title, content)
   VALUES (_title, _content)
   ON     CONFLICT (title) DO NOTHING  -- (new?) _content is discarded
   RETURNING a.articleid
   INTO   _articleid;

   EXIT WHEN FOUND;
END LOOP;
END
$func$;
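A call might then look like this (illustrative values); the function returns the articleid whether the row was found or newly inserted:

SELECT f_articleid('my new blog', 'some content ...');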
Detailed explanation:

- Is SELECT or INSERT in a function prone to race conditions?
Code Snippets
CREATE OR REPLACE FUNCTION f_articleid(_title text, _content text, OUT _articleid int)
  LANGUAGE plpgsql AS
$func$
BEGIN
LOOP
   SELECT articleid
   FROM   articles
   WHERE  title = _title
   FOR    UPDATE      -- or maybe a weaker lock
   INTO   _articleid;

   EXIT WHEN FOUND;

   INSERT INTO articles AS a (title, content)
   VALUES (_title, _content)
   ON     CONFLICT (title) DO NOTHING  -- (new?) _content is discarded
   RETURNING a.articleid
   INTO   _articleid;

   EXIT WHEN FOUND;
END LOOP;
END
$func$;

Context
StackExchange Database Administrators Q#212580, answer score: 14