HiveBrain v1.2.0
pattern · sql · Moderate

Concurrent transactions result in race condition with unique constraint on insert

Submitted by: @import:stackexchange-dba

Problem

I have a web service (HTTP API) which allows a user to RESTfully create a resource. After authentication and validation, I pass the data to a Postgres function and let it check authorisation and create the records in the database.

I found a bug today when two HTTP requests made within the same second caused this function to be called twice with identical data. Inside the function there is a clause which selects from a table to see if a value exists. If it does exist, I take the ID and use it in the next operation; if it doesn't, I insert the data, get back the ID and use that instead. Below is a simple example.

select id into articleId from articles where title = 'my new blog';
if articleId is null then
    insert into articles (title, content) values (_title, _content)
    returning id into articleId;
end if;
-- Continue, using articleId to represent the article for next operations...


As you can probably guess, I hit a race on the data where both transactions entered the if articleId is null then block and tried to insert into the table. One succeeded and the other blew up because of a unique constraint on a field.
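A sketch of the failure mode (hypothetical session labels, and assuming the unique constraint is on articles.title) - the interleaving looks like this:

```sql
-- Session A                                -- Session B (concurrent)
BEGIN;                                      BEGIN;
SELECT id FROM articles
WHERE  title = 'my new blog';  -- 0 rows
                                            SELECT id FROM articles
                                            WHERE  title = 'my new blog';  -- 0 rows
INSERT INTO articles (title, content)
VALUES ('my new blog', '...');  -- OK
COMMIT;
                                            INSERT INTO articles (title, content)
                                            VALUES ('my new blog', '...');
                                            -- ERROR: duplicate key value violates
                                            --        unique constraint on title
```

Both sessions see an empty result under READ COMMITTED, so both take the insert branch; the second insert blocks on the first and fails once it commits.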

I've had a look around at how to defend against this and found a few different options but none seem like they fit our needs for a few reasons and I'm struggling to find any alternatives.

  • insert ... on conflict do nothing/update... I first looked at the on conflict option, which looked promising. However, do nothing doesn't return the ID of the record that caused the collision, and do update won't work because it fires triggers even though the data hasn't actually changed. In some instances this isn't a problem, but in many cases it might invalidate user sessions, which isn't something we can do.



  • set transaction isolation level serializable; this seems like the most attractive answer, however even our test suite can c

Solution

The root of the problem is that, under the default READ COMMITTED isolation level, each concurrent UPSERT (or any query, for that matter) can only see rows that were visible at the start of the query. The manual:

When a transaction uses this isolation level, a SELECT query (without
a FOR UPDATE/SHARE clause) sees only data committed before the query
began; it never sees either uncommitted data or changes committed
during query execution by concurrent transactions.

But a UNIQUE index is absolute and still has to consider concurrently entered rows - even rows that are not yet visible. So you can get an exception for a unique violation, yet still not see the conflicting row within the same query. The manual:

INSERT with an ON CONFLICT DO NOTHING clause may have insertion not
proceed for a row due to the outcome of another transaction whose
effects are not visible to the INSERT snapshot. Again, this is only
the case in Read Committed mode.
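In practice this means that ON CONFLICT DO NOTHING combined with RETURNING yields zero rows when the conflict comes from a concurrently inserted, still-invisible row - there is nothing visible to return. A minimal sketch, assuming the unique constraint is on title:

```sql
INSERT INTO articles (title, content)
VALUES ('my new blog', '...')
ON CONFLICT (title) DO NOTHING
RETURNING id;  -- returns the new id on success, but NO row on conflict
```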

The brute-force "solution" to this problem is to overwrite conflicting rows with ON CONFLICT ... DO UPDATE. The new row version is then visible within the same query. But there are several side effects and I would advise against it. One of them is that UPDATE triggers get fired - expressly the thing you want to avoid. Closely related answer on SO:

  • How to use RETURNING with ON CONFLICT in PostgreSQL?
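For completeness, the brute-force variant might look like the sketch below (a no-op SET on the conflicting column). It always returns an id, but at the cost of writing a new row version and firing UPDATE triggers:

```sql
INSERT INTO articles (title, content)
VALUES ('my new blog', '...')
ON CONFLICT (title) DO UPDATE
SET    title = EXCLUDED.title   -- no-op update, only to make the row visible
RETURNING id;                   -- always returns an id, even on conflict
```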



The remaining option is to start a new command (in the same transaction), which then can see these conflicting rows from the previous query. Both existing answers suggest as much. The manual again:

However, SELECT does see the effects of previous updates executed
within its own transaction, even though they are not yet committed.
Also note that two successive SELECT commands can see different data,
even though they are within a single transaction, if other
transactions commit changes after the first SELECT starts and before
the second SELECT starts.
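Applied to the problem at hand, a minimal two-step sketch inside one transaction (without the lock and retry loop that make it airtight):

```sql
BEGIN;

INSERT INTO articles (title, content)
VALUES ('my new blog', '...')
ON CONFLICT (title) DO NOTHING;   -- may silently insert nothing

-- A *new* command in the same transaction can see the conflicting row:
SELECT id
FROM   articles
WHERE  title = 'my new blog';

COMMIT;
```

A concurrent transaction could still change or delete the row between the two commands, which is why locking and looping matter.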

But you want more:

-- Continue, using articleId to represent the article for next operations...

If concurrent write operations might change or delete the row, then to be absolutely sure, you also have to lock the selected row. (The inserted row is locked anyway.)

And since you seem to have strongly competing transactions, to make sure you succeed, loop until you do. Wrapped into a plpgsql function:

CREATE OR REPLACE FUNCTION f_articleid(_title text, _content text, OUT _articleid int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   LOOP
      SELECT articleid
      FROM   articles
      WHERE  title = _title
      FOR    UPDATE          -- or maybe a weaker lock 
      INTO   _articleid;

      EXIT WHEN FOUND;

      INSERT INTO articles AS a (title, content)
      VALUES (_title, _content)
      ON     CONFLICT (title) DO NOTHING  -- (new?) _content is discarded
      RETURNING a.articleid
      INTO   _articleid;

      EXIT WHEN FOUND;
   END LOOP;
END
$func$;


Detailed explanation:

  • Is SELECT or INSERT in a function prone to race conditions?
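Assuming a matching table definition (hypothetical - column names and types inferred from the function), a call might look like this:

```sql
CREATE TABLE articles (
   articleid serial PRIMARY KEY
 , title     text NOT NULL UNIQUE   -- the constraint that ON CONFLICT (title) targets
 , content   text NOT NULL
);

SELECT f_articleid('my new blog', 'Lorem ipsum ...');
-- Safe to call from concurrent sessions: each call returns the id for the
-- title, and exactly one row per title ends up in the table.
```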


Context

StackExchange Database Administrators Q#212580, answer score: 14
